Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-07-11 Thread Stephen Fisher
On Tue, Jun 28, 2011 at 10:01:14AM -0700, Guy Harris wrote: I don't know what the various terminal emulators for Windows, e.g. cmd.exe, do. The popular SecureCRT terminal emulator defaults to default (same as local system) character encoding, at least on Windows systems. This is not

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-07-11 Thread Guy Harris
On Jul 11, 2011, at 4:00 PM, Stephen Fisher wrote: The popular SecureCRT terminal emulator defaults to default (same as local system) character encoding, at least on Windows systems. This is not compatible with UTF-8 in my experience. Not surprising, given that default/same as local

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-29 Thread Graham Bloice
On 28/06/2011 18:27, Guy Harris wrote: On Jun 28, 2011, at 6:10 AM, Stig Bjørlykke wrote: On Tue, Jun 28, 2011 at 2:58 AM, Guy Harris g...@alum.mit.edu wrote: 1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-29 Thread Guy Harris
On Jun 29, 2011, at 2:37 AM, Graham Bloice wrote: For reference, here's the test executable output on Win7, using the SDK 7.0 build environment (a cmd.prompt): Not surprisingly, it doesn't work. Microsoft introduced Unicode support when they introduced Win32; as they were introducing a new

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-29 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 7:01 PM, Guy Harris g...@alum.mit.edu wrote: In any case, that means that using strerror() is probably not going to be sufficient to fix the problem.  What we might want to do is use UTF-8 everywhere we can, and, for non-GUI output, convert to the appropriate

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-29 Thread Guy Harris
On Jun 29, 2011, at 1:45 PM, Stig Bjørlykke wrote: Ok, what about trying to convert back to locale when output error messages from tshark? Something like the attached patch, maybe? Something like that, but with a g_free() of string afterwards. :-)
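
A minimal sketch of the "convert back to the locale encoding, then free" pattern discussed above. It is not the attached patch, which is not shown here. It assumes the console's encoding is ISO 8859-1, and utf8_to_latin1() and report_error() are hypothetical names; in GLib-based code the conversion would presumably be g_locale_from_utf8(), with g_free() as the matching release.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical converter: UTF-8 -> ISO 8859-1 (ASSUMED locale encoding).
 * Characters outside Latin-1 become '?'. The caller frees the result.
 */
static char *utf8_to_latin1(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);
    char *out = malloc(len + 1);    /* output is never longer than input */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; ) {
        unsigned char c = (unsigned char)in[i];

        if (c < 0x80) {
            /* ASCII passes through unchanged. */
            *p++ = (char)c;
            i++;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: fits in Latin-1. */
            *p++ = (char)(((c & 0x03) << 6) |
                          ((unsigned char)in[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Not representable: emit '?' and skip continuation bytes. */
            *p++ = '?';
            i++;
            while (i < len && ((unsigned char)in[i] & 0xC0) == 0x80)
                i++;
        }
    }
    *p = '\0';
    return out;
}

/* Print a UTF-8 message on a console stream, releasing the converted copy. */
static void report_error(FILE *fp, const char *utf8_msg)
{
    char *local = utf8_to_latin1(utf8_msg);

    if (local != NULL) {
        fprintf(fp, "%s\n", local);
        free(local);                /* the step the reply above asks for */
    }
}
```

The point of wrapping conversion and output in one function is that the converted copy never escapes, so there is exactly one place where it has to be freed.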

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Jakub Zawadzki
On Mon, Jun 27, 2011 at 05:58:35PM -0700, Guy Harris wrote: We have about 240 calls to strerror(). ...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that allocates a buffer for the result of the conversion would leak memory we
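
The leak concern can be made concrete with a small sketch. A strerror()-compatible function returns a pointer the caller never frees, so a version that allocates on every call leaks; one way around that (an assumption for illustration, not a description of how any particular library does it) is to allocate at most once per error number and hand out the cached pointer afterwards. cached_strerror() and ERR_CACHE_SIZE are hypothetical names, the UTF-8 conversion step is left as a plain copy, and the cache is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ERR_CACHE_SIZE 256          /* hypothetical bound on cached errnos */

static char *err_cache[ERR_CACHE_SIZE];

/*
 * Same calling convention as strerror(): the caller does not free the
 * result. Memory use is bounded because each error number is copied
 * (and, in a real version, converted to UTF-8) at most once.
 */
static const char *cached_strerror(int errnum)
{
    if (errnum < 0 || errnum >= ERR_CACHE_SIZE)
        return "Unknown error";

    if (err_cache[errnum] == NULL) {
        const char *msg = strerror(errnum);
        char *copy;

        assert(msg != NULL);
        copy = malloc(strlen(msg) + 1);
        if (copy == NULL)
            return msg;             /* fall back to the libc string */
        strcpy(copy, msg);          /* a real version would convert here */
        err_cache[errnum] = copy;
    }
    return err_cache[errnum];
}
```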

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 9:35 AM, Jakub Zawadzki darkjames...@darkjames.pl wrote: g_strerror() ? Yes, of course :) Thank you. -- Stig Bjørlykke ___ Sent via:Wireshark-dev mailing list wireshark-dev@wireshark.org

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Graham Bloice
On 28/06/2011 01:58, Guy Harris wrote: 2) Windows, where Unicode generally means UTF-16, and APIs that return strings encoded as sequences of octets rather than hexadectets probably return strings in the local code page. Is this a first sighting of a new word hexadectet? Google

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Jakub Zawadzki
On Tue, Jun 28, 2011 at 10:14:34AM +0200, Stig Bjørlykke wrote: On Tue, Jun 28, 2011 at 9:35 AM, Jakub Zawadzki darkjames...@darkjames.pl wrote: g_strerror() ? Yes, of course :) Thank you. no problem ;-) Btw. I know that nowadays I'm the only one who uses non-utf locales on console, but

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 12:22 PM, Jakub Zawadzki darkjames...@darkjames.pl wrote: Btw. I know that nowadays I'm the only one who uses non-utf locales on console, but when we print on console (stdout/stderr) I think we should use strerror() from libc, i.e. strerror() which don't recode

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 2:58 AM, Guy Harris g...@alum.mit.edu wrote: 1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system? If so, what's the setting of LANG?); I only had issues with Norwegian

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 2:25 AM, Graham Bloice wrote: On 28/06/2011 01:58, Guy Harris wrote: 2) Windows, where Unicode generally means UTF-16, and APIs that return strings encoded as sequences of octets rather than hexadectets probably return strings in the local code page. Is this a

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 3:22 AM, Jakub Zawadzki wrote: Btw. I know that nowadays I'm the only one who uses non-utf locales on console, but when we print on console (stdout/stderr) I think we should use strerror() from libc, i.e. strerror() which don't recode message to utf-8. It's more

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 3:33 AM, Stig Bjørlykke wrote: Do we always know where the error message is used? I suspect file_open_error_message() is used both in GUI and tshark. Yes - it's in epan.

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 6:10 AM, Stig Bjørlykke wrote: On Tue, Jun 28, 2011 at 2:58 AM, Guy Harris g...@alum.mit.edu wrote: 1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system? If so, what's the

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 10:01 AM, Guy Harris wrote: In any case, that means that using strerror() is probably not going to be sufficient to fix the problem. What we might want to do is use UTF-8 everywhere we can, and, for non-GUI output, convert to the appropriate character encoding -

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 10:27 AM, Guy Harris wrote: when putting them into a textual representation of the protocol tree or into columns or something else to be shown to humans, map them to UTF-8, with anything that can't be mapped to UTF-8 - including, if the encoding is putatively

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 10:43 AM, Guy Harris wrote: On Jun 28, 2011, at 10:27 AM, Guy Harris wrote: when putting them into a textual representation of the protocol tree or into columns or something else to be shown to humans, map them to UTF-8, with anything that can't be mapped to

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 10:27 AM, Guy Harris wrote: We have an issue regarding strings in packets in general. Strings might be in a number of encodings, including ASCII (meaning that any byte with the 8th bit set is something that shouldn't be there), other national variants of ISO 646,

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 7:27 PM, Guy Harris g...@alum.mit.edu wrote: OK, what OS are you using? Snow:~ stig$ uname -a Darwin Snow.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 Snow:~ stig$ echo $LANG Snow:~ stig$ gcc norsk.c -o

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Guy Harris
On Jun 28, 2011, at 12:25 PM, Stig Bjørlykke wrote: On Tue, Jun 28, 2011 at 7:27 PM, Guy Harris g...@alum.mit.edu wrote: OK, what OS are you using? Snow:~ stig$ uname -a Darwin ... Well, that answers *that* question. :-) So the locale's encoding should probably be UTF-8, given that it's

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-28 Thread Stig Bjørlykke
On Tue, Jun 28, 2011 at 9:37 PM, Guy Harris g...@alum.mit.edu wrote: However, if LANG is blank, you presumably don't have Terminal set up to Set local environment variables on startup (Preferences Settings Advanced, at the bottom); Actually I have Set local environment variables on startup

[Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-27 Thread Stig Bjørlykke
Hi. When looking at bug 5715 I found that we use both UTF8 (from file names) and locale (from strerror()) in the error messages presented from simple_dialog(). In vsimple_dialog() we convert all messages with g_locale_to_utf8(), which will wrongly convert the file name (like in the bug report).
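
The wrong conversion described above can be reproduced in isolation. The sketch below assumes the locale's encoding is ISO 8859-1; latin1_to_utf8() is a hypothetical stand-in for a locale-to-UTF-8 conversion such as g_locale_to_utf8() under that assumption, not Wireshark code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a locale-to-UTF-8 conversion, ASSUMING the
 * locale encoding is ISO 8859-1: every input byte is treated as one
 * Latin-1 character. The caller frees the result.
 */
static char *latin1_to_utf8(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);
    char *out = malloc(2 * len + 1);   /* worst case: 2 bytes per input byte */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (c < 0x80) {
            *p++ = (char)c;                     /* ASCII: unchanged */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));     /* lead byte */
            *p++ = (char)(0x80 | (c & 0x3F));   /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

Fed the Latin-1 byte F8 (as a locale-encoded strerror() string might contain), this yields the correct UTF-8 bytes C3 B8 for "ø". Fed a file name that is already UTF-8 (C3 B8), it yields C3 83 C2 B8, which displays as "Ã¸". A single message that mixes both kinds of text therefore cannot be fixed by one blanket conversion.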

Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

2011-06-27 Thread Guy Harris
On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote: When looking at bug 5715 I found that we use both UTF8 (from file names) and locale (from strerror()) in the error messages presented from simple_dialog(). In vsimple_dialog() we convert all messages with g_locale_to_utf8(), which will