This has been discussed before, but no formal decisions were made on the matter. We need the ability to support non-ASCII character sets in Wireshark, in particular Unicode.
I know of at least two bugs off the top of my head that would be fixed by adding Unicode support in Wireshark (1827 & 1867). Another bug, #1372, is titled "Wireshark doesn't support non-ASCII strings well" and refers to the UTF-16 nature of some Windows file sharing protocol traffic. In that bug's case, our fake Unicode functions are mangling the actual UTF-16 characters. Guy wrote a detailed comment on that bug in Feb '07 describing one method we could use to handle arbitrary character sets.

After some thought and research, I think it would be best to convert all strings into UTF-8 once they are read in from disk/network/user input and keep them in UTF-8 all the way through to display in GTK. Pango renders strings for GTK and uses UTF-8, so GTK in turn uses UTF-8. In fact, Pango blows up if you don't pass it a valid UTF-8 string. We're only getting by now because ASCII is a subset of UTF-8.

I would like to start implementing some Unicode support in Wireshark, but first we need a consensus on going this way and on how we're going to tackle it. It should be possible to do it incrementally without causing any problems.

The GLib documentation on Unicode support is here:

http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html

It offers gunichar characters, which are always 4 bytes long. Those would be wasteful. It then goes on to describe many UTF-8 handling functions that operate on ordinary gchar/char strings, since UTF-8 represents each character with as many bytes as needed (1, 2, 3, or 4).

Thoughts? Concerns?

Steve

_______________________________________________
Wireshark-dev mailing list
Wireshark-dev@wireshark.org
https://wireshark.org/mailman/listinfo/wireshark-dev
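[A standalone sketch of the conversion being discussed. In Wireshark itself this would presumably be done with GLib's g_utf16_to_utf8() or g_convert(); the helper below is purely illustrative (the function name is made up, and it handles only BMP code points, ignoring surrogate pairs), but it shows the 1/2/3-byte variable-length UTF-8 encoding mentioned above.]

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: convert a UTF-16LE buffer (BMP code points only,
 * no surrogate-pair handling) into a NUL-terminated UTF-8 string.
 * Returns the number of UTF-8 bytes written, not counting the NUL. */
static size_t
utf16le_to_utf8(const uint8_t *in, size_t in_len, char *out, size_t out_size)
{
    size_t o = 0;

    for (size_t i = 0; i + 1 < in_len; i += 2) {
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8); /* LE code unit */

        if (cp < 0x80) {                         /* 1-byte UTF-8 sequence */
            if (o + 1 >= out_size) break;
            out[o++] = (char)cp;
        } else if (cp < 0x800) {                 /* 2-byte UTF-8 sequence */
            if (o + 2 >= out_size) break;
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {                                 /* 3-byte UTF-8 sequence */
            if (o + 3 >= out_size) break;
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the UTF-16LE bytes for "A", U+00E9 (é), and U+20AC (€) come out as 1-, 2-, and 3-byte UTF-8 sequences respectively, which is exactly why plain ASCII traffic has been sneaking through Pango unharmed.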