The problem is the Sigma. The black diamond with a question mark is the
Unicode REPLACEMENT CHARACTER (U+FFFD), which
is being inserted twice, once for each of the two bytes that make up the character.

There was a bug affecting UTF-8 Sigma and other Greek letters (
https://gitlab.com/wireshark/wireshark/-/issues/17070)
that was fixed in the recently released 3.2.9 and 3.4.1, as well as in master.
It is still present in 3.3.0, where it shows up exactly like this.

A workaround is to use proto_tree_add_string_format_value() and pass "%s"
and your string value again as the last two parameters; that path
bypasses the flawed format_text() function in that version.
Alternatively, upgrade, or apply the patch from that issue.

John Thacker


On Sat, Dec 12, 2020 at 1:43 PM <[email protected]> wrote:

> I create a GString str = “A{Dagger}B{Sigma}C”; (i.e.
> “\x41\xE2\x80\xA0\x42\xCE\xA3\x43” where \xE2\x80\xA0 is Dagger and
> \xCE\xA3 is Sigma).
>
> The Dagger is the correct UTF-8 code (
> https://www.fileformat.info/info/unicode/char/2020/index.htm)
>
> and the Sigma is the correct UTF-8 code (
> https://www.fileformat.info/info/unicode/char/03a3/index.htm).
>
> I use col_append_lstr(pinfo->cinfo, COL_INFO, str,
> COL_ADD_LSTR_TERMINATOR);
>
> The display is “A{Dagger}B{Sigma}C” where the {Dagger} and {Sigma} are the
> correct visual single characters.
>
> I use proto_string_add_string(…, str);
>
> The display is
> “A{Dagger}B{black-diamond-with-question-mark}{black-diamond-with-question-mark}C”
> where the {black-diamond-with-question-mark} is the visual single character
> of a black diamond with a question mark (and it is displayed twice).
>
> So col_append_lstr handles UTF-8 and proto_string_add_string partially
> handles UTF-8.
>
> How can I get a proto_string_* function that will display UTF-8 correctly
> like col_append_lstr does?
>
> I do not need any string function to validate my UTF-8 bytes (if I make a
> mistake, that’s my problem). I just want a consistent display.
>
> Environment:
>
> Windows 10 Enterprise (10.0.18363) x64
>
> Microsoft Visual Studio Community 2019 Version 16.7.1
>
> QT v5.15.0 using msvc2019_64
>
> Wireshark 3.3.0 with custom dissector
>
> Wireshark Font Consolas Regular 12.0
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <[email protected]>
> Archives:    https://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
>              mailto:[email protected]?subject=unsubscribe