Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8

Evan Huus Mon, 20 Jan 2014 19:46:11 -0800

On Mon, Jan 20, 2014 at 8:27 PM, Guy Harris <[email protected]> wrote:
>
> On Jan 20, 2014, at 1:49 PM, Martin Kaiser <[email protected]> wrote:
>
>> I committed the change to tvb_get_string() in r54864.
>
> I've changed that *not* to map bytes with the 8th bit set to REPLACEMENT 
> CHARACTER for UTF-8 strings.  For UTF-8 strings, we need to do a more 
> complicated check and map invalid octet sequences to REPLACEMENT CHARACTER.  
> (We also need to do some more stuff for UCS-2, UTF-16, and UCS-4.)
>
> tvb_get_string() still treats the string as ASCII.


In which case, is a dumb search-and-replace of tvb_get_string() with
tvb_get_string_enc() and ENC_ASCII an easy way to make (part of) the API
transition? We'll still have to audit for dissectors that really meant
ENC_SOMETHING_ELSE (probably ENC_UTF8 in most cases), but it would be
easy progress without any behavioural changes.
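
For anyone following along, here is a rough sketch of what "treats the
string as ASCII" amounts to as I understand it from the description
above: bytes with the 8th bit set are not valid ASCII, so each one is
mapped to U+FFFD REPLACEMENT CHARACTER, and the result is always valid
UTF-8. The function name ascii_to_utf8_sketch() and the use of plain
malloc() are my own assumptions for illustration; this is not the code
that was committed, and the real routines work on a tvbuff rather than
a raw byte array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not a Wireshark function: copy `length` bytes that
 * are supposed to be ASCII into a newly allocated, NUL-terminated UTF-8
 * string.  Any byte with the 8th bit set is not ASCII, so it is replaced
 * with U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8).
 * The caller frees the result; returns NULL if allocation fails. */
static char *
ascii_to_utf8_sketch(const uint8_t *bytes, size_t length)
{
    /* Worst case: every input byte becomes a 3-byte sequence. */
    char *out = malloc(length * 3 + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;

    for (i = 0; i < length; i++) {
        if (bytes[i] < 0x80) {
            /* Plain 7-bit ASCII is already valid UTF-8. */
            out[o++] = (char)bytes[i];
        } else {
            /* UTF-8 encoding of U+FFFD. */
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        }
    }
    out[o] = '\0';
    return out;
}
```

This only covers the ASCII case. For ENC_UTF8 a byte-at-a-time test like
this is not enough, since the input has to be checked as multi-byte
sequences before anything is replaced.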

>> I'll have a look at tvb_get_stringz() tomorrow.
>
> I've added that (with the same change *not* to do it for UTF-8 strings).  
> tvb_get_stringz() treats the string as ASCII.
>
> ___________________________________________________________________________
> Sent via:    Wireshark-dev mailing list <[email protected]>
> Archives:    http://www.wireshark.org/lists/wireshark-dev
> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>              mailto:[email protected]?subject=unsubscribe
