On Mar 19, 2018, at 4:53 AM, Roberto Ayuso <[email protected]> wrote:

> Really I mean two fields, http.file_data and http.request_uri; both can have 
> non-ASCII chars but are treated only as ASCII in the source code.
> 
> Can't an option be added to manage that?

The question is whether it *should* be added, i.e. whether it would be correct 
to do so, not whether it *could* be added.  Please read my reply in detail.

The body of an HTTP request or response is *not* necessarily text, so always 
treating it as text is an error.  If it *is* text, either the content type 
specifies the character encoding or the encoding is the default ASCII 
encoding, so if we *do* treat it as text, we should use the content type, *not* 
a user preference, to control the encoding.

So either it should be an FT_BYTES field, which has no character encoding and 
thus would be neither ASCII nor UTF-8, or the dissector should determine 
whether the content type corresponds to text or not and:

        if it's not text, it should add it as an FT_BYTES version of
        http.file_data;

        if it is text, it should be extracted using whatever character
        encoding the content type specifies, and added as an FT_STRING
        version of http.file_data (if it's a character encoding Wireshark
        currently doesn't support, fall back on FT_BYTES).
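
To make that decision concrete, here is a rough standalone sketch in Python, 
not Wireshark code.  The function name classify_body is made up for 
illustration, "is text" is simplified to "the major type is text", Python's 
codec registry stands in for "encodings Wireshark supports", and falling back 
to bytes when the octets don't decode is my own assumption:

```python
import codecs
from email.message import Message


def classify_body(content_type, body):
    """Return ("FT_STRING", str) for decodable text, else ("FT_BYTES", bytes).

    Hypothetical helper; the tags only mirror the Wireshark field type names.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Simplification: only text/* counts as text here.
    if msg.get_content_maintype() != "text":
        return ("FT_BYTES", body)
    # No charset parameter: use the ASCII default described above.
    charset = msg.get_content_charset("us-ascii")
    try:
        codecs.lookup(charset)  # stand-in for "is this encoding supported?"
        return ("FT_STRING", body.decode(charset))
    except (LookupError, UnicodeDecodeError):
        # Unknown encoding, or octets that don't decode: keep the raw bytes.
        return ("FT_BYTES", body)
```

Note that no user preference appears anywhere in it; the content type alone 
picks the field type and the encoding.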

As for the request URI, are there non-ASCII characters because of 
percent-escaping, or because the URI uses RFC 2047-style indicators, or because 
the sending machine just added octets with the 8th bit set in *some* encoding, 
not necessarily UTF-8?  Those three possibilities would have to be handled in 
different ways.
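
As a rough illustration of telling those three cases apart, here is a small 
Python sketch that inspects the raw URI octets.  The function name, the labels 
it returns, and the regular expressions are all my own assumptions, and it 
only detects each case; it makes no attempt to decode anything:

```python
import re

# %80 through %FF: a percent-escape whose value has the 8th bit set.
PERCENT_HIGH = re.compile(rb"%[89A-Fa-f][0-9A-Fa-f]")
# RFC 2047-style encoded word: =?charset?B?text?= or =?charset?Q?text?=
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def non_ascii_sources(uri):
    """Return the set of ways non-ASCII data appears in the raw URI bytes."""
    found = set()
    if PERCENT_HIGH.search(uri):
        found.add("percent-escaped")
    if ENCODED_WORD.search(uri):
        found.add("rfc2047")
    if any(octet >= 0x80 for octet in uri):
        # Raw 8-bit octets; the encoding is unknown, not necessarily UTF-8.
        found.add("raw-8bit")
    return found
```

For example, b"/caf%C3%A9" is reported as percent-escaped and b"/caf\xe9" as 
raw 8-bit.  In the raw 8-bit case nothing in the URI itself says which 
encoding the sender used.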
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe
