Thanks

What I really mean is two fields, http.file_data and http.request_uri; both
can contain non-ASCII characters but are treated only as ASCII in the source
code.

Could an option be added to handle that?

Best
Roberto.

2018-03-19 9:54 GMT+01:00 Guy Harris <[email protected]>:

> (Don't CC individual developers on messages to wireshark-dev; we're all on
> that list, and we shouldn't be singled out, as none of us individually
> "own" this issue.)
>
> On Mar 18, 2018, at 11:28 PM, Roberto Ayuso <[email protected]>
> wrote:
>
> > I have seen that the HTTP dissector only handles content as ASCII. I
> > modified the source for my project, changing it to ENC_UTF_8 for
> > http.request_uri and http.data.
> >
> > Could you consider adding it as an option on the tshark command line? I
> > don't have enough skill to do it myself.
>
> For request/response fields and headers:
>
> To quote RFC 7230:
>
>         Historically, HTTP has allowed field content with text in the
> ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use
> of [RFC2047] encoding.  In practice, most HTTP header field values use only
> a subset of the US-ASCII charset [USASCII].  Newly defined header fields
> SHOULD limit their field values to US-ASCII octets.  A recipient SHOULD
> treat other octets in field content (obs-text) as opaque data.
>
> RFC 2047 is "MIME (Multipurpose Internet Mail Extensions) Part Three:
> Message Header Extensions for Non-ASCII Text", which describes the
> "=?iso-8859-1?q?this=20is=20some=20text?=" mechanism used to encode
> non-ASCII - and not necessarily UTF-8 - text in mail message headers.
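
As an illustration of the RFC 2047 mechanism quoted above, here is a minimal Python sketch that decodes such an "encoded word". It is not Wireshark code; decode_encoded_words is a hypothetical helper name, and it relies only on the standard library's email.header.decode_header.

```python
from email.header import decode_header


def decode_encoded_words(value: str) -> str:
    """Decode RFC 2047 encoded words in a header value (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the charset they declared;
            # unencoded bytes have no charset, so assume ASCII for those.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A value with no encoded words at all is returned as str.
            parts.append(chunk)
    return "".join(parts)


print(decode_encoded_words("=?iso-8859-1?q?this=20is=20some=20text?="))
```

Note that the charset is taken from the encoded word itself, so the same helper handles ISO-8859-1, UTF-8, or any other charset Python knows.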
>
> So:
>
>         1) There appear to be "extended ASCII" encodings other than UTF-8
> that have been used in HTTP requests and replies, so an option of that sort
> should perhaps allow more than just UTF-8 to be specified as the "default"
> encoding.   (It would be implemented as a preference for the HTTP
> dissector, so it would allow a setting on the command line such as "-o
> http.charset=utf-8", but would also be settable through the GUI in
> Wireshark.)
>
>         2) Are there HTTP headers that are not in ASCII and that don't use
> percent-escaping for the non-ASCII characters?
>
>         3) RFC 3986 seems to be at least suggesting that percent-escape
> sequences in URLs represent UTF-8 encodings of characters (rather than,
> say, ISO 8859-n encodings, for some value of n); if that's the case, it
> would probably be appropriate to display the URL exactly as it appears in
> the message, *but* to also provide, as a separate field, the result of
> unescaping, *if* the result is valid UTF-8.
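
Point 3 above could be sketched roughly as follows in Python: keep the raw URI untouched, and produce an unescaped form only when the unescaped bytes are valid UTF-8. This is an assumption-laden illustration, not the dissector's implementation; unescape_if_utf8 is a hypothetical name.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def unescape_if_utf8(raw_uri: bytes) -> Optional[str]:
    """Return the percent-unescaped URI if it is valid UTF-8, else None.

    Hypothetical helper: the caller keeps displaying raw_uri as-is and
    only adds the extra "unescaped" field when this returns a string.
    """
    unescaped = unquote_to_bytes(raw_uri)
    try:
        return unescaped.decode("utf-8")
    except UnicodeDecodeError:
        # For example, %E9 on its own is ISO 8859-1 "e acute", not valid UTF-8.
        return None


print(unescape_if_utf8(b"/caf%C3%A9"))
print(unescape_if_utf8(b"/caf%E9"))
```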
>
> For the body:
>
> There is no such field as "http.data".  Did you mean "http.file_data", or
> something else?
>
> The Content-Type header should, if the body is text, indicate what
> character encoding is used, e.g.
>
>         Content-Type: text/plain;charset=utf-8
>
> To quote RFC 2046:
>
>         4.1.2.  Charset Parameter
>
>
>            A critical parameter that may be specified in the Content-Type field
>            for "text/plain" data is the character set.  This is specified with a
>            "charset" parameter, as in:
>
>              Content-type: text/plain; charset=iso-8859-1
>
>            Unlike some other parameter values, the values of the charset
>            parameter are NOT case sensitive.  The default character set, which
>            must be assumed in the absence of a charset parameter, is US-ASCII.
>
> so if there's no "charset=", the character set must be assumed to be
> ASCII, not UTF-8.
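
The rule described above for the body (use the charset parameter if present, otherwise assume US-ASCII) might look like this as a small Python sketch. The names body_charset and decode_body are hypothetical, and falling back to ASCII for an unrecognized charset is my own assumption, not something the RFC text above specifies.

```python
from email.message import Message


def body_charset(content_type: str) -> str:
    """Return the charset named in a Content-Type value, defaulting to US-ASCII."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value, which matches the
    # "NOT case sensitive" rule, and returns None when no charset is given.
    return msg.get_content_charset() or "us-ascii"


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a text body per its Content-Type (hypothetical helper)."""
    try:
        return body.decode(body_charset(content_type), errors="replace")
    except LookupError:
        # Assumption: treat a charset Python does not know as ASCII.
        return body.decode("ascii", errors="replace")


print(body_charset("text/plain;charset=utf-8"))
print(body_charset("text/plain"))
```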
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe