On Apr 4, 2014, at 2:01 PM, Hadriel Kaplan <[email protected]> wrote:

> For protocols which are actually truly UTF-8, I'm planning to just assume 
> treating them as ASCII is ok, because as far as I know the atoi/strtol/etc. 
> functions don't actually care: if they see the ASCII characters for digits 
> (and +/-/etc.) they'll parse it, else not. So any non-ASCII UTF-8 character 
> in the sequence is meaningless to them and they stop parsing at that 
> character.

Yes, the only valid octets in a number in any "extended ASCII" would be:

        0x2b, 0x2d, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37;

        0x38 and 0x39 if the radix is 10 or 16;

        0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x61, 0x62, 0x63, 0x64, 0x65,
        and 0x66 if the radix is 16;

so anything with the 8th bit set is not valid, meaning that the same routine 
can handle ASCII, ISO 8859-n, various Windows code pages, various Mac code 
pages, and UTF-8 - the actual character encoding is irrelevant, as long as 
ASCII characters are encoded as a single octet having the ASCII code point 
value.
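
As a rough illustration of the point above, here is a minimal sketch in C. The helper names is_number_octet() and parse_prefix() are hypothetical and are not taken from the Wireshark sources; the sketch assumes the radix is 8, 10, or 16 and that the default "C" locale is in effect. The first helper encodes the octet list directly, and the second shows strtol() stopping at the first octet it cannot use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: true if `octet` may appear in a signed number
 * written in the given radix (assumed to be 8, 10, or 16). */
static bool is_number_octet(unsigned char octet, int radix)
{
    if (octet == 0x2b || octet == 0x2d)        /* '+' or '-' */
        return true;
    if (octet >= 0x30 && octet <= 0x37)        /* '0' through '7' */
        return true;
    if (octet == 0x38 || octet == 0x39)        /* '8' and '9' */
        return radix == 10 || radix == 16;
    if ((octet >= 0x41 && octet <= 0x46) ||    /* 'A' through 'F' */
        (octet >= 0x61 && octet <= 0x66))      /* 'a' through 'f' */
        return radix == 16;
    return false;  /* covers every octet with the 8th bit set */
}

/* Hypothetical helper: parse the leading number in `s` with strtol()
 * and report how many octets strtol() consumed. */
static long parse_prefix(const char *s, int radix, size_t *consumed)
{
    char *end;
    long value = strtol(s, &end, radix);

    *consumed = (size_t)(end - s);
    return value;
}
```

For example, given the octets "42" followed by the UTF-8 encoding of U+00E9 (0xc3 0xa9), parse_prefix() with radix 10 should return 42 and report 2 octets consumed, since 0xc3 is not a digit. Note that strtol() itself also skips leading white space and, for radix 16, accepts an optional "0x" or "0X" prefix; those are single-octet ASCII characters as well, so the same argument applies.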

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe
