Re: [Wireshark-dev] ctype.h calls

Guy Harris Tue, 28 Oct 2014 12:31:53 -0700

On Oct 28, 2014, at 10:56 AM, Jeff Morriss <[email protected]> wrote:


> Just catching up on 3 weeks of traffic on the -commits list...
> 
> Is there any reason the remaining ctype.h calls in master shouldn't be 
> removed [and the functions put on the prohibited list in checkAPIs.pl]?

The remaining calls in Wireshark proper (I'm leaving the build tools out, at 
least for now), at least based on what files are still including ctype.h, are:

        in the H.245 dissector, a call to isascii() used to decide whether to 
display something as text or hex;

        in the S1AP dissector, a call to isalpha(), which is in a loop that is 
being used to check whether something should be displayed as a text string;

        in file.c, calls to toupper() and tolower() in string matching code;

        in wsutil/strnatcmp.c, calls to several functions in the "Perform 
'natural order' comparisons of strings in C" routine;

        in wsutil/strptime.c, isspace() used when matching white space in an 
input string.

In the first two, I *suspect* that what's really intended is "is this printable 
ASCII?", in which case both should use g_ascii_isprint(), although if the S1AP 
dissector really wants to check for *alphabetic* characters, g_ascii_isalpha() 
could be used.
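
A minimal sketch of the "is this printable ASCII?" test, to show what a locale-independent check looks like. The helpers ascii_isprint() and is_printable_ascii() are hypothetical names invented for this example, not code from either dissector; in Wireshark itself ascii_isprint() would simply be GLib's g_ascii_isprint().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for g_ascii_isprint(): true only for the bytes
 * 0x20..0x7E, whatever the current locale is. */
static bool ascii_isprint(unsigned char c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* Hypothetical helper: returns true if every byte in the buffer is
 * printable ASCII, i.e. the buffer could be shown as text rather than hex.
 * An empty buffer is reported as printable. */
static bool is_printable_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!ascii_isprint(buf[i]))
            return false;
    }
    return true;
}
```

Unlike isalpha() or isprint() from ctype.h, the result for bytes above 0x7F does not change with the locale, and passing such a byte is not undefined behavior on platforms where char is signed.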

In file.c, I think that code is primarily (and possibly exclusively) used for 
the Find function in Wireshark and, for that, if the user requested a 
case-insensitive search:

        when searching packet summary lines and lines from the detailed 
dissection, they might want a search that's case-insensitive, *using the rules 
of their locale*, *and treating both the string being searched for and the 
strings in which the search is being done as being encoded as UTF-8* (which is 
what they both should be), which is a significant change;

        when searching raw packet data, making the search automatically "do the 
right thing" would be extremely difficult, as the raw packet data might be in 
arbitrary encodings, and the only way to determine the encoding of a particular 
set of bytes would be to see what encoding was specified when it was dissected.  
Currently we support a vague sort of byte-oriented encoding that I guess is 
ASCII, and a vague sort of 2-byte-oriented encoding that I guess you could think 
of as UTF-16 except that it never matches anything outside the ASCII range; 
maybe we should just have both matches never match anything outside the ASCII 
range.
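
For the raw-data case, a byte-oriented search that folds case only for A-Z and a-z might look like the sketch below. The names ascii_toupper() and ascii_casefind() are hypothetical, and this is not the code in file.c; it only illustrates "never match anything outside the ASCII range".

```c
#include <stddef.h>

/* Hypothetical stand-in for g_ascii_toupper(): maps a-z to A-Z and
 * leaves every other byte value, including 0x80..0xFF, unchanged. */
static unsigned char ascii_toupper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Hypothetical helper: find the first occurrence of needle in hay,
 * ignoring case for ASCII letters only.  Returns a pointer to the match,
 * or NULL if there is none.  An empty needle matches at the start. */
static const unsigned char *
ascii_casefind(const unsigned char *hay, size_t hay_len,
               const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t j = 0;
        while (j < needle_len &&
               ascii_toupper(hay[i + j]) == ascii_toupper(needle[j]))
            j++;
        if (j == needle_len)
            return hay + i;
    }
    return NULL;
}
```

With this, "index" matches "Index" in the data, but the byte 0xE4 never matches 0xC4, even though those are "ä" and "Ä" in ISO 8859-1; bytes outside ASCII only match themselves.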

In wsutil/strnatcmp.c, the "natural order" appears, from

        http://sourcefrog.net/projects/natsort/

to sort strings such that numbers in the strings are sorted in numerical order:

        Computer string sorting algorithms generally don't order strings 
containing numbers in the same way that a human would do. Consider:

                rfc1.txt
                rfc2086.txt
                rfc822.txt

        It would be more friendly if the program listed the files as

                rfc1.txt
                rfc822.txt
                rfc2086.txt

        Filenames sort properly if people insert leading zeros, but they don't 
always do that.

The routines in there are used to sort encapsulation type names for "-T" in the 
help output from editcap and mergecap; those are all ASCII, so using the 
g_ascii_XXX() routines would work.  If we want to sort strings that might *not* 
be nerd tokens in a natural order, in order to show them to a user, we might 
want to do a "dictionary sort", which would be locale-dependent.

I'd vote for stuffing "ascii" into the names of the wsutil/strnatcmp.c 
routines, to make it clear that the case-insensitive "natural order" sort 
routine will always treat A-Z and a-z as equivalent (including treating "I" and 
"i" as equivalent, with neither being equivalent to "İ" or "ı") and will never 
treat anything else as equivalent (so, for example, "Ä" and "ä" are not 
equivalent).  If we ever need a "natural human dictionary order" sort, we can 
worry about that problem at that point.
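
To make the intended semantics concrete, here is a simplified sketch of an ASCII-only, case-insensitive "natural order" comparison. It is *not* the algorithm in wsutil/strnatcmp.c (for one thing, it just ignores leading zeros in a digit run); the name ascii_natcasecmp() and the nat_*() helpers are made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for g_ascii_isdigit() and g_ascii_tolower(). */
static int nat_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static unsigned char nat_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Compare two strings so that runs of digits are ordered by numeric value
 * and A-Z is treated as equivalent to a-z.  All other bytes compare by
 * value.  Returns -1, 0 or 1. */
static int ascii_natcasecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p && *q) {
        if (nat_isdigit(*p) && nat_isdigit(*q)) {
            size_t lp = 0, lq = 0;

            /* Leading zeros don't change the numeric value. */
            while (*p == '0') p++;
            while (*q == '0') q++;
            while (nat_isdigit(p[lp])) lp++;
            while (nat_isdigit(q[lq])) lq++;

            /* With leading zeros gone, more digits means a larger number. */
            if (lp != lq)
                return lp < lq ? -1 : 1;
            /* Same length: the first differing digit decides. */
            for (size_t i = 0; i < lp; i++) {
                if (p[i] != q[i])
                    return p[i] < q[i] ? -1 : 1;
            }
            p += lp;
            q += lq;
        } else {
            unsigned char cp = nat_tolower(*p), cq = nat_tolower(*q);
            if (cp != cq)
                return cp < cq ? -1 : 1;
            p++;
            q++;
        }
    }
    if (*p) return 1;
    if (*q) return -1;
    return 0;
}
```

This orders rfc1.txt before rfc822.txt before rfc2086.txt, treats "RFC1.TXT" and "rfc1.txt" as equal, and does not treat UTF-8 "Ä" and "ä" as equal, since their second bytes differ and only A-Z is folded.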
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <[email protected]>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:[email protected]?subject=unsubscribe
