On Oct 28, 2014, at 10:56 AM, Jeff Morriss <[email protected]> wrote:
> Just catching up on 3 weeks of traffic on the -commits list...
>
> Is there any reason the remaining ctype.h calls in master shouldn't be
> removed [and the functions put on the prohibited list in checkAPIs.pl]?

The remaining calls in Wireshark proper (I'm leaving the build tools out, at
least for now), at least based on what files are still including ctype.h, are:

    in the H.245 dissector, a call to isascii() used to decide whether to
    display something as text or hex;

    in the S1AP dissector, a call to isalpha(), which is in a loop that is
    being used to check whether something should be displayed as a text
    string;

    in file.c, calls to toupper() and tolower() in string matching code;

    in wsutil/strnatcmp.c, calls to several functions in the "Perform
    'natural order' comparisons of strings in C" routine;

    in wsutil/strptime.c, isspace() used when matching white space in an
    input string.

In the first two, I *suspect* that what's really intended is "is this printable
ASCII?", in which case both should use g_ascii_isprint(), although if the S1AP
dissector really wants to check for *alphabetic* characters, g_ascii_isalpha()
could be used.
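
To make the "is this printable ASCII?" test concrete, here is a minimal sketch in plain C. The names ascii_isprint() and is_printable_ascii() are hypothetical stand-ins written for this example, not existing Wireshark or GLib functions; in the dissectors the real check would presumably just call g_ascii_isprint() on each byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for g_ascii_isprint(): true only for the bytes
 * 0x20 (space) through 0x7E ('~'), whatever the current locale is. */
static bool ascii_isprint(unsigned char c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* Hypothetical helper: true if every byte in buf is printable ASCII,
 * i.e. the buffer is safe to show as text rather than as hex. */
static bool is_printable_ascii(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!ascii_isprint(buf[i]))
            return false;
    }
    return true;
}
```

Unlike isalpha() or isprint() from ctype.h, this gives the same answer in every locale, and it takes unsigned char so a byte with the high bit set can't turn into a negative argument.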

In file.c, I think that code is primarily (and possibly exclusively) used for
the Find function in Wireshark and, for that, if the user requested a
case-insensitive search:

    when searching packet summary lines and lines from the detailed
    dissection, they might want a search that's case-insensitive, *using the
    rules of their locale*, *and treating both the string being searched for
    and the strings in which the search is being done as being encoded as
    UTF-8* (which is what they both should be), which is a significant change;

    when searching raw packet data, making the search automatically "do the
    right thing" would be extremely difficult (as the raw packet data might be
    in arbitrary encodings, and the only way to determine the encoding of a
    particular set of bytes would be to see what encoding was specified when
    it was dissected). Currently we support a vague sort of byte-oriented
    encoding that I guess is ASCII and a vague sort of 2-byte-oriented
    encoding that I guess you could think of as UTF-16 but it never matches
    anything outside the ASCII range, and maybe we should just have both
    kinds of match never match anything outside the ASCII range.
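
For the raw-data case, "never match anything outside the ASCII range" could look roughly like the following sketch. This is not the code in file.c; ascii_tolower() and ascii_casefind() are hypothetical names made up for the example (the former behaves the way I'd expect g_ascii_tolower() to), and it only covers the byte-oriented search, not the 2-byte one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for g_ascii_tolower(): folds only 'A'..'Z',
 * and leaves every other byte, including 0x80..0xFF, untouched. */
static unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical helper: find the first occurrence of needle in hay,
 * ignoring case for ASCII letters only.  Returns a pointer into hay,
 * or NULL if there is no match. */
static const unsigned char *
ascii_casefind(const unsigned char *hay, size_t hay_len,
               const unsigned char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t j = 0;
        while (j < needle_len &&
               ascii_tolower(hay[i + j]) == ascii_tolower(needle[j]))
            j++;
        if (j == needle_len)
            return hay + i;
    }
    return NULL;
}
```

With this, "index.html" is found in "GET /Index.HTML", but the Latin-1 bytes 0xC4 and 0xE4 ("Ä" and "ä") are never treated as equal, because nothing outside A-Z is folded.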

In wsutil/strnatcmp.c, the "natural order" appears, from

    http://sourcefrog.net/projects/natsort/

to sort strings such that numbers in the strings are sorted in numerical order:

    Computer string sorting algorithms generally don't order strings
    containing numbers in the same way that a human would do. Consider:

        rfc1.txt
        rfc2086.txt
        rfc822.txt

    It would be more friendly if the program listed the files as

        rfc1.txt
        rfc822.txt
        rfc2086.txt

    Filenames sort properly if people insert leading zeros, but they don't
    always do that.

The routines in there are used to sort encapsulation type names for "-T" in the
help output from editcap and mergecap; those are all ASCII, so using the
g_ascii_XXX() routines would work. If we want to sort strings that might *not*
be nerd tokens in a natural order, in order to show them to a user, we might
want to do a "dictionary sort", which would be locale-dependent. I'd vote for
stuffing "ascii" into the names of the wsutil/strnatcmp.c routines, to make it
clear that the case-insensitive "natural order" sort routine will always treat
A-Z and a-z as equivalent (including treating "I" and "i" as equivalent, with
neither being equivalent to "İ" or "ı") and will not ever treat anything else
as equivalent (including not, for example, treating "Ä" and "ä" as equivalent),
and, if we ever need a "natural human dictionary order" sort, worrying about
that problem at that point.
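
To illustrate what an explicitly ASCII-only, case-insensitive "natural order" comparison means, here is a simplified sketch. It is not the code in wsutil/strnatcmp.c and not the natsort algorithm itself (for one thing, it just ignores leading zeros in digit runs); the name ascii_strnatcasecmp() and the helpers are hypothetical, chosen only to show the "ascii" naming I'm suggesting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ASCII-only helpers; neither consults the locale. */
static int ascii_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Simplified natural-order comparison: runs of digits compare by
 * numeric value (leading zeros ignored), everything else compares
 * byte by byte with only A-Z folded to a-z.
 * Returns <0, 0 or >0, like strcmp(). */
static int ascii_strnatcasecmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p && *q) {
        if (ascii_isdigit(*p) && ascii_isdigit(*q)) {
            size_t np = 0, nq = 0;

            while (*p == '0') p++;          /* skip leading zeros */
            while (*q == '0') q++;
            while (ascii_isdigit(p[np])) np++;
            while (ascii_isdigit(q[nq])) nq++;
            if (np != nq)                   /* more digits: bigger number */
                return np < nq ? -1 : 1;
            for (size_t i = 0; i < np; i++) {
                if (p[i] != q[i])
                    return p[i] < q[i] ? -1 : 1;
            }
            p += np;
            q += nq;
        } else {
            unsigned char cp = ascii_tolower(*p);
            unsigned char cq = ascii_tolower(*q);

            if (cp != cq)
                return cp < cq ? -1 : 1;
            p++;
            q++;
        }
    }
    if (*p) return 1;
    if (*q) return -1;
    return 0;
}
```

This orders rfc1.txt before rfc822.txt before rfc2086.txt and treats "RFC1.TXT" and "rfc1.txt" as equal, while the UTF-8 encodings of "Ä" and "ä" still compare as different, which is the behavior the "ascii" in the name is meant to promise.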