On Dec 4, 2025, at 11:22 AM, Michael Richardson <[email protected]> wrote:
> Guy, we have this lovely table in pcap.c:
>
> static struct dlt_choice dlt_choices[] = {
> DLT_CHOICE(NULL, "BSD loopback"),
> DLT_CHOICE(EN10MB, "Ethernet"),
> DLT_CHOICE(EN3MB, "experimental Ethernet (3Mb/s)"),
> DLT_CHOICE(AX25, "AX.25 layer 2"),
> DLT_CHOICE(PRONET, "Proteon ProNET Token Ring"),
> ...
>
> I feel like it ought to be indexed by LINKTYPE instead.
I think the capture code in tcpdump, which I think later became the separate
libpcap library, originally just supported the BPF capture mechanism, which
used DLT_ values to indicate link-layer types. Thus, libpcap used DLT_ values
in its APIs; there *were* no LINKTYPE_ values.
Unfortunately, when they added new link-layer types to BPF, various OSes that
picked up BPF sometimes chose values such that the same numerical value
corresponded to *different* DLT_ names and the same DLT_ name had *different*
values in different OSes.
Equally unfortunately, pcap files used DLT_ values to indicate the link-layer
type, meaning that a file captured using one of the offending DLT_ types would
not be read correctly on machines with different choices.
So I cooked up the LINKTYPE_ list.
For the original DLT_ assignments, which everybody left alone (and for which,
in several cases, ARP hardware values were used, hence separate "Ethernet" and
"IEEE 802" values, the latter of which was repurposed for 802.5), the LINKTYPE_
value was the same as the DLT_ value. (I changed the *name* for type 1 to
LINKTYPE_ETHERNET; DLT_EN10MB was named to distinguish it from DLT_EN3MB, where
the former is D/I/X Ethernet and the latter is the Xerox experimental Ethernet,
the link-layer headers and link-layer types for which are different.)
For other DLT_ assignments that weren't all over the map, I again went with
LINKTYPE_ = DLT_.
For the inconsistent DLT_ values, I assigned a separate LINKTYPE_ value, in the
"100 and above" range.
Libpcap has internal routines to map between LINKTYPE_ values and DLT_ values -
dlt_to_linktype() and linktype_to_dlt(), in pcap-common.c; they're algorithmic
rather than purely table-driven.
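To illustrate the shape of that mapping, here is a minimal sketch. It is not the code in pcap-common.c: the ex_ function names and EX_ constants are hypothetical, and the numeric values are assumptions chosen for illustration. The idea is identity by default, with the exceptions handled one case at a time.

```c
/*
 * Illustrative constants only. The EX_ names are hypothetical and the
 * numeric values are assumptions, not copied from libpcap's headers.
 */
#define EX_DLT_EN10MB            1    /* consistent everywhere */
#define EX_DLT_ATM_RFC1483       11   /* assumed platform DLT_ value */
#define EX_DLT_RAW               12   /* assumed platform DLT_ value */
#define EX_LINKTYPE_ATM_RFC1483  100  /* assumed "100 and above" value */
#define EX_LINKTYPE_RAW          101  /* assumed "100 and above" value */

/* Map a DLT_ value to the LINKTYPE_ value written to a capture file. */
int ex_dlt_to_linktype(int dlt)
{
    switch (dlt) {
    case EX_DLT_ATM_RFC1483:
        return EX_LINKTYPE_ATM_RFC1483;
    case EX_DLT_RAW:
        return EX_LINKTYPE_RAW;
    default:
        /* The common case: LINKTYPE_XXX has the same value as DLT_XXX. */
        return dlt;
    }
}

/* Map a LINKTYPE_ value read from a capture file to a DLT_ value. */
int ex_linktype_to_dlt(int linktype)
{
    switch (linktype) {
    case EX_LINKTYPE_ATM_RFC1483:
        return EX_DLT_ATM_RFC1483;
    case EX_LINKTYPE_RAW:
        return EX_DLT_RAW;
    default:
        /*
         * Identity for the common case; this also passes unknown
         * values through unchanged.
         */
        return linktype;
    }
}
```

With those assumed values, ex_dlt_to_linktype(EX_DLT_RAW) gives 101 and ex_linktype_to_dlt(101) gives 12 again, while a value with no special case, such as 1, is returned unchanged in both directions.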
Libpcap currently doesn't expose LINKTYPE_ values; they're #defined inside
pcap-common.c.
Making *existing* routines either accept LINKTYPE_ values rather than DLT_
values or return LINKTYPE_ rather than DLT_ values can break binary
compatibility for those values where LINKTYPE_XXX != DLT_XXX.
> (This came up as I poked someone about better references for the many JUNIPER
> entries)
> Was there a table that did DLT<->LINKTYPE? (I realize it's not always 1:1).
Table, no. As noted, the conversion routines are algorithmic. That's easier to
maintain, as, for the vast majority of LINKTYPE_ values, LINKTYPE_XXX =
DLT_XXX. The code handles the exceptions on a case-by-case basis.
> Maybe dlt_choice should have both values in the table.
The dlt_choices[] table is just used to map between DLT_ values and DLT_
names, and to map DLT_ values to DLT_ descriptions, for
pcap_datalink_name_to_val(), pcap_datalink_val_to_name(), and
pcap_datalink_val_to_description().
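As a rough sketch of what a table like that supports, assuming a struct with a name, a description, and a DLT_ value: everything prefixed ex_ or EX_ below is hypothetical and stands in for the real libpcap symbols, and only the five entries quoted above are included. The name lookup is ASCII case-insensitive; the value lookups are exact.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the first few DLT_ values. */
#define EX_DLT_NULL    0
#define EX_DLT_EN10MB  1
#define EX_DLT_EN3MB   2
#define EX_DLT_AX25    3
#define EX_DLT_PRONET  4

struct ex_dlt_choice {
    const char *name;         /* name without the DLT_ prefix */
    const char *description;  /* human-readable description */
    int dlt;                  /* numeric DLT_ value */
};

/* # and ## operands are not macro-expanded, so passing NULL is safe. */
#define EX_DLT_CHOICE(code, description) \
    { #code, description, EX_DLT_ ## code }

static const struct ex_dlt_choice ex_dlt_choices[] = {
    EX_DLT_CHOICE(NULL, "BSD loopback"),
    EX_DLT_CHOICE(EN10MB, "Ethernet"),
    EX_DLT_CHOICE(EN3MB, "experimental Ethernet (3Mb/s)"),
    EX_DLT_CHOICE(AX25, "AX.25 layer 2"),
    EX_DLT_CHOICE(PRONET, "Proteon ProNET Token Ring"),
    { NULL, NULL, -1 }        /* end-of-table sentinel */
};

/* Fold ASCII upper-case letters only; leave every other value alone. */
static int ex_ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Locale-independent, ASCII-only case-insensitive equality test. */
static int ex_ascii_equal(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    while (ex_ascii_lower(*ua) == ex_ascii_lower(*ub)) {
        if (*ua == '\0')
            return 1;
        ua++;
        ub++;
    }
    return 0;
}

/* Returns the DLT_ value for a name, or -1 if the name is unknown. */
int ex_name_to_val(const char *name)
{
    for (size_t i = 0; ex_dlt_choices[i].name != NULL; i++)
        if (ex_ascii_equal(ex_dlt_choices[i].name, name))
            return ex_dlt_choices[i].dlt;
    return -1;
}

/* Returns the table entry for a DLT_ value, or NULL if there is none. */
static const struct ex_dlt_choice *ex_find(int dlt)
{
    for (size_t i = 0; ex_dlt_choices[i].name != NULL; i++)
        if (ex_dlt_choices[i].dlt == dlt)
            return &ex_dlt_choices[i];
    return NULL;
}

const char *ex_val_to_name(int dlt)
{
    const struct ex_dlt_choice *c = ex_find(dlt);
    return c != NULL ? c->name : NULL;
}

const char *ex_val_to_description(int dlt)
{
    const struct ex_dlt_choice *c = ex_find(dlt);
    return c != NULL ? c->description : NULL;
}
```

Note that nothing in this sketch involves LINKTYPE_ values; the table is keyed purely on DLT_ values and names.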
Mapping between DLT_ values and LINKTYPE_ values is a separate operation, and
is done solely inside libpcap when reading or writing pcap or pcapng files:
LINKTYPE_ values in files are mapped to DLT_ values when reading
(unknown LINKTYPE_ values are passed through, in case they're really DLT_
values from before LINKTYPE_ values were used);
DLT_ values are mapped to LINKTYPE_ values when writing.
> (as an aside, I wonder if pcapint_strcasecmp() is still needed in 2025, given
> UTF-8, etc.
pcapint_strcasecmp() is used only to compare against ASCII strings. There are
cases where, for user convenience, we do case-insensitive mapping, so that both
upper-case and lower-case versions of said ASCII strings work.
It doesn't care about non-ASCII characters; it just leaves them alone (see
below).
It exists to 1) make sure the mapping is *locale-independent* (Wireshark, which
is linked with GLib from the GTK/GNOME project, uses g_ascii_strcasecmp() for
the same purpose), and 2) deal with platforms that don't have strcasecmp().
Locale-independence is necessary because, in a Turkish locale, capital-I is
mapped to lower-case ı (LATIN SMALL LETTER DOTLESS i) and lower-case-i is
mapped to upper-case İ (LATIN CAPITAL LETTER I WITH DOT ABOVE).
Not using g_ascii_strcasecmp() in Wireshark caused a *crashing bug* in
Wireshark in a Turkish locale. The code in question was parsing some text
configuration file and compared a keyword with strcasecmp(); the keyword
contained the Roman-alphabet "i", so the match failed in a Turkish locale
when case-insensitivity was required. That was compounded by a null pointer
being returned in the mismatch case and the validity of the pointer *not*
being checked.
After fixing both 1) the case-insensitive comparison and 2) the lack of a
null-pointer check, I've remembered that quirk.
https://en.wikipedia.org/wiki/Dotless_I
https://en.wikipedia.org/wiki/%C4%B0 (capital dotted-i)
https://en.wikipedia.org/wiki/Dotted_and_dotless_I_in_computing
> I guess that the upper-128 of that charmap table is Latin-1? yet it
> seems to map the upper-control codes to... I'm not sure what.
The lower 128 positions are, obviously, for ASCII. (Anybody who wants to port
libpcap to, say, z/OS, with APIs that accept EBCDIC strings, is on their own.)
The only mapping they do is to map upper case letters to the corresponding
lower-case letters.
The upper 128 don't do any mapping, so they leave non-ASCII ISO 8859-n
characters, non-ASCII UTF-8 characters (which are made up entirely of octets
with the high bit set), etc. alone.
> Did we need to map å -> a?.
No. This is only case-insensitive, not diacritic-insensitive.
> If not, wonder why charmap is 256 entries)
So that we can just use the mapped value for all 256 octet values.
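A minimal sketch of that idea follows. It is not the actual pcapint_strcasecmp(): the ex_ names are hypothetical, and for brevity the table is filled in at run time on first use rather than written out as a constant initializer. It assumes an ASCII execution character set.

```c
/*
 * 256-entry map: ASCII 'A'..'Z' map to 'a'..'z'; every other octet
 * value, including everything with the high bit set, maps to itself.
 */
static unsigned char ex_charmap[256];
static int ex_charmap_ready = 0;

static void ex_charmap_init(void)
{
    for (int i = 0; i < 256; i++)
        ex_charmap[i] = (unsigned char)i;
    for (int c = 'A'; c <= 'Z'; c++)
        ex_charmap[c] = (unsigned char)(c - 'A' + 'a');
    ex_charmap_ready = 1;
}

/*
 * Locale-independent comparison that ignores case for ASCII letters
 * only. Returns 0 if the strings match, otherwise the difference of
 * the first pair of mapped octets that differ.
 */
int ex_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;

    if (!ex_charmap_ready)
        ex_charmap_init();

    /* Any octet can index the table directly; no range check needed. */
    while (ex_charmap[*us1] == ex_charmap[*us2]) {
        if (*us1 == '\0')
            return 0;
        us1++;
        us2++;
    }
    return ex_charmap[*us1] - ex_charmap[*us2];
}
```

Because the table covers every possible octet, the comparison loop never has to test whether a value is in the ASCII range before mapping it. "I" and "i" compare equal regardless of locale, while the UTF-8 encodings of "Å" and "å" compare as different, since no high-bit octets are folded.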
_______________________________________________
tcpdump-workers mailing list -- [email protected]
To unsubscribe send an email to [email protected]