[tcpdump-workers] Re: Accurate ECN support in tcpdump/libpcap

Denis Ovsienko Fri, 06 Feb 2026 08:40:59 -0800

Hello all.

I have spent a few weeks researching this direction in more detail;
the current results are below.




On Wed, 12 Nov 2025 18:52:20 +0000
Denis Ovsienko <[email protected]> wrote:

[...]
> With this in mind, one potential solution could be a new arithmetic
> expression, something that would work similarly to the existing
> "length" and would be recognisable as TCP header flags.  Let's call it
> "tcphf" for the sake of comparison.  Then the following would be valid
> regular arithmetic expressions that evaluate to an integer in the
> range [0x000, 0x1FF] ([0b000000000, 0b111111111]):
> 
> * "tcphf" -- same as "tcp[12:2] & 0x1FF"
> * "tcphf & tcp-fin" -- same as "tcp[13] & tcp-fin"
> * "tcphf & tcp-syn" -- same as "tcp[13] & tcp-syn"
> * "tcphf & tcp-rst" -- same as "tcp[13] & tcp-rst"
> * "tcphf & tcp-push" -- same as "tcp[13] & tcp-push"
> * "tcphf & tcp-ack" -- same as "tcp[13] & tcp-ack"
> * "tcphf & tcp-urg" -- same as "tcp[13] & tcp-urg"
> * "tcphf & tcp-ece" -- same as "tcp[13] & tcp-ece"
> * "tcphf & tcp-cwr" -- same as "tcp[13] & tcp-cwr"
> * "tcphf & tcp-ae" -- same as "tcp[12] & tcp-ae"
> * "tcphf & (tcp-syn | tcp-ack) != 0" -- true iff either SYN or ACK is
>   set
> * "tcphf & (tcp-fin | tcp-rst) == 0" -- true iff neither FIN nor RST
> is set
> * "tcphf & (tcp-ece | tcp-cwr) == (tcp-ece | tcp-cwr)" -- true iff
> both ECE and CWR are set
> 
> This would not be perfect, but certainly as convenient (or not) as the
> established bitwise syntax for "tcp[tcpflags]".
> 
> To manage the forward compatibility of this, it would be necessary to
> declare that "tcphf" means the header value masked with (bitwise AND)
> the set of all named TCP flags, that is, if some hypothetical future
> "tcp-abc" does not resolve to a number in a particular version of
> libpcap, there is no point in ANDing the raw binary flag value with
> "tcphf" because that would quietly fail to match.  In other words,
> "tcphf", if used with named flags, would always either work as
> expected or fail to compile.

This is a work in progress.

I tried modelling the arithmetic "tcphf" after tcp[] and realised that
tcp[] as it is now is not a good model because it has always been
IPv4-only (for the avoidance of doubt, the current tcp[tcpflags] is a
case of the same problem).  Then I prototyped a draft of IPv6 support in
tcp[] -- it works in principle, but would need to be done right before
the arithmetic "tcphf" could reproduce the method.  This does not look
like a good fit, but before deciding on the final syntax for TCP flags
it seems worthwhile to research this direction a little bit more.
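
For readers who want to see the quoted equivalences as plain arithmetic,
here is a minimal Python sketch of what an integer-valued "tcphf" would
compute.  It is only a model, not libpcap code.  The first eight flag
values are the ones libpcap already uses for tcp[tcpflags]; TCP_AE = 0x100
is an assumption that follows from the proposed [0x000, 0x1FF] range, and
make_header() is a hypothetical helper that exists only for this example.

```python
import struct

# The first eight values match the existing named constants for
# tcp[tcpflags]; TCP_AE = 0x100 is assumed from the proposed 9-bit range.
TCP_FIN, TCP_SYN, TCP_RST, TCP_PUSH = 0x001, 0x002, 0x004, 0x008
TCP_ACK, TCP_URG, TCP_ECE, TCP_CWR = 0x010, 0x020, 0x040, 0x080
TCP_AE = 0x100


def tcphf(tcp_header: bytes) -> int:
    """Model of the proposed "tcphf": same as tcp[12:2] & 0x1FF."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    return word & 0x1FF


def make_header(data_offset_words: int, flags: int) -> bytes:
    """Hypothetical helper: a 20-byte TCP header with only bytes 12-13 set."""
    header = bytearray(20)
    struct.pack_into("!H", header, 12, (data_offset_words << 12) | flags)
    return bytes(header)


hdr = make_header(5, TCP_SYN | TCP_ACK | TCP_AE)
flags = tcphf(hdr)

# "tcphf & (tcp-syn | tcp-ack) != 0": either SYN or ACK is set
print((flags & (TCP_SYN | TCP_ACK)) != 0)
# "tcphf & (tcp-fin | tcp-rst) == 0": neither FIN nor RST is set
print((flags & (TCP_FIN | TCP_RST)) == 0)
# "tcphf & (tcp-ece | tcp-cwr) == (tcp-ece | tcp-cwr)": both ECE and CWR set
print((flags & (TCP_ECE | TCP_CWR)) == (TCP_ECE | TCP_CWR))
```

The data offset nibble in the top four bits of byte 12 is masked away, so
only the nine flag bits take part in the comparisons.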

> Since TCP header flags are often tested as a set, a slightly more
> generic potential solution would be using the less known, but
> pre-existing "value list" syntax (which means the primitive is true if
> any of the given values matches):
> 
> * "tcphf tcp-fin" -- true iff the flag is set
> * "tcphf tcp-syn" -- true iff the flag is set
> * "tcphf tcp-rst" -- true iff the flag is set
> * "tcphf tcp-push" -- true iff the flag is set
> * "tcphf tcp-ack" -- true iff the flag is set
> * "tcphf tcp-urg" -- true iff the flag is set
> * "tcphf tcp-ece" -- true iff the flag is set
> * "tcphf tcp-cwr" -- true iff the flag is set
> * "tcphf tcp-ae" -- true iff the flag is set
> * "tcphf (tcp-syn or tcp-ack)" -- true iff at least one of SYN or ACK
> is set
> * "not tcphf (tcp-fin or tcp-rst)" -- true iff neither FIN nor RST is
>   set
> * "tcphf tcp-ece and tcphf tcp-cwr" -- true iff both ECE and CWR are
> set
> 
> An advantage of this is that the syntax does not allow mixing the
> "not" with the list values, which eliminates a space for confusion.  A
> disadvantage of this could be a possibility to specify ORed flag bits
> as list values:
> 
> * "tcphf (0x0f or 0xf0)" -- ?
> 
> Would it mean a multiple-bit value is an illegal argument, or all set
> bits in a list value must match, or at least one set bit in a list
> value must match?

I am not going to prototype this syntax because it does not look good
even on paper.

Besides the above ambiguities, this would also be a case of the "not"
caveat:

tcphf not tcp-fin

would seem to mean "IPv4/IPv6 TCP packets with FIN flag cleared".
However, it would actually mean:

not tcphf tcp-fin

which is the same (assuming an IPv4+IPv6 implementation) as "(not IPv4 or
(is an IPv4 fragment and is not the first fragment) or is not TCP or
TCP flag FIN is cleared) and (not IPv6 or is not TCP or TCP flag FIN is
cleared)", which is obviously too different from the expected behaviour.
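
The difference is easy to reproduce with a toy model.  The Python sketch
below is an illustration only: the Packet class and the three functions
are hypothetical names made up for this example and do not correspond to
anything in libpcap.  It compares negating the whole primitive with
testing that the flag is cleared in a TCP packet.

```python
from dataclasses import dataclass

TCP_FIN = 0x001


@dataclass
class Packet:
    """Hypothetical, heavily simplified packet model (not a libpcap type)."""
    ip_version: int               # 4, 6, or 0 for anything that is not IP
    is_tcp: bool = False
    first_fragment: bool = True   # only meaningful for IPv4
    tcp_flags: int = 0


def tcphf_has(pkt: Packet, mask: int) -> bool:
    """Model of "tcphf FLAG": the protocol checks plus the flag test."""
    if pkt.ip_version == 4:
        return pkt.first_fragment and pkt.is_tcp and bool(pkt.tcp_flags & mask)
    if pkt.ip_version == 6:
        return pkt.is_tcp and bool(pkt.tcp_flags & mask)
    return False


def not_tcphf_has(pkt: Packet, mask: int) -> bool:
    """What "tcphf not tcp-fin" would parse as: the whole primitive negated."""
    return not tcphf_has(pkt, mask)


def tcp_with_flag_cleared(pkt: Packet, mask: int) -> bool:
    """What a reader would likely expect: a TCP packet with the flag cleared."""
    if pkt.ip_version == 4:
        is_tcp = pkt.first_fragment and pkt.is_tcp
    else:
        is_tcp = pkt.ip_version == 6 and pkt.is_tcp
    return is_tcp and not (pkt.tcp_flags & mask)


arp = Packet(ip_version=0)
tcp_without_fin = Packet(ip_version=6, is_tcp=True, tcp_flags=0)

for pkt in (arp, tcp_without_fin):
    print(not_tcphf_has(pkt, TCP_FIN), tcp_with_flag_cleared(pkt, TCP_FIN))
```

In this model the non-IP packet satisfies the negated primitive although
it has no TCP header at all, whereas both tests agree for the TCP packet.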

> A more generic potential solution could be introducing a new /type/
> qualifier, making it valid for certain values of /proto/ qualifiers
> including "tcp", but not for any explicit /dir/ qualifiers.  The
> identifier for this regular primitive would be an integer, that is, a
> bitmask:
> 
> * "tcp flags tcp-fin" -- true iff the flag is set
> * "tcp flags tcp-syn" -- true iff the flag is set
> * "tcp flags tcp-rst" -- true iff the flag is set
> * "tcp flags tcp-push" -- true iff the flag is set
> * "tcp flags tcp-ack" -- true iff the flag is set
> * "tcp flags tcp-urg" -- true iff the flag is set
> * "tcp flags tcp-ece" -- true iff the flag is set
> * "tcp flags tcp-cwr" -- true iff the flag is set
> * "tcp flags tcp-ae" -- true iff the flag is set
> * "tcp flags tcp-syn or tcp-ack" -- true iff at least one of SYN and
>   ACK is set
> * "tcp flags tcp-syn | tcp-ack" -- ?
> * "not tcp flags tcp-fin | tcp-rst" -- ?
> * "tcp flags tcp-ece and tcp-cwr" -- true iff both ECE and CWR are set
> * "tcp flags tcp-ece & tcp-cwr" -- formally true iff no flags set, but
>   in practice most likely a user error
> 
> In this case, if the bitmask comprises more than one TCP header flag,
> the meaning would depend on whether "tcp flags NUM" tests for any bit
> set ("tcp[12:2] & 0x1ff & NUM != 0") or all bits set ("tcp[12:2] &
> 0x1ff & NUM == NUM"), and this would not be immediately obvious.

I am not going to prototype this syntax because it comes with the same
problem space as the above.
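
To make the two candidate readings concrete, here is a small Python
sketch.  The function names are made up for this illustration, and the
flag values are the existing tcp-syn and tcp-ack constants.

```python
TCP_SYN, TCP_ACK = 0x002, 0x010


def any_bit_set(flags: int, num: int) -> bool:
    """Reading 1: at least one bit of NUM is set in the header flags."""
    return (flags & 0x1FF & num) != 0


def all_bits_set(flags: int, num: int) -> bool:
    """Reading 2: every bit of NUM is set in the header flags."""
    return (flags & 0x1FF & num) == num


syn_only = TCP_SYN              # header flags of an initial SYN
num = TCP_SYN | TCP_ACK         # the NUM in "tcp flags tcp-syn | tcp-ack"

print(any_bit_set(syn_only, num))
print(all_bits_set(syn_only, num))
```

The same packet and the same NUM produce different answers under the two
readings, and nothing in the expression itself tells the user which one
applies.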

> Another potential syntax of the above could be using a string for the
> identifier, which in this case would mean the flag names would be
> scoped and would not need to keep the "tcp-" prefix:
> 
> * "tcp flag fin" -- true iff the flag is set
> * "tcp flag syn" -- true iff the flag is set
> * "tcp flag rst" -- true iff the flag is set
> * "tcp flag push" -- true iff the flag is set
> * "tcp flag ack" -- true iff the flag is set
> * "tcp flag urg" -- true iff the flag is set
> * "tcp flag ece" -- true iff the flag is set
> * "tcp flag cwr" -- true iff the flag is set
> * "tcp flag ae" -- true iff the flag is set
> * "tcp flag syn or tcp flag ack" -- true iff at least one of SYN and
>   ACK is set, equivalent to "tcp flag syn or ack"
> * "not (tcp flag fin or rst)" -- true iff neither FIN nor
>   RST is set; unfortunately, in the established grammar this would be
>   equivalent to "not tcp flag fin and not tcp flag rst", but not to
>   "not tcp flag fin or rst", which is a known and documented
>   peculiarity
> * "tcp flag ece and tcp flag cwr" -- true iff both ECE and CWR are
> set, equivalent to "tcp flag ece and cwr"
> 
> Using this approach, managing the forward compatibility would be as
> simple as recognising (or not) specific strings as the flag names
> (i.e. "tcp flag abc" would be invalid syntax and there would be no
> syntax to specify a numeric value to try working around that, whether
> successfully or not).
> 
> Speaking of "tcp flag ID" or "tcp flags NUM" with regard to other
> existing protocol names and index operations, "ip" and "igrp"
> potentially could also be a part of the same solution space, but I do
> not immediately see any other protocols that could use it.

I have studied and prototyped this syntax as much as is practicable
without external input; the prototype can be seen in libpcap pull
request 1621.

The implementation is straightforward; originally it was not: the
[first since 2005] new type qualifier exposed some technical debt in
the interface between the grammar and the generator, but this has been
addressed in the master branch already.  It trivially extends to IPv4
header flags (which the implementation includes as well) and
potentially EIGRP or PGM (if required in future).  The flag names are
specific to the protocol and opaque to the grammar.

However, the syntax required more work.  Because in this primitive the
ID is a string, this syntax is exempt from the problem space of bitwise
arithmetic expressions and integers (whether explicit or named).
However, it is not exempt from the problem space of "not", for example,

tcp flag not fin

would mean "(not IPv4 or (is an IPv4 fragment...", as noted above.  Thus
to keep this syntax as surprise-free as possible, there needs to be a
way to specify that the negation applies to the flag state only rather
than the entire primitive.  It would be nice to be able to use

tcp flag !fin

except in the lexer "not" is an alias for "!", so in the grammar it
would be exactly the same as the above.  I considered a few other syntax
possibilities and found the most sense in joining the flag and its state
into a single string.  To that end, examples of the alternatives I have
considered are:

* "not-fin" and "no-fin" (easy to confuse with "not fin"),
* "fin-0" (is 0 the bit value or the bit position?),
* "fin-unset" ("unset" is a verb, not a state, and could be confused
  with an assignment), and
* "~fin" ("~" is not a valid bitwise unary operator in the current
  implementation, but it may be introduced later; there is also an
  ambiguity as to whether, besides the FIN flag being cleared, it means
  all other flags are set).

After quite a few drafts the prototype eventually became this:

       ip flag flagstate
              True if the packet is an  IPv4  packet  and  the  IPv4
              header  flag (MF or DF) is set (if flagstate is one of
              {mf-set, df-set}) or cleared (if flagstate is  one  of
              {mf-cleared,  df-cleared}).   The  correct way to test
              for a cleared flag is by using  the  -cleared  suffix;
              for example,
                   ip flag df-cleared
              correctly  does  not  match  packets that are not IPv4
              packets, but
                   ip flag not df-set
              does (correctly from the grammar perspective, but usu‐
              ally  incorrectly from the use case perspective) match
              non-IPv4 packets because it means the same as
                   not (ip and ip flag df-set)

       tcp flag flagstate
              True if the packet is an IPv4/IPv6 TCP packet and  the
              TCP  header  flag  (FIN, SYN, RST, PSH, ACK, URG, ECE,
              CWR or AE) is set (if flagstate is  one  of  {fin-set,
              syn-set,  rst-set, psh-set, ack-set, urg-set, ece-set,
              cwr-set, ae-set}) or cleared (if flagstate is  one  of
              {fin-cleared,  syn-cleared,  rst-cleared, psh-cleared,
              ack-cleared,  urg-cleared,  ece-cleared,  cwr-cleared,
              ae-cleared}).   For  IPv4  this also verifies that the
              datagram is the first fragment or is  not  fragmented.
              For  the same reasons as the above, the correct way to
              test for a cleared flag is by using the -cleared  suf‐
              fix, for example:
                   tcp flag syn-cleared and ack-cleared
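
As a rough illustration of the flagstate idea (not the code from the pull
request), the Python sketch below splits a flagstate string into a bit
mask and the wanted state, then tests a flags value against it.  The
function names and the table are assumptions made for this example, the
bit value for "ae" assumes the 9-bit layout discussed earlier, and the
protocol and fragment checks of the real primitive are left out.

```python
# Flag names as in the draft man page text; bit values as in the 9-bit
# TCP header flags field (the value for "ae" is an assumption).
TCP_FLAG_BITS = {
    "fin": 0x001, "syn": 0x002, "rst": 0x004, "psh": 0x008, "ack": 0x010,
    "urg": 0x020, "ece": 0x040, "cwr": 0x080, "ae": 0x100,
}


def parse_flagstate(flagstate: str) -> tuple[int, bool]:
    """Split e.g. "fin-cleared" into (mask, want_set); reject anything else."""
    name, sep, state = flagstate.rpartition("-")
    if not sep or name not in TCP_FLAG_BITS or state not in ("set", "cleared"):
        raise ValueError(f"invalid flagstate: {flagstate!r}")
    return TCP_FLAG_BITS[name], state == "set"


def flag_matches(tcp_flags: int, flagstate: str) -> bool:
    """Flag test only: is the named flag in the requested state?"""
    mask, want_set = parse_flagstate(flagstate)
    return bool(tcp_flags & mask) == want_set


# Flag part of "tcp flag syn-cleared and ack-cleared" for a PSH-only value
psh_only = TCP_FLAG_BITS["psh"]
print(all(flag_matches(psh_only, s) for s in ("syn-cleared", "ack-cleared")))

# An unknown name is rejected instead of quietly failing to match
try:
    parse_flagstate("abc-set")
except ValueError as exc:
    print(exc)
```

Because the flag and its state travel in one string, the sketch has no
place where a separate "not" could be attached to the flag alone.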

If anybody is interested in experimenting with this and providing
feedback, the pull request is going to stay open for a couple of weeks.
If everything goes well, before long the "tcphf" research above will be
done and the final syntax will settle before libpcap 1.11.0.

-- 
    Denis Ovsienko