Hello all.

So, I have spent a few weeks researching this direction in more detail;
you can find the current results below.

On Wed, 12 Nov 2025 18:52:20 +0000
Denis Ovsienko <[email protected]> wrote:

[...]
> With this in mind, one potential solution could be a new arithmetic
> expression, something that would work similarly to the existing
> "length" and would be recognisable as TCP header flags.  Let's call it
> "tcphf" for the sake of comparison.  Then the following would be valid
> regular arithmetic expressions that evaluate to an integer in the
> range [0x000, 0x1FF] ([0b000000000, 0b111111111]):
>
> * "tcphf" -- same as "tcp[12:2] & 0x1FF"
> * "tcphf & tcp-fin" -- same as "tcp[13] & tcp-fin"
> * "tcphf & tcp-syn" -- same as "tcp[13] & tcp-syn"
> * "tcphf & tcp-rst" -- same as "tcp[13] & tcp-rst"
> * "tcphf & tcp-push" -- same as "tcp[13] & tcp-push"
> * "tcphf & tcp-ack" -- same as "tcp[13] & tcp-ack"
> * "tcphf & tcp-urg" -- same as "tcp[13] & tcp-urg"
> * "tcphf & tcp-ece" -- same as "tcp[13] & tcp-ece"
> * "tcphf & tcp-cwr" -- same as "tcp[13] & tcp-cwr"
> * "tcphf & tcp-ae" -- same as "tcp[12] & tcp-ae"
> * "tcphf & (tcp-syn | tcp-ack) != 0" -- true iff either SYN or ACK is
>   set
> * "tcphf & (tcp-fin | tcp-rst) == 0" -- true iff neither FIN nor RST
>   is set
> * "tcphf & (tcp-ece | tcp-cwr) == (tcp-ece | tcp-cwr)" -- true iff
>   both ECE and CWR are set
>
> This would not be perfect, but certainly as convenient (or not) as the
> established bitwise syntax for "tcp[tcpflags]".
>
> To manage the forward compatibility of this, it would be necessary to
> declare that "tcphf" means a bitmask that is the bitwise OR of all
> named TCP flags, that is, if some hypothetical future "tcp-abc" does
> not resolve to a number in a particular version of libpcap, there is
> no point in ANDing the raw binary flag value with "tcphf" because that
> would quietly fail to match.  In other words, "tcphf", if used with
> named flags, would always either work as expected or fail to compile.

This is a work in progress.
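
For readers who prefer code to prose, here is a minimal Python sketch of
what the proposed arithmetic "tcphf" would evaluate to.  It is an
illustration only, not libpcap code: the function names are made up, and
the value 0x100 for the AE flag within the 9-bit result is an assumption
that follows from "tcp[12:2] & 0x1FF".

```python
# Flag masks within the 9-bit value.  The first eight are the values of
# the existing tcp-fin .. tcp-cwr constants; 0x100 for AE is an
# assumption made for this sketch.
TCP_FIN, TCP_SYN, TCP_RST, TCP_PUSH = 0x001, 0x002, 0x004, 0x008
TCP_ACK, TCP_URG, TCP_ECE, TCP_CWR = 0x010, 0x020, 0x040, 0x080
TCP_AE = 0x100


def tcphf(tcp_header: bytes) -> int:
    """Return the 9 flag bits, i.e. the value of tcp[12:2] & 0x1FF."""
    if len(tcp_header) < 14:
        raise ValueError("need at least 14 bytes of TCP header")
    return int.from_bytes(tcp_header[12:14], "big") & 0x1FF


def sample_header(flags: int) -> bytes:
    """Build a 20-byte TCP header with data offset 5 and the given flags."""
    header = bytearray(20)
    header[12] = (5 << 4) | ((flags >> 8) & 0x01)  # data offset + AE bit
    header[13] = flags & 0xFF                      # CWR .. FIN
    return bytes(header)


syn_ack = tcphf(sample_header(TCP_SYN | TCP_ACK))

# The three multi-flag tests from the quoted list.
either_syn_or_ack = (syn_ack & (TCP_SYN | TCP_ACK)) != 0
neither_fin_nor_rst = (syn_ack & (TCP_FIN | TCP_RST)) == 0
both_ece_and_cwr = (syn_ack & (TCP_ECE | TCP_CWR)) == (TCP_ECE | TCP_CWR)
```

The data offset nibble in byte 12 is masked away by 0x1FF, so only the
nine flag bits survive.
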
I tried modelling the arithmetic "tcphf" after tcp[] and realised that
tcp[] as it is now is not a good model because it has always been
IPv4-only (for the avoidance of doubt, the current tcp[tcpflags] is a
case of the same problem).  Then I prototyped a draft of IPv6 support in
tcp[] -- it works in principle, but would need to be done right before
the arithmetic "tcphf" could reproduce the method.  This does not look
like a good fit, but before deciding on the final syntax for TCP flags
it seems worthwhile to research this direction a little bit more.

> Since TCP header flags are often tested as a set, a slightly more
> generic potential solution would be using the less known, but
> pre-existing "value list" syntax (which means the primitive is true if
> any of the given values matches):
>
> * "tcphf tcp-fin" -- true iff the flag is set
> * "tcphf tcp-syn" -- true iff the flag is set
> * "tcphf tcp-rst" -- true iff the flag is set
> * "tcphf tcp-push" -- true iff the flag is set
> * "tcphf tcp-ack" -- true iff the flag is set
> * "tcphf tcp-urg" -- true iff the flag is set
> * "tcphf tcp-ece" -- true iff the flag is set
> * "tcphf tcp-cwr" -- true iff the flag is set
> * "tcphf tcp-ae" -- true iff the flag is set
> * "tcphf (tcp-syn or tcp-ack)" -- true iff at least one of SYN or ACK
>   is set
> * "not tcphf (tcp-fin or tcp-rst)" -- true iff neither FIN nor RST is
>   set
> * "tcphf tcp-ece and tcphf tcp-cwr" -- true iff both ECE and CWR are
>   set
>
> An advantage of this is that the syntax does not allow mixing the
> "not" with the list values, which eliminates a space for confusion.  A
> disadvantage of this could be a possibility to specify ORed flag bits
> as list values:
>
> * "tcphf (0x0f or 0xf0)" -- ?
>
> Would it mean a multiple-bit value is an illegal argument, or all set
> bits in a list value must match, or at least one set bit in a list
> value must match?

I am not going to prototype this syntax because it does not look good
even on paper.
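
To show why the multiple-bit case is ambiguous, here is a small Python
sketch of the three possible readings of a list value.  All function
names are hypothetical and exist only for this illustration; none of
this is libpcap behaviour.

```python
def strict_single_bit(flags: int, value: int) -> bool:
    """Reading 1: a multiple-bit list value is an illegal argument."""
    if value == 0 or (value & (value - 1)) != 0:
        raise ValueError(f"0x{value:02x} is not a single flag bit")
    return (flags & value) != 0


def all_bits(flags: int, value: int) -> bool:
    """Reading 2: all set bits in the list value must match."""
    return (flags & value) == value


def any_bit(flags: int, value: int) -> bool:
    """Reading 3: at least one set bit in the list value must match."""
    return (flags & value) != 0


def value_list(flags: int, values, reading) -> bool:
    """A value list is true if any of the given values matches."""
    return any(reading(flags, v) for v in values)


syn_ack = 0x12  # SYN and ACK set

# "tcphf (0x0f or 0xf0)" under readings 2 and 3:
under_all = value_list(syn_ack, (0x0F, 0xF0), all_bits)
under_any = value_list(syn_ack, (0x0F, 0xF0), any_bit)

# Under reading 1 the same expression would not compile at all.
try:
    value_list(syn_ack, (0x0F, 0xF0), strict_single_bit)
    under_strict = "accepted"
except ValueError:
    under_strict = "rejected"
```

The same SYN+ACK packet is a non-match, a match, or a compile error
depending on the reading, which is the problem.
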
Besides the above ambiguities, this would also be a case of the "not"
caveat:

  tcphf not tcp-fin

would seem to mean "IPv4/IPv6 TCP packets with FIN flag cleared".
However, it would actually mean:

  not tcphf tcp-fin

which is the same (assuming IPv4+IPv6 implementation) as "(not IPv4 or
(is an IPv4 fragment and is not the first fragment) or is not TCP or TCP
flag FIN is cleared) and (not IPv6 or is not TCP or TCP flag FIN is
cleared)", which is obviously too different from the expected behaviour.

> A more generic potential solution could be introducing a new /type/
> qualifier, making it valid for certain values of /proto/ qualifiers
> including "tcp", but not for any explicit /dir/ qualifiers.  The
> identifier for this regular primitive would be an integer, that is, a
> bitmask:
>
> * "tcp flags tcp-fin" -- true iff the flag is set
> * "tcp flags tcp-syn" -- true iff the flag is set
> * "tcp flags tcp-rst" -- true iff the flag is set
> * "tcp flags tcp-push" -- true iff the flag is set
> * "tcp flags tcp-ack" -- true iff the flag is set
> * "tcp flags tcp-urg" -- true iff the flag is set
> * "tcp flags tcp-ece" -- true iff the flag is set
> * "tcp flags tcp-cwr" -- true iff the flag is set
> * "tcp flags tcp-ae" -- true iff the flag is set
> * "tcp flags tcp-syn or tcp-ack" -- true iff at least one of SYN and
>   ACK is set
> * "tcp flags tcp-syn | tcp-ack" -- ?
> * "not tcp flags tcp-fin | tcp-rst" -- ?
> * "tcp flags tcp-ece and tcp-cwr" -- true iff both ECE and CWR are set
> * "tcp flags tcp-ece & tcp-cwr" -- formally true iff no flags set, but
>   in practice most likely a user error
>
> In this case, if the bitmask comprises more than one TCP header flag,
> the meaning would depend on (and would not be immediately obvious)
> whether "tcp flags NUM" tests for any bit set ("tcp[12:2] & 0x1ff &
> NUM != 0") or all bits set ("tcp[12:2] & 0x1ff & NUM == NUM").

I am not going to prototype this syntax because it comes with the same
problem space as the above.
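
To make the "not" caveat concrete, here is a Python sketch that models a
packet as a few booleans and compares the negated primitive with what a
user would most likely expect.  The Packet class and the function names
are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

TCP_FIN = 0x001
TCP_ACK = 0x010


@dataclass
class Packet:
    ip_version: int               # 4, 6, or 0 for a non-IP packet
    is_tcp: bool = False
    later_fragment: bool = False  # IPv4 fragment other than the first
    flags: int = 0                # the 9 TCP header flag bits


def has_tcp_header(pkt: Packet) -> bool:
    """True if TCP header flags can be read from this packet."""
    v4 = pkt.ip_version == 4 and not pkt.later_fragment and pkt.is_tcp
    v6 = pkt.ip_version == 6 and pkt.is_tcp
    return v4 or v6


def tcphf_fin(pkt: Packet) -> bool:
    """Model of the primitive "tcphf tcp-fin"."""
    return has_tcp_header(pkt) and (pkt.flags & TCP_FIN) != 0


def not_tcphf_fin(pkt: Packet) -> bool:
    """What "tcphf not tcp-fin" would actually mean."""
    return not tcphf_fin(pkt)


def expected_fin_cleared(pkt: Packet) -> bool:
    """What the user most likely expects it to mean."""
    return has_tcp_header(pkt) and (pkt.flags & TCP_FIN) == 0


arp = Packet(ip_version=0)
udp4 = Packet(ip_version=4)
tcp4_fin = Packet(ip_version=4, is_tcp=True, flags=TCP_FIN | TCP_ACK)
tcp6_ack = Packet(ip_version=6, is_tcp=True, flags=TCP_ACK)

# Packets for which the two interpretations disagree.
surprises = [p for p in (arp, udp4, tcp4_fin, tcp6_ack)
             if not_tcphf_fin(p) != expected_fin_cleared(p)]
```

In this model the two readings agree on every TCP packet and disagree on
everything that is not TCP, which is exactly the surprise.
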

> Another potential syntax of the above could be using a string for the
> identifier, which in this case would mean the flag names would be
> scoped and would not need to keep the "tcp-" prefix:
>
> * "tcp flag fin" -- true iff the flag is set
> * "tcp flag syn" -- true iff the flag is set
> * "tcp flag rst" -- true iff the flag is set
> * "tcp flag push" -- true iff the flag is set
> * "tcp flag ack" -- true iff the flag is set
> * "tcp flag urg" -- true iff the flag is set
> * "tcp flag ece" -- true iff the flag is set
> * "tcp flag cwr" -- true iff the flag is set
> * "tcp flag ae" -- true iff the flag is set
> * "tcp flag syn or tcp flag ack" -- true iff at least one of SYN and
>   ACK is set, equivalent to "tcp flag syn or ack"
> * "not (tcp flag fin or rst)" -- true iff neither FIN nor
>   RST is set; unfortunately, in the established grammar this would be
>   equivalent to "not tcp flag fin and not tcp flag rst", but not to
>   "not tcp flag fin or rst", which is a known and documented
>   peculiarity
> * "tcp flag ece and tcp flag cwr" -- true iff both ECE and CWR are
>   set, equivalent to "tcp flag ece and cwr"
>
> Using this approach, managing the forward compatibility would be as
> simple as recognising (or not) specific strings as the flag names
> (i.e. "tcp flag abc" would be invalid syntax and there would be no
> syntax to specify a numeric value to try working around that, whether
> successfully or not).
>
> Speaking of "tcp flag ID" or "tcp flags NUM" with regard to other
> existing protocol names and index operations, "ip" and "igrp"
> potentially could also be a part of the same solution space, but I do
> not immediately see any other protocols that could use it.

I have studied and prototyped this syntax as much as is practicable
without external input; the prototype can be seen in libpcap pull
request 1621.
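
The forward compatibility argument quoted above can be sketched in
Python as follows.  This is not the code from the pull request: the
function names, the name table and the future flag "abc" at bit 0x200
are all hypothetical.

```python
KNOWN_FLAGS_MASK = 0x1FF  # all flags this (imaginary) version knows

TCP_FLAG_NAMES = {
    "fin": 0x001, "syn": 0x002, "rst": 0x004, "push": 0x008,
    "ack": 0x010, "urg": 0x020, "ece": 0x040, "cwr": 0x080,
    "ae": 0x100,  # assumed position within the 9-bit value
}


def numeric_test(raw_flags: int, mask: int) -> bool:
    """Model of a numeric workaround such as "tcphf & 0x200 != 0"."""
    return ((raw_flags & KNOWN_FLAGS_MASK) & mask) != 0


def named_test(raw_flags: int, name: str) -> bool:
    """Model of "tcp flag NAME": unknown names do not compile."""
    if name not in TCP_FLAG_NAMES:
        raise SyntaxError(f'unknown TCP flag name "{name}"')
    return (raw_flags & TCP_FLAG_NAMES[name]) != 0


# A packet that has the hypothetical future flag (0x200), SYN and ACK.
raw = 0x212

# The numeric form compiles, but can never match the unknown bit.
silently_false = numeric_test(raw, 0x200)

# The named form refuses to compile, which is easier to notice.
try:
    named_test(raw, "abc")
    named_outcome = "compiled"
except SyntaxError:
    named_outcome = "failed to compile"
```
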
The implementation is straightforward; originally it was not: the
[first since 2005] new type qualifier exposed some technical debt in the
interface between the grammar and the generator, but this has been
addressed in the master branch already.  It trivially extends to IPv4
header flags (which the implementation includes as well) and potentially
EIGRP or PGM (if required in future).  The flag names are specific to
the protocol and opaque to the grammar.

However, the syntax required more work.  Because in this primitive the
ID is a string, this syntax is exempt from the problem space of bitwise
arithmetic expressions and integers (whether explicit or named).
However, it is not exempt from the problem space of "not", for example,

  tcp flag not fin

would mean "(not IPv4 or (is an IPv4 fragment...", as noted above.  Thus
to keep this syntax as surprise-free as possible, there needs to be a
way to specify that the negation applies to the flag state only rather
than the entire primitive.  It would be nice to be able to use

  tcp flag !fin

except in the lexer "not" is an alias for "!", so in the grammar it
would be exactly the same as the above.

I considered a few other syntax possibilities and found the most sense
in joining the flag and its state into a single string.  To that end,
examples of the alternatives I have considered are:

* "not-fin" and "no-fin" (easy to confuse with "not fin"),
* "fin-0" (is 0 the bit value or the bit position?),
* "fin-unset" ("unset" is a verb, not a state, and could be confused
  with an assignment), and
* "~fin" ("~" is not a valid bitwise unary operator in the current
  implementation, but it may be introduced later; also there is an
  ambiguity as to whether besides FIN flag cleared it means all other
  flags set).
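
The idea of joining the flag and its state into a single string can be
sketched in Python as below, using the "-set" and "-cleared" suffixes
that the prototype eventually settled on.  This is not the code from the
pull request: the function names are hypothetical, the packet is reduced
to an "is TCP" boolean plus the flag bits, and 0x100 for AE is an
assumption.

```python
TCP_FLAG_MASKS = {
    "fin": 0x001, "syn": 0x002, "rst": 0x004, "psh": 0x008,
    "ack": 0x010, "urg": 0x020, "ece": 0x040, "cwr": 0x080,
    "ae": 0x100,  # assumed position within the 9-bit value
}


def parse_flagstate(flagstate: str):
    """Split e.g. "fin-cleared" into (mask, want_set) or raise."""
    name, sep, state = flagstate.rpartition("-")
    if not sep or name not in TCP_FLAG_MASKS:
        raise ValueError(f'invalid flagstate "{flagstate}"')
    if state not in ("set", "cleared"):
        raise ValueError(f'invalid flagstate "{flagstate}"')
    return TCP_FLAG_MASKS[name], state == "set"


def tcp_flag(is_tcp: bool, flags: int, flagstate: str) -> bool:
    """Model of "tcp flag FLAGSTATE" for one packet."""
    mask, want_set = parse_flagstate(flagstate)
    return is_tcp and ((flags & mask) != 0) == want_set


ACK_ONLY = 0x010

# A TCP packet with only ACK set matches "tcp flag fin-cleared".
tcp_match = tcp_flag(True, ACK_ONLY, "fin-cleared")

# A non-TCP packet does not match "tcp flag fin-cleared" ...
non_tcp_cleared = tcp_flag(False, 0, "fin-cleared")

# ... but it does match "not tcp flag fin-set".
non_tcp_negated = not tcp_flag(False, 0, "fin-set")
```

With this split, the negation of the flag state lives inside the string
and never reaches the protocol checks, so "-cleared" stays confined to
TCP packets.
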

After quite a few drafts the prototype eventually became this:

       ip flag flagstate
              True if the packet is an IPv4 packet and the IPv4 header
              flag (MF or DF) is set (if flagstate is one of {mf-set,
              df-set}) or cleared (if flagstate is one of {mf-cleared,
              df-cleared}).  The correct way to test for a cleared flag
              is by using the -cleared suffix; for example,
                   ip flag df-cleared
              correctly does not match packets that are not IPv4
              packets, but
                   ip flag not df-set
              does (correctly from the grammar perspective, but usually
              incorrectly from the use case perspective) match non-IPv4
              packets because it means the same as
                   not (ip and ip flag df-set)

       tcp flag flagstate
              True if the packet is an IPv4/IPv6 TCP packet and the TCP
              header flag (FIN, SYN, RST, PSH, ACK, URG, ECE, CWR or AE)
              is set (if flagstate is one of {fin-set, syn-set, rst-set,
              psh-set, ack-set, urg-set, ece-set, cwr-set, ae-set}) or
              cleared (if flagstate is one of {fin-cleared, syn-cleared,
              rst-cleared, psh-cleared, ack-cleared, urg-cleared,
              ece-cleared, cwr-cleared, ae-cleared}).  For IPv4 this
              also verifies that the datagram is the first fragment or
              is not fragmented.  For the same reasons as the above, the
              correct way to test for a cleared flag is by using the
              -cleared suffix, for example:
                   tcp flag syn-cleared and ack-cleared

If anybody is interested in experimenting with this and providing
feedback, the pull request is going to stay open for a couple of weeks.
If everything goes well, before long the "tcphf" research above will be
done and the final syntax will settle before libpcap 1.11.0.

-- 
Denis Ovsienko
