On Sun, 2024-10-20 at 10:27 -0700, Guy Harris wrote:
> On Oct 20, 2024, at 2:57 AM, Garri Djavadyan <g.djavad...@gmail.com> wrote:
> 
> > > > I have to use a very big buffer with a very slow storage, much
> > > > slower than the rate of incoming packets received by the
> > > > filter, and it is preferred not to lose a single packet after
> > > > initiating termination of the process.
> > > 
> > > What do you mean by "with a very slow storage"? You can set the
> > > size with -B, but that just tells the capture mechanism in the
> > > kernel how big a buffer to allocate. It's not as if it tells it
> > > to be stored in some slower form of memory.
> > 
> > Let me show an example. To demonstrate the issue, I am generating
> > a 2MB/s stream of dummy packets:
> > 
> > [src]# pv -L 2M /dev/zero | dd bs=1472 > /dev/udp/192.168.0.1/12345
> > 
> > and dumping them to storage with a cgroup-v2-restricted write
> > speed of 1MB/s:
> > 
> > [dst]# lsblk /dev/loop0
> > NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> > loop0   7:0    0 3.9G  0 loop /mnt/test
> > 
> > [dst]# cat /sys/fs/cgroup/test/io.max
> > 7:0 rbps=max wbps=1024000 riops=max wiops=max
> > 
> > To temporarily avoid kernel-level drops,
> 
> Emphasis on *temporarily* - 2MB/s worth of packet data can only be
> saved in its entirety if you have 2MB/s or greater write speed.

That is right. However, it also depends on how long one needs to
mediate mismatched rates using a large input buffer. For example, with
a 2GB input buffer and a 1MB/s rate difference, one could safely keep
filling the buffer for more than half an hour (2GB at 1MB/s is roughly
2000 seconds, or about 33 minutes). Safe buffer draining would help a
lot in such situations.

> > it is clearly seen that the input buffer is being filled at a
> > 1MB/s rate (the difference between the generated traffic rate
> > (2MB/s) and the writing speed of the storage (1MB/s)):
> > 
> > tcpdump: 0 packets captured, 0 packets received by filter,
> > 0 packets dropped by kernel
> > tcpdump: 218 packets captured, 715 packets received by filter,
> > 0 packets dropped by kernel
> 
> On all platforms, "packets captured" means "packets read from
> libpcap and written to the capture file".
> 
> On Linux, "packets received by filter" means "packets that passed
> the filter" (rather than "packets that were run through the filter,
> whether or not they passed the filter", which is what it means on
> *BSD/macOS/Solaris 11/AIX; unfortunately, you can't get the latter
> value from Linux and can't get the former value from BSD, so that
> value *can't* be made to mean the same thing on all platforms). It
> includes packets that passed the filter but could not be added to
> the buffer because the buffer was full.
> 
> On Linux, "packets dropped by kernel" means "packets that passed the
> filter but could not be added to the buffer because the buffer was
> full".
> 
> (The pcap_stats man page has an entire paragraph devoted to pointing
> out that the meaning of the statistics differs between platforms.)
> 
> I.e., when tcpdump exits, the difference, on Linux, between "packets
> received by filter" and "packets captured" is, indeed, "packets
> dropped because tcpdump exited without draining the packet buffer".
> (On *BSD/macOS/Solaris 11/AIX, the latter value cannot be
> determined, as per the above.)
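Just to connect that back to the API: if I read pcap_stats(3PCAP)
correctly, those two tcpdump counters come from the ps_recv and
ps_drop members of struct pcap_stat. A minimal sketch of reading them
(the function name and error handling are mine, not tcpdump's code):

    #include <stdio.h>
    #include <pcap.h>

    /* Print, for an open capture handle, the counters tcpdump
     * reports on Linux: ps_recv = packets that passed the filter
     * (even if later dropped for lack of buffer space), ps_drop =
     * packets that passed the filter but could not be added to the
     * buffer because it was full. */
    static int print_capture_stats(pcap_t *handle)
    {
        struct pcap_stat st;

        if (pcap_stats(handle, &st) == -1) {
            fprintf(stderr, "pcap_stats: %s\n", pcap_geterr(handle));
            return -1;
        }
        printf("%u packets received by filter, %u dropped by kernel\n",
               st.ps_recv, st.ps_drop);
        return 0;
    }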
> > > > There are a few options to overcome the problem. For example,
> > > > by dumping packets to the memory storage first (e.g. /dev/shm)
> > > 
> > > Presumably meaning you specified "-w /dev/shm" or something such
> > > as that?
> > > 
> > > If so, how does that make a difference?
> > 
> > I mean I can first dump packets to the lightning-fast RAM storage
> > and, after being done with the capturing part, copy the dump to
> > the slow storage.
> 
> I.e., it means that, when you signal tcpdump to exit, it's not as
> far behind the capture mechanism with regards to writing to the
> capture file, because it's stalling less waiting for write() calls
> to finish (if the write rate limitation you mention limits the rate
> at which write() calls can push data to the file descriptor), so the
> "packets captured" count is larger.

Exactly.

> > I see. Thank you so much for the explanation.
> > 
> > Do you think this case can justify feature requests both for
> > libpcap and tcpdump on github?
> 
> Yes, as it means that tcpdump (and, potentially, other programs such
> as Wireshark) can write out *all* packets received before being told
> to stop capturing.
> 
> The implementations for various platforms would probably have to 1)
> set a "drop all packets" filter on the capture device, 2) possibly
> put the capture device in non-blocking mode (as there's no point in
> blocking, as no more packets will be seen), and 3) cause the packet
> processing loop in libpcap to quit as soon as it finds that there
> are no more packets available to read.
> 
> For programs using pcap_loop(), that should be transparent; for
> programs using pcap_dispatch(), they would have to treat a return
> value of 0, if they've put the capture device in "draining mode", as
> meaning "done" rather than "the packet buffer timeout expired and no
> packets were provided, keep capturing".
> 
> tcpdump uses pcap_loop(), so it'd only have to be changed to use the
> new "stop capturing" API.
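To make sure I understand the proposed mechanics, here is a rough
sketch of how an application might approximate that draining today
with existing libpcap calls. The function name is mine, and this only
illustrates steps 1)-3) above, not the future API; the
platform-specific buffering details would be the real
implementation's job:

    #include <pcap.h>

    /* Sketch of "draining mode":
     * 1) install a reject-all filter (a single BPF "return 0"
     *    instruction accepts no packets), so nothing new is queued;
     * 2) switch to non-blocking mode, as nothing new will arrive;
     * 3) dispatch until the already-buffered packets run out. */
    static struct bpf_insn reject_insn = BPF_STMT(BPF_RET | BPF_K, 0);
    static struct bpf_program reject_all = { 1, &reject_insn };

    static int drain_capture(pcap_t *h, pcap_handler cb, u_char *user)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        int n;

        if (pcap_setfilter(h, &reject_all) == -1)
            return -1;
        if (pcap_setnonblock(h, 1, errbuf) == -1)
            return -1;

        /* In non-blocking mode, pcap_dispatch() returns 0 when no
         * packets are available; with the reject-all filter in
         * place, that now means "buffer drained" rather than "keep
         * waiting". */
        while ((n = pcap_dispatch(h, -1, cb, user)) > 0)
            continue;
        return n;   /* 0 = drained, negative = error */
    }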
Thank you for sharing your thoughts on this. It is good to know that
it is feasible to implement. I will open a feature request for
libpcap for now.

Guy, thank you so much for all your comments. It is much appreciated.

Regards,
Garri