Guy Harris asked:
How does pcap_setbufsize() differ from pcap_setbuff()?
The WinPcap pcap_setbuff function is defined as:
int pcap_setbuff(pcap_t *p, int dim);
I declared it as follows:
extern int pcap_setbufsize(pcap_t *p, int bufsize, char *errbuf);
extern int pcap_getbufsize(pcap_t *p, char *errbuf);
where errbuf can return a warning message even if the call doesn't
fail (e.g. on partial allocation of the ring it reports the maximum size
to the user); the getbufsize function allows programmatic determination
of the actual size allocated (in case only a partial allocation was possible).
Paolo Abeni writes:
Having the frame size be a power of two solves the above issue and
simplifies walking the ring, because there is no need to handle the end
of each ring block specially; otherwise we would need to keep the block
size in the pcap handle (which would require adding a field to the
handle structure) and check for the end of the block after processing
each frame (to skip the gap at the block's end).
Okay, I guess I see why you're doing this - you are trying not to add
any additional fields to the struct pcap (this is why you're re-using
the otherwise unused bp/cc members to track the ring). However, the
struct pcap is an internal structure and is allocated and managed within
libpcap, so there really isn't any reason not to add the additional
tracking structures. The version I have uses a separately allocated
struct iovec array to track the ring slots, so that once this array is
set up, the read code just iterates through the array of pointers; this
is partly an artifact of support for the PACKET_TRECV ringbuffer support
in patched versions of the 2.2 kernel that was replaced by the simpler
and cleaner PACKET_RX_RING ioctl. Whether you use struct iovec or
something better, though, changing the size of the internal struct pcap
will not affect binary compatibility, and is probably a better approach
than forcing the ring frame size to a power of two; remember that
setting the ring buffer frame size not only allocates memory, but will
cause the kernel to actually copy the data (for packets that are larger
than the snapshot) so you're not only wasting memory but also CPU cycles.
Currently, if the poll() call is
interrupted by a signal, the call is invoked again, as is done on
other platforms. Taking the interface down will cause the read call to
return with an error, and I suppose this is the standard behavior.
I must have missed the breakloop check in your first version; your
current version does have it, but there is a minor problem - the
breakloop field is not reset in the after-poll check (and I wonder if it
might not be better to check before calling poll rather than after?) -
the ring-read check does reset the field.
With the MAX_BLOCK_NR block limit, on a 64-bit platform the ring will
by default hold 16K jumbo frames (or 32K standard Ethernet frames),
which in my experience is more than enough to handle at least a Gb
Ethernet link.
I don't really see the need for a hard-coded upper limit, though. With
kernels before 2.6.4, there was a kmalloc restriction that limited the
number of ring slots to 128K/sizeof(pointer) (the total ring size was
not limited in the same way), but this is not the case in recent
kernels, so there's no reason to impose an arbitrary upper limit. As
for it being "enough to handle at least a Gb ethernet" it really depends
on the application's processing speed and the maximum burst rate; I
don't think you can generalize from any particular application.
Also, your "binary search" just reduces the size by halves; this will
not discover the actual limit. The implementation I have also tries
increasing the size after a successful allocation (if the successful
size was less than the original request), with a loop like this (where
maxct is initialized to the original request size and minct to 1; this
loop is only used if the original allocation fails, and there is a
slightly different, more complex one for the PACKET_RX_RING
implementation - I'm providing this just to illustrate the basic idea):
do {
        ct = (maxct + minct) / 2;
        if (setsockopt(p->fd, SOL_PACKET, PACKET_TRECV,
            (void *)ring, ct * sizeof(struct iovec)))
                maxct = ct;     /* allocation failed: limit is below ct */
        else
                minct = ct;     /* allocation succeeded: limit is >= ct */
} while (maxct - minct > 1);
On the other hand, it seems that the ring buffer isn't flushed by the
kernel when a pcap filter is attached, so the first issue must be
handled. An alternative, very simple solution would be to manually flush
the ring buffer after setting the filter. It will cause the loss of some
frames, but the same happens right now with standard, non-memory-mapped
access.
Guy's answer indicates that this is the current implementation for the
socket read() version (and other kernel versions), but just as Andy
Howell noted, I also have applications that adjust the filter
dynamically, so I would prefer not to flush the ring after setting the
filter (although I guess this would be acceptable the very first time
the filter is set for a given pcap handle/fd).
@alex
--
mailto:[EMAIL PROTECTED]
-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.