Guy Harris asked:
How does pcap_setbufsize() differ from pcap_setbuff()?
The WinPcap pcap_setbuff function is defined as:
int pcap_setbuff(pcap_t *p, int dim);
I declared it as follows:
extern int pcap_setbufsize(pcap_t *p, int bufsize, char *errbuf);
extern int pcap_getbufsize(pcap_t *p, char *errbuf);
where errbuf can return a warning message even if the call doesn't
fail (e.g. on partial allocation of the ring it reports the maximum size
to the user); the getbufsize function allows programmatic determination
of the actual size allocated (in case only a partial allocation was possible).
Paolo Abeni writes:
Having the frame size be a power of two solves the above issue and
simplifies walking the ring, because there is no need to handle the end
of each ring block specially; otherwise we would need to keep the block
size in the pcap handle (which would require adding a field to the
handle structure) and check for the end of the block after processing
each frame (to skip the gap at the block's end).
Okay, I guess I see why you're doing this - you are trying not to add
any additional fields to the struct pcap (this is why you're re-using
the otherwise unused bp/cc members to track the ring). However, the
struct pcap is an internal structure and is allocated and managed within
libpcap, so there really isn't any reason not to add the additional
tracking structures. The version I have uses a separately allocated
struct iovec array to track the ring slots, so that once this array is
set up, the read code just iterates through the array of pointers; this
is partly an artifact of support for the PACKET_TRECV ringbuffer support
in patched versions of the 2.2 kernel that was replaced by the simpler
and cleaner PACKET_RX_RING ioctl. Whether you use struct iovec or
something better, though, changing the size of the internal struct pcap
will not affect binary compatibility, and is probably a better approach
than forcing the ring frame size to a power of two; remember that
setting the ring buffer frame size not only allocates memory, but will
cause the kernel to actually copy the data (for packets that are larger
than the snapshot) so you're not only wasting memory but also CPU cycles.
Currently, if the poll() call is
interrupted by a signal, the call is invoked again, as is done on
other platforms. Taking the interface down will cause the read call to
return with an error, and I suppose this is the standard behavior.
I must have missed the breakloop check in your first version; your
current version does have it, but there is a minor problem - the
breakloop field is not reset in the after-poll check (and I wonder if it
might not be better to check before calling poll rather than after?) -
the ring-read check does reset the field.
With the MAX_BLOCK_NR block limit, on a 64-bit platform the ring will
by default hold 16K jumbo frames (or 32K standard Ethernet frames),
which in my experience is more than enough to handle at least a Gb
Ethernet link.
I don't really see the need for a hard-coded upper limit, though. With
kernels before 2.6.4, there was a kmalloc restriction that limited the
number of ring slots to 128K/sizeof(pointer) (the total ring size was
not limited in the same way), but this is not the case in recent
kernels, so there's no reason to impose an arbitrary upper limit. As
for it being "enough to handle at least a Gb ethernet" it really depends
on the application's processing speed and the maximum burst rate; I
don't think you can generalize from any particular application.
Also, your "binary search" just reduces the size by halves; this will
not discover the actual limit. The implementation I have also tries
increasing the size after a successful allocation (if the successful
size was less than the original request), with a loop like this (where
maxct is initialized to the original request size and minct to 1; this
loop is only used if the original allocation fails, and there is a
slightly different, more complex one for the PACKET_RX_RING
implementation - I'm providing this just to illustrate the basic idea):
do {
        ct = (maxct + minct) / 2;
        if (setsockopt(p->fd, SOL_PACKET, PACKET_TRECV,
            (void *)ring, ct * sizeof(struct iovec)))
                maxct = ct;     /* allocation failed: limit is below ct */
        else
                minct = ct;     /* allocation succeeded: limit is >= ct */
} while (maxct - minct > 1);
On the other hand, it seems that the ring buffer isn't flushed by the
kernel when a pcap filter is attached, so the first issue must be
handled. An alternative, very simple solution would be to manually flush
the ring buffer after setting the filter. It will cause the loss of some
frames, but the same happens right now with standard, non-memory-mapped
access.
Guy's answer indicates that this is the current implementation for the
socket read() version (and other kernel versions), but just as Andy
Howell noted, I also have applications that adjust the filter
dynamically, so I would prefer not to flush the ring after setting the
filter (although I guess this would be acceptable the very first time
the filter is set for a given pcap handle/fd).
@alex
--
mailto:[EMAIL PROTECTED]
-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.