Re: [tcpdump-workers] freebsd threading bug

Guy Harris Mon, 14 May 2001 22:06:22 -0700
(FYI, "[EMAIL PROTECTED]" is now forwarded to
"[EMAIL PROTECTED]" - libpcap and tcpdump development is now
being done at tcpdump.org; check out the Web site at

        http://www.tcpdump.org/

.)

> If libpcap used in threaded environment on FreeBSD, read() from bpf returns
> with -1 and does not block ever.

In theory, the FreeBSD pthread library should work; there may, however,
be bugs in BPF that prevent it from working.  As long as the BPF device

        1) handles "poll()" and "select()" correctly (which, as
           indicated below, I don't think it does);

        2) handles non-blocking mode correctly (i.e., if there is *NO*
           data to be returned, makes a "read()" return -1 and set
           "errno" to EWOULDBLOCK/EAGAIN and, if there is any data to be
           returned, returns it);

the pthread library should, I think, work correctly with it.
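
Condition 2) can be checked from user space.  The following is a minimal
sketch in C, not libpcap code: the names set_nonblocking(), try_read()
and enum read_status are made up for illustration.  It classifies the
result of a non-blocking "read()" so that the "no data" case (-1 with
EWOULDBLOCK/EAGAIN) is kept apart from a zero-length return.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum read_status { READ_DATA, READ_WOULD_BLOCK, READ_ZERO, READ_ERROR };

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Do one read() and say what happened.  *nread receives the raw return
 * value of read().  EINTR is not retried here; it shows up as READ_ERROR.
 */
static enum read_status try_read(int fd, void *buf, size_t len, ssize_t *nread)
{
    *nread = read(fd, buf, len);
    if (*nread > 0)
        return READ_DATA;          /* some data was returned */
    if (*nread == 0)
        return READ_ZERO;          /* zero-length read */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;   /* correct "no data right now" answer */
    return READ_ERROR;             /* any other failure; errno says why */
}
```

On a descriptor that satisfies 2), an idle "read()" should come back as
READ_WOULD_BLOCK; seeing READ_ZERO on an idle, non-blocking descriptor
would suggest it does not.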

What version of FreeBSD are you using?  Change 1.72 to "sys/net/bpf.c"
fixes a bug that caused problems with BPF devices in threaded programs -
the checkin comment was:

        Fix bug: a read() on a bpf device which was in non-blocking mode
        and had no data available returned 0.  Now it returns -1 with errno
        set to EWOULDBLOCK (== EAGAIN) as it should.  This fix makes the bpf
        device usable in threaded programs.

and that does, in fact, mean that FreeBSD releases with versions of
bpf.c prior to 1.72 violated 2) above.  I'm not sure that would cause
the symptoms you mention, however.

That change was MFCed in revision 1.59.2.5, which has a CVS tag of
RELENG_4_3, so FreeBSD prior to 4.3 had that bug.

> There is the possibility to poll()/select()
> the file descriptor (pcap-int.h) to avoid this,

Note that if you do a "poll()" or "select()" on a BPF file descriptor in
FreeBSD, the "poll()" or "select()" will not, as I remember from when I
tried it in Ethereal and from looking at the code, report that a read is
possible on the file descriptor unless the hold buffer is non-empty -
and that only happens if the store buffer fills up, so the "poll()" or
"select()" will not indicate that you can read from the BPF descriptor
until enough packet data has arrived to fill the store buffer.

That bug might also cause problems with threaded programs - it might
cause threaded programs to act as if an infinite timeout were specified.

I suspect that the OpenBSD BPF code might fix the latter problem; see
change 1.13, with the comment

        fix bpf select(); from [EMAIL PROTECTED]

"[EMAIL PROTECTED]" is, as I remember, Michael Stolarchuk, and that checkin
did more than just "fix bpf select()", I think - I suspect it included
many (most?  all?) of the changes NFR made to the BPF code, as described
in their paper in the LISA 97 proceedings (said paper is, alas, no
longer available at

        http://www.nfr.com/forum/publications/LISA-97.htm

and I didn't find any obvious place on the NFR site where it could be
found; however, it can be found at

        http://www.usenix.org/publications/library/proceedings/lisa97/01.ranum.html

although I think you have to be a USENIX member to read it online).  The
paper says

        The packet suckers we initially implemented have been based on
        the libpcap [11] packet capture interface.  Libpcap provides a
        generalized packet capture facility atop a number of operating
        system-specific network capture interfaces.  This freed us from
        having to deal with a lot of portability issues.  We did
        discover, however, that some of the available packet capture
        facilities cannot reliably buffer high volumes of bursty
        traffic.  Berkeley packet filter-based packet suckers running on
        a Pentium-200 were unable to handle even moderate network loads. 
        This was a result of a latency interaction between BPF and our
        software: we do more processing than a program like tcpdump,
        and, though our average processing seems to be within the
        performance envelope of the machine, we can't always process the
        packet ``immediately,'' as BPF expects.  To fix the problem, we
        increased internal buffer sizes from their default of 32K to
        256K, a number more appropriate for the amount of RAM available
        in modern computers.  Since the NFR daemon potentially monitors
        multiple interfaces, we performed minor modifications to the way
        blocking and time-outs are performed in BPF.  The original BPF
        time-out is an inter-packet time-out based in the arrival of a
        packet.  If you don't see a packet, you never time out.  We
        modified it to begin the timer with the read() or select()
        timeout, so we can detect periods of no traffic.

although, in all versions of BPF I've seen, the timer does, in fact,
begin with the "read()", so that it times out even if no packets have
arrived; however, it *doesn't* begin with a "select()" or "poll()",
which could cause problems if you're monitoring multiple interfaces. 
MTS's changes do, in fact, appear to make it work with "select()".
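
For the multiple-interface case, a program would typically hand all of
its descriptors to one "select()" with a single timeout.  A rough sketch
of that, assuming plain POSIX "select()"; wait_any_readable() is a
hypothetical helper, not a libpcap routine, and the descriptors are
assumed to be ones obtained from "pcap_fileno()".  Whether the timeout
behaves as intended on a BPF descriptor depends on the kernel issues
described above.

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Wait until any descriptor in fds[] is readable or timeout_ms expires.
 * Returns the index of the first readable descriptor, -1 on timeout,
 * or -2 on error (including a descriptor that does not fit in an fd_set).
 */
static int wait_any_readable(const int *fds, size_t nfds, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = -1;
    int n;
    size_t i;

    FD_ZERO(&rfds);
    for (i = 0; i < nfds; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -2;
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -2;                 /* select() failed; errno says why */
    if (n == 0)
        return -1;                 /* nothing became readable in time */

    for (i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rfds))
            return (int)i;
    return -2;                     /* should not happen */
}
```

The timeout here is the "select()" timeout, not the BPF read timeout, so
a period of no traffic on every interface is reported as -1 only if the
kernel lets "select()" return in that case.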

It might not be a bad idea for FreeBSD and NetBSD to pick up the changes
in question and adapt them to their versions of BPF - I suspect NetBSD's
BPF has the same problem, and picking up those changes might make the
changes in revision 1.60 of NetBSD's bpf.c do something useful (sorry
about including a change that didn't actually do anything useful in PR
11836, Jason :-)).

> but this is not portable way
> because of different techniques used on different systems.

Well, I suspect that on most, if not all, non-BSD flavors of UNIX, a
"select()" or "poll()" should work on the file descriptor you get from
"pcap_fileno()".

It should work on any system with DLPI (Solaris, HP-UX, AIX if you
configure libpcap to use DLPI rather than the non-standard unsupported
undocumented BPF in AIX), as the "select()"/"poll()" should be handled
by the stream head (the timer in the "bufmod" STREAMS module in Solaris
starts when the first packet arrives, not when a "read()"/"getmsg()" is
done, so it makes no difference whether you block in "read()"/"getmsg()"
or in "select()").

It should also work on SunOS 4.x, as raw packet capture there is done
using a STREAMS device.

It should work on Linux and IRIX, as the file descriptor in question
refers to a socket in both OSes, and the "select()"/"poll()" should be
handled by the socket code (neither of those OSes supports timeouts on
those sockets - they return data as soon as it arrives, rather than
waiting for "enough" data to arrive or a timeout to expire).

It also appears to work on Digital UNIX, which uses neither STREAMS nor
a socket for that purpose; the only remaining
non-STREAMS/non-socket/non-BPF platform is SunOS prior to 4.x
("pcap-enet.c" doesn't actually include a "pcap_open_live()", so there's
no actual libpcap support in it for whatever OSes it handles - AIX on
the RT PC, and some other unknown platform, I infer, given the "#ifdef
IBMRTPC" in it).

> What are the
> standart way to avoid such a bug in threaded environment?

I guess you could use a "poll()" or "select()" wrapper, assuming that
works in your threaded environment (note that, at least in NetBSD and
FreeBSD, it will, I think, render the timeout ineffective, so you might
block for a long time before seeing any packets, as you'd have to wait
for the store buffer to fill up).
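
Such a wrapper might look roughly like the following.  This is a sketch
under assumptions, not something libpcap provides: read_with_timeout()
is a made-up name, the descriptor is assumed to come from
"pcap_fileno()", and reporting a timeout as -1 with "errno" set to
ETIMEDOUT is just a convention chosen for the example.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns what read() returns, or -1 with errno set to ETIMEDOUT if
 * nothing became readable in time, or -1 with errno from poll() on
 * failure.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts poll(); note this restarts the timeout. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);

    if (ready == -1)
        return -1;
    if (ready == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    /* POLLIN, POLLHUP or POLLERR was reported; let read() sort it out. */
    return read(fd, buf, len);
}
```

With the BPF behavior described above, the "poll()" here would not
return early just because a few packets have arrived, so timeout_ms
effectively becomes the longest a caller waits, not the typical wait.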

> What is the correct method? Maybe, library has to be modified to implement
> poll() around the read() call, or this is the bug in FreeBSD threading
> implementation?

I think it's a bug in the FreeBSD *BPF* implementation, as per the above.

At some point, in my (ha ha) copious free time, I should look at merging
the OpenBSD change in question into the FreeBSD BPF code and, if it
works, send it back to the FreeBSD folk; I don't have NetBSD at home,
but I can at least look at merging it into NetBSD as well.

However, that shouldn't stop some *other* ambitious person from merging
the OpenBSD change into {Net,Free}BSD themselves - it may happen more
quickly that way.  :-)
-
This is the TCPDUMP workers list. It is archived at
http://www.tcpdump.org/lists/workers/index.html
To unsubscribe use mailto:[EMAIL PROTECTED]?body=unsubscribe