Hi Guy,

On Sat, Oct 24, 2020 at 01:23:57PM -0700, Guy Harris wrote:
> > sidenote: I've just been doing some development work with io_uring
> > (using liburing) on modern Linux systems, and it's amazing in terms of
> > performance of asynchronous I/O.  Might be worth investigating.
> 
> Might be, although we'd need to either:
> 
> 1) figure out a way to do that, while hiding the platform-dependent details, 
> on *all* (currently living) platforms supported by libpcap:
> 
>       macOS;
>       the *BSDs;
>       Solaris;
>       HP-UX;
>       AIX;
>       Windows;
>       Linux;
> 
> which don't all have the same asynchronous I/O mechanisms (POSIX aio on most 
> if not all of the UN*Xes, "overlapped I/O" or whatever Microsoft calls it on 
> Windows)

I hear you.  But fundamentally, if your abstraction API is based on buffers
in memory that

* get allocated/provided by the platform-specific code
* get handed to the platform-specific code for write

then you should be able to cover all of those (famous last words).  I think
the problem only starts when the higher-layer code tries to handle the
select/poll, or even only the write() calls, by itself.
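
To make that split concrete, here is a minimal sketch in C.  All names in it
(wr_backend, buf_get, buf_write, sync_ctx, emit_record) are hypothetical and
exist nowhere in libpcap.  The malloc()/write() backend is only a stand-in
for a real platform-specific one such as io_uring or POSIX aio.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical backend interface; not an existing libpcap API. */
struct wr_backend {
    /* Backend allocates/provides a buffer of at least len bytes. */
    void *(*buf_get)(void *ctx, size_t len);
    /* Backend takes ownership of buf again and writes len bytes of it. */
    ssize_t (*buf_write)(void *ctx, void *buf, size_t len);
    void *ctx;
};

/* Stand-in backend: plain malloc() and blocking write(2). */
struct sync_ctx {
    int fd;
};

static void *sync_buf_get(void *ctx, size_t len)
{
    (void)ctx;
    return malloc(len);
}

static ssize_t sync_buf_write(void *ctx, void *buf, size_t len)
{
    struct sync_ctx *sc = ctx;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(sc->fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            free(buf);
            return -1;
        }
        done += (size_t)n;
    }
    free(buf);                  /* buffer goes back to the backend */
    return (ssize_t)done;
}

/* Higher-layer code: it never calls write(), select() or poll() itself. */
static ssize_t emit_record(const struct wr_backend *be,
                           const void *data, size_t len)
{
    assert(be != NULL && be->buf_get != NULL && be->buf_write != NULL);

    void *buf = be->buf_get(be->ctx, len);
    if (buf == NULL)
        return -1;
    memcpy(buf, data, len);
    return be->buf_write(be->ctx, buf, len);
}
```

An asynchronous backend would presumably return from buf_write as soon as
the request is queued and release the buffer on completion; the higher layer
would not need to change for that.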

Now that I think more about your use case, you probably cannot even have the
platform-specific code handle the allocations for the buffers, as you use
mmap()ed AF_PACKET on the "read" side.  So if the platform-specific AIO
mechanism cannot handle "foreign" memory that it didn't allocate, you will
have to copy.

With io_uring, you can hand in whatever buffers you like, allocated in
whichever way.  There's a small performance benefit if you pre-register the
buffers, so that the mapping in/out of kernel space doesn't have to be done
on every I/O operation.  You _should_ be able to register the entire mmap'ed
memory from the AF_PACKET socket once on startup, though.

> 2) arrange that the user may, but need not, provide their own low-level 
> writing code that the new writing APIs call, so they can either use a 
> platform-independent mechanism supplied by libpcap or write their own code.  
> I think Michael Richardson has been thinking of something such as that.

That is more or less what I'm suggesting above.  I'm
happy to hack up an io_uring / liburing backend and contribute it, once
an interface for plugging that in materializes in libpcap.  As I'm not a
regular follower of the related mailing lists, please send me a ping
once you get to that point.

unrelated note: io_uring really does marvels, also for workloads with
many sockets.  It's easy to send and/or receive something like 500k pps
from thousands of sockets on my several-year-old laptop (Lenovo x26).
For a traditional userspace program using UDP socket based I/O that's
quite amazing.  Of course, that's not at all related to libpcap with its
mmap()ed socket.

> What *might* be possible to do, in the absence of new libpcap capture APIs, 
> would be to have dumpcap, when capturing from the "any" device on Linux and 
> writing to a pcapng file:
> 
>       when the capture starts, write out Interface Description Blocks (IDBs) 
> for all the currently-known interfaces on the system, and make a table 
> mapping from the kernel's interface indices (ifIndexes, in SNMP terms) to 
> interface IDs for those IDBs;
> 
>       when a packet arrives, look up its interface index of the packet, and:
> 
>               if it's found, write the packet out with that interface index;
> 
>               if it's *not* found, write out an IDB for the new interface, 
> add it to the table, and write the packet out with that interface index.

Irrespective of current/future libpcap, this reflects the kind of logic that I
understood would be required for writing proper pcap-ng with IDBs on an
"any" interface capture, yes.  Good to hear it might be possible even
with the current code.
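
The lookup-or-add step could be sketched roughly like this in C.  The names
(if_map, if_map_get) are hypothetical, not dumpcap or libpcap code.  The
sketch relies on pcapng numbering interfaces implicitly, by the order of the
IDBs within a section, starting at 0.  A linear scan is assumed to be good
enough for the number of interfaces on a typical system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table mapping kernel ifindex -> pcapng interface ID. */
struct if_map {
    int *ifindex;    /* ifindex[i] is the kernel index of interface ID i */
    uint32_t count;  /* number of IDBs written so far */
    uint32_t cap;    /* allocated slots in ifindex[] */
};

/*
 * Return the pcapng interface ID for a kernel ifindex.
 * *is_new is set to 1 if the interface was not known yet, in which case
 * the caller has to write an IDB before writing the packet.
 * Returns -1 if the table could not be grown.
 */
static int64_t if_map_get(struct if_map *m, int ifindex, int *is_new)
{
    assert(m != NULL && is_new != NULL);

    for (uint32_t i = 0; i < m->count; i++) {
        if (m->ifindex[i] == ifindex) {
            *is_new = 0;
            return i;
        }
    }

    if (m->count == m->cap) {
        uint32_t ncap = m->cap ? m->cap * 2 : 8;
        int *n = realloc(m->ifindex, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        m->ifindex = n;
        m->cap = ncap;
    }

    /* The new IDB will be the count-th one in the section. */
    m->ifindex[m->count] = ifindex;
    *is_new = 1;
    return m->count++;
}
```

At capture start the same function would be called once per currently known
interface, so that the initial IDBs and the table stay in the same order.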

Regards,
        Harald

-- 
- Harald Welte <lafo...@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
