Re: [PATCH] Fix integer truncation in systat -ifstat
On Thu, 11 Sep 2014, Ryan Stone wrote:

> systat -ifstat currently truncates byte counters down to 32-bit
> integers.  The following fixes the issue, but I'm not very happy with
> it.  u_long is what the rest of our code uses for network counters,
> but that ends up meaning that our counters are 32 bits wide on 32-bit
> platforms.  I could make it uint64_t, but that's not very future-proof.
> Right now I'm leaning towards punting on the issue and using u_long,
> as there is an awful lot of code that would have to be modified for
> extended byte counters to actually work on all platforms.

Only differences in the counters are used, except in 1 place that is
broken in other ways, so overflow is only a large problem starting at
about 40 Gbps.  At only 10 Gbps, 32-bit counters are enough with a
refresh interval of 1 second, but not quite enough with the default
interval of 5 seconds (this default is not documented in the man page;
it seems to be documented only, with a grammar error -- a comma splice,
in the status message for mode switches).  5 seconds at nearly
1.125 GBps exceeds UINT32_MAX.  Packet counter overflow isn't a problem
until about 600 Gbps with the default interval, and 32-bit systems
would have other problems supporting 600 Gbps interfaces.

> [rstone@rstone-laptop systat]svn diff
> Index: ifstat.c
> ===================================================================
> --- ifstat.c	(revision 271439)
> +++ ifstat.c	(working copy)
> @@ -269,8 +269,8 @@
>  	struct if_stat	*ifp = NULL;
>  	struct timeval	tv, new_tv, old_tv;
>  	double		elapsed = 0.0;
> -	u_int		new_inb, new_outb, old_inb, old_outb = 0;
> -	u_int		new_inp, new_outp, old_inp, old_outp = 0;
> +	u_long		new_inb, new_outb, old_inb, old_outb = 0;
> +	u_long		new_inp, new_outp, old_inp, old_outp = 0;
>  	SLIST_FOREACH(ifp, curlist, link) {

u_long was technically and practically correct in 1990, when long was
the largest integer type and the kernel counters had type u_long.
Except that u_long was too large then (it should actually be long, thus
2*32 bits on 32-bit machines, making it too large and slow to use for
almost anything, including these counters).  Now the counters have type
uint64_t in the kernel, but apparently not many applications kept up
with this change (I think netstat did).

Differences between these counters are assigned to struct member
variables like if_in_curtraffic.  These haven't kept up with the
change either, but they were correct in 1990, since they have type
u_long.

The place that is broken in other ways:

% 	/* Display interface if it's received some traffic. */
% 	if (new_inb > 0 && old_inb == 0) {
% 		ifp->display = 1;
% 		needsort = 1;
% 	}

The bugs here are very minor:
- "it's" expands to "it is", so it is a grammar and/or semantics error
- in the long term, old_inb always overflows if the interface is used
  at all.  Sometimes it overflows to precisely 0.  This breaks the
  logic for detecting the first activity on the interface.  But the
  result of misdetecting non-first activity as first seems to be
  harmless -- just sort and redisplay.

Bruce
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: netmap wishlist
On Fri, Sep 12, 2014 at 7:59 AM, Eggert, Lars <l...@netapp.com> wrote:

> Hi Luigi,
>
> I've started to play with netmap, like it a lot, and would like it to
> grow support for some additional features that I'd need.  I wonder if
> you could comment on how likely support for any of the following is
> in netmap in the foreseeable future?
>
> * IP/TCP/UDP checksum offload
> * TCP/UDP segmentation offload
> * TCP/UDP large receive offload
> * jumbograms (I saw the email earlier today, so maybe that's addressed)

Hi Lars,

there is something already available or in progress for some of the
above, but here are my thoughts on the various subjects:

- netmap is designed to work with large frames, by setting the buffer
  size to something suitable (using a sysctl).  There might be some
  lurking bugs (e.g. some NICs need to be told about the maximum frame
  size or they will refuse to send/receive large frames even though
  the slot in the NIC ring specifies a large buffer), but this is
  trivial to fix on a case-by-case basis.  The downside is some waste
  on buffers (they are fixed size, so having to allocate say 16K for a
  64-byte frame is a bit annoying).

- checksum offloading can be added trivially in the *_txsync(), once
  again on a per-NIC basis.  The problem is, if we start adding
  per-packet features (say, checksums, scatter-gather I/O,
  segmentation) in the inner loop of *_txsync(), we are going to lose
  some performance for high-rate applications.  Right now we are
  running at about 20 ns/pkt (because we assume a flat data format);
  having a few extra conditionals in the inner loop could easily eat
  another 5..20 ns/pkt, and this makes me a bit uncomfortable,
  especially because the situations where these offloads matter are
  typically with large packets, where we are not CPU bound.

- the VALE switch has support for segmentation and checksum avoidance.
  Clients can register as virtio-net capable: in this case the port
  will accept/deliver large segments across that port, and do
  segmentation and checksumming as required for ports that are not
  virtio-net enabled (e.g. physical NICs attached to the same VALE
  switch).  This was developed earlier this year by Vincenzo Maffione.
  At the moment this only works on top of VALE ports, not NICs, and
  the reason is that there is a big win if a VM can deliver a large
  segment in one shot to another local VM.  It is much less useful if
  you are talking across a physical device, in which case the OS
  should be able to do a reasonable job of segmenting packets (see
  also the next item).  We could probably leverage this code to work
  also on top of NICs connected through netmap, e.g. programming the
  NIC to use its own native offloading, but I am skeptical about the
  usefulness and concerned about the potential performance loss in
  *_txsync().

- Stefano Garzarella has some code to do software GSO (this is for
  FreeBSD; Linux already has something similar), which will be
  presented at EuroBSDCon later this month in Sofia.  This should
  address the segmentation issue on the host stack.

- on the receive side, both FreeBSD and Linux have an efficient
  software LRO fallback in case the NIC does not support it natively,
  so I think we do not need this at the NIC/switch level.

cheers
luigi
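For reference, the buffer-size knob mentioned in the first item is a sysctl on the netmap module. A sketch of how it might be set (the exact sysctl name should be verified against your netmap version with `sysctl dev.netmap`):

```shell
# Enlarge netmap buffers to hold ~9K jumbo frames (the default is
# 2048 bytes).  Must be set before the netmap memory pool is
# allocated, i.e. before opening any netmap port.
sysctl dev.netmap.buf_size=9216
```

As Luigi notes, this trades memory for generality: every slot in every ring now carries a 9216-byte buffer, even for 64-byte frames.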
Re: netmap wishlist
Hi,

On 2014-9-12, at 9:31, Luigi Rizzo <ri...@iet.unipi.it> wrote:

> there is something already available/in progress for some of the
> above, but here are my thoughts on the various subjects:
>
> - netmap is designed to work with large frames, by setting the buffer
>   size to something suitable (using a sysctl).  ...  The downside is
>   some waste on buffers (they are fixed size, so having to allocate
>   say 16K for a 64-byte frame is a bit annoying).

That's OK for what I'm doing.

> - checksum offloading can be added trivially in the *_txsync(), once
>   again on a per-NIC basis.  The problem is, if we start adding
>   per-packet features (say, checksums, scatter-gather I/O,
>   segmentation) in the inner loop of *_txsync(), we are going to lose
>   some performance for high-rate applications.

What about making these things compile-time options?  I totally see
that if you want to use netmap for fast switching, you wouldn't want
these.  But if you use netmap for operating on IP and transport
protocol packets, they become really essential.  (Esp. at 40G -- which
reminds me that I forgot to add netmap support for the ixl driver to
the wishlist...)

> - the VALE switch has support for segmentation and checksum
>   avoidance.  Clients can register as virtio-net capable: in this
>   case the port will accept/deliver large segments across that port,
>   and do segmentation and checksumming as required for ports that are
>   not virtio-net enabled (e.g. physical NICs attached to the same
>   VALE switch).  This was developed earlier this year by Vincenzo
>   Maffione.

I may look into this.  I'm unclear whether adding a VALE layer into
the system just to get this feature would be worth it in terms of
performance.

> We could probably leverage this code to work also on top of NICs
> connected through netmap, e.g. programming the NIC to use its own
> native offloading, but I am skeptical about the usefulness and
> concerned about the potential performance loss in *_txsync().

I totally see that, but maybe a compile-time option would work.  There
are several distinct use cases for netmap at the moment, and it's
unlikely that the same system would need to support several of them,
so compile-time specialization may be sufficient here.

> - Stefano Garzarella has some code to do software GSO (this is for
>   FreeBSD; Linux already has something similar), which will be
>   presented at EuroBSDCon later this month in Sofia.  This should
>   address the segmentation issue on the host stack.

Nice, I will take a look.

> - on the receive side, both FreeBSD and Linux have an efficient
>   software LRO fallback in case the NIC does not support it natively,
>   so I think we do not need this at the NIC/switch level.

OK, I need to look into this.

Oh, and my list was prioritized -- I think the checksum offload would
be the real winner when munging IP and transport packets.

Thanks,
Lars
Re: [PATCH] Fix integer truncation in systat -ifstat
On Fri, Sep 12, 2014 at 8:41 AM, Bruce Evans <b...@optusnet.com.au> wrote:

> Only differences in the counters are used, except in 1 place that is
> broken in other ways, so overflow is only a large problem starting at
> about 40 Gbps.  At only 10 Gbps, 32-bit counters are enough with a
> refresh interval of 1 second, but not quite enough with the default
> interval of 5 seconds (this default is not documented in the man
> page; it seems to be documented only, with a grammar error -- a comma
> splice, in the status message for mode switches).  5 seconds at
> nearly 1.125 GBps exceeds UINT32_MAX.  Packet counter overflow isn't
> a problem until about 600 Gbps with the default interval.  32-bit
> systems would have other problems supporting 600 Gbps interfaces.

I can confirm the behavior reported in this PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182448
help creating a send-receive test suite using netmap
Hi,

I am trying to compare the performance of sending packets using
netmap, sockets, and packet mmap.  Right now I am working on top of
pkt-gen and some other implementations for sockets and packet mmap.
I'm interested in the relation between packet size and the number of
packets I can send per second.

I was following the code to check all the steps, and I found that it
goes through a memcpy to place the packet in the netmap buffer (as
defined by nm_pkt_copy inside netmap_user.h).  I am afraid that the
data I'm reading is mostly driven by the time the memcpy takes.  Is
there any way to measure the send performance while avoiding this
bottleneck?

regards
--
David Díaz Barquero
Ingeniería en Computadores
Tecnológico de Costa Rica