Re: [PATCH] Fix integer truncation in systat -ifstat

2014-09-12 Thread Bruce Evans

On Thu, 11 Sep 2014, Ryan Stone wrote:


> systat -ifstat currently truncates byte counters down to 32-bit
> integers.  The following fixes the issue, but I'm not very happy with
> it.  u_long is what the rest of our code uses for network counters,
> but that ends up meaning that our counters are 32 bits wide on 32-bit
> platforms.  I could make it uint64_t but that's not very future proof.
> Right now I'm leaning towards punting on the issue and using u_long as
> there is an awful lot of code that would have to be modified for
> extended byte counters to actually work on all platforms.


Only differences in the counters are used except in 1 place that is
broken in other ways, so overflow is only a large problem starting at
about 40 Gbps.  At only 10 Gbps, 32-bit counters are enough with a
refresh interval of 1 second but not quite enough with the default
interval of 5 seconds (this default is not documented in the man
page.  It seems to only be documented (with a grammar error -- comma
splice) in the status message for mode switches).  5 seconds at
nearly 1.125 GBps exceeds UINT32_MAX.  Packet counter overflow isn't
a problem until about 600 Gbps with the default interval.  32-bit
systems would have other problems supporting 600 GBps interfaces.
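
The arithmetic above can be checked with a small sketch in C.  The
function names (wrap_seconds_32, interval_is_safe) are made up for
illustration and are not part of systat, and framing overhead is
ignored, so treat the numbers as rough estimates only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until a 32-bit byte counter wraps at a given line rate.
 * rate_bps is in bits per second; framing overhead is ignored.
 */
static double
wrap_seconds_32(double rate_bps)
{
    assert(rate_bps > 0.0);
    return (((double)UINT32_MAX + 1.0) / (rate_bps / 8.0));
}

/*
 * Nonzero if the counter cannot wrap fully within one refresh
 * interval, so a single modular difference is still unambiguous.
 */
static int
interval_is_safe(double rate_bps, double interval_s)
{
    return (interval_s < wrap_seconds_32(rate_bps));
}
```

Under these assumptions a 32-bit byte counter at 10 Gbps wraps after
about 3.4 seconds, so a 1 second refresh is safe and the 5 second
default is not.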


> [rstone@rstone-laptop systat]svn diff
> Index: ifstat.c
> ===================================================================
> --- ifstat.c (revision 271439)
> +++ ifstat.c (working copy)
> @@ -269,8 +269,8 @@
>    struct  if_stat *ifp = NULL;
>    struct  timeval tv, new_tv, old_tv;
>    double  elapsed = 0.0;
> -   u_int   new_inb, new_outb, old_inb, old_outb = 0;
> -   u_int   new_inp, new_outp, old_inp, old_outp = 0;
> +   u_long  new_inb, new_outb, old_inb, old_outb = 0;
> +   u_long  new_inp, new_outp, old_inp, old_outp = 0;
>
>    SLIST_FOREACH(ifp, curlist, link) {
>    /*


u_long was technically and practically correct in 1990 when long was
the largest integer type and the kernel counters had type u_long.
Except u_long was too large then (it should actually be long, thus
2*32 bits on 32-bit machines, making it too large and slow to use for
almost anything including these counters then).  Now the counters have
type uint64_t in the kernel, but apparently not many applications kept
up with this change (I think netstat did).

Differences between these counters are assigned to struct member
variables like if_in_curtraffic.  These haven't kept up with the change
either, but they were correct in 1990 since they have type u_long.
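
Why differences survive a single wrap can be shown with a minimal
sketch.  It assumes 32-bit unsigned counters; counter_delta32 and
bytes_per_second are illustrative names, not systat functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference of two samples of a 32-bit counter.  Unsigned
 * subtraction is modulo 2^32, so the result is correct as long as
 * the counter wrapped at most once between the two samples.
 */
static uint32_t
counter_delta32(uint32_t new_val, uint32_t old_val)
{
    return ((uint32_t)(new_val - old_val));
}

/* Average rate over the sampling interval, in bytes per second. */
static double
bytes_per_second(uint32_t new_val, uint32_t old_val, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return ((double)counter_delta32(new_val, old_val) / elapsed_s);
}
```

For example, a counter that goes from 0xFFFFFFF0 to 5 has advanced by
21 bytes, and the subtraction yields 21 even though the raw value
went down.  Two or more wraps in one interval cannot be detected this
way.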

The place that is broken in other ways:

%   /* Display interface if it's received some traffic. */
%   if (new_inb > 0 && old_inb == 0) {
%       ifp->display = 1;
%       needsort = 1;
%   }

The bugs here are very minor:
- "it's" expands to "it is", so it is a grammar and/or semantics error
- in the long term, old_inb always overflows if the interface is used at
  all.  Sometimes it overflows to precisely 0.  This breaks the logic
  for detecting the first activity on the interface.  But the result of
  misdetecting non-first activity as first seems to be harmless -- just
  sort and redisplay.
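
One way to avoid relying on old_inb == 0 is to record first activity
explicitly.  The following is only a sketch under that assumption:
struct if_track and note_input are hypothetical and do not match
systat's struct if_stat or its code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state; not systat's struct if_stat. */
struct if_track {
    uint64_t old_inb;       /* previous input byte counter */
    bool     seen_traffic;  /* set once, never cleared by a wrap */
    bool     display;
};

/*
 * Record a new counter sample.  Returns true only the first time the
 * interface shows input traffic, i.e. when a re-sort is needed.
 */
static bool
note_input(struct if_track *t, uint64_t new_inb)
{
    bool first = false;

    assert(t != NULL);
    if (!t->seen_traffic && new_inb > 0) {
        t->seen_traffic = true;
        t->display = true;
        first = true;
    }
    t->old_inb = new_inb;
    return (first);
}
```

With the flag, a counter that later wraps to exactly 0 no longer
looks like an interface that has never received traffic.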

Bruce
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: netmap wishlist

2014-09-12 Thread Luigi Rizzo
On Fri, Sep 12, 2014 at 7:59 AM, Eggert, Lars l...@netapp.com wrote:

> Hi Luigi,
>
> I've started to play with netmap, like it a lot, and would like it to grow
> support for some additional features that I'd need. I wonder if you could
> comment on how likely support for any of the following is in netmap in the
> foreseeable future?
>
> * IP/TCP/UDP checksum offload
> * TCP/UDP segmentation offload
> * TCP/UDP large receive offload
> * jumbograms (I saw the email earlier today, so maybe that's addressed)


Hi Lars:

there is something already available/in progress for some of the above,
but here are my thoughts on the various subjects:

- netmap is designed to work with large frames, by setting the buffer
  size to something suitable (using a sysctl).
  There might be some lurking bugs (e.g. some NICs need to be told
  about the maximum frame size or they will refuse to send/receive them
  even though the slot in the NIC ring specifies a large buffer),
  but this is trivial to fix on a case by case basis.
  The downside is some waste on buffers (they are fixed size so having
  to allocate say 16K for a 64 byte frame is a bit annoying).

- checksum offloading can be added trivially in the *_txsync(),
  once again on a per-nic basis.
  Problem is, if we start adding per-packet features (say, checksums,
  scatter-gather I/O, segmentation) in the inner loop of *_txsync()
  we are going to lose some performance for high rate applications.
  Now we are running at about 20ns/pkt (because we assume a flat
  data format), having a few extra conditionals in the inner loop
  could easily eat another 5..20ns/pkt, and this makes me a bit
  uncomfortable, especially because the situations where these offloadings
  matter are typically with large packets, where we are not CPU bound.

- the VALE switch has support for segmentation and checksum avoidance.
  Clients can register as virtio-net capable: in this case the port will
  accept/deliver large segments across that port, and do segmentation and
  checksum as required for ports that are not virtio-net enabled
  (e.g. physical NICs attached to the same VALE switch).
  This was developed earlier this year by Vincenzo Maffione.

  At the moment this only works on top of VALE ports, not NICs,
  and the reason is that there is a big win if the VM can deliver
  a large segment in one shot to another local VM. Much less useful
  if you are talking across a physical device, in which case the OS
  should be able to do a reasonable job in segmenting packets
  (see also next item).

  We could probably leverage this code to work also on top of NICs
  connected through netmap, e.g. programming the NIC to use its own
  native offloading, but i am skeptical about the usefulness and
  concerned about the potential performance loss in *_txsync().

- Stefano Garzarella has some code to do software GSO (this is for FreeBSD,
  linux already has something similar), which will be presented at
  EuroBSDCon later this month in Sofia. This should address the
  segmentation issue on the host stack.

- on the receive side, both FreeBSD and Linux have an efficient
  LRO software fallback in case the NIC does not support it
  natively, i think we do not need this at the NIC/switch level.

cheers
luigi

Re: netmap wishlist

2014-09-12 Thread Eggert, Lars
Hi,

On 2014-9-12, at 9:31, Luigi Rizzo ri...@iet.unipi.it wrote:
> there is something already available/in progress for some of the above,
> but here are my thoughts on the various subjects:
>
> - netmap is designed to work with large frames, by setting the buffer
>   size to something suitable (using a sysctl).
...
>   The downside is some waste on buffers (they are fixed size so having
>   to allocate say 16K for a 64 byte frame is a bit annoying).

that's OK for what I'm doing.

> - checksum offloading can be added trivially in the *_txsync(),
>   once again on a per-nic basis.
>   Problem is, if we start adding per-packet features (say, checksums,
>   scatter-gather I/O, segmentation) in the inner loop of *_txsync()
>   we are going to lose some performance for high rate applications.

What about making these things compile-time options? I totally see that if you 
want to use netmap for fast switching, you wouldn't want these. But if you use 
netmap for operating on IP and transport protocol packets, they become really 
essential. (Esp. at 40G - which reminds me that I forgot to add netmap support 
for the ixl driver to the wishlist...)
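
A compile-time option of the kind suggested here could look roughly
like the sketch below.  Everything in it is hypothetical: the
SKETCH_TX_CSUM macro, struct sketch_slot and sketch_txsync() are
simplified stand-ins, not netmap's actual options or data structures,
and a software RFC 1071 checksum stands in for whatever per-packet
work a real driver would do (a real offload would more likely set
descriptor flags).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time switch; not an existing netmap option. */
#ifndef SKETCH_TX_CSUM
#define SKETCH_TX_CSUM 1
#endif

/* Hypothetical, simplified stand-in for a ring slot. */
struct sketch_slot {
    const uint8_t *buf;
    uint16_t       len;
    uint16_t       csum;  /* filled only when SKETCH_TX_CSUM is set */
};

/* RFC 1071 Internet checksum over len bytes (len <= 65535). */
static uint16_t
inet_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    assert(len <= 65535);
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;   /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* Inner transmit loop; the per-packet work compiles out entirely. */
static void
sketch_txsync(struct sketch_slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#if SKETCH_TX_CSUM
        slots[i].csum = inet_csum(slots[i].buf, slots[i].len);
#endif
        /* ... hand slots[i] to the NIC here ... */
    }
}
```

Building with -DSKETCH_TX_CSUM=0 removes the checksum call from the
loop, so a fast-switching build would pay nothing for the feature.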

> - the VALE switch has support for segmentation and checksum avoidance.
>   Clients can register as virtio-net capable: in this case the port will
>   accept/deliver large segments across that port, and do segmentation and
>   checksum as required for ports that are not virtio-net enabled
>   (e.g. physical NICs attached to the same VALE switch).
>   This was developed earlier this year by Vincenzo Maffione.

I may look into this. I'm unclear if adding a VALE layer into the system just 
to get this feature would be worth it in terms of performance.

>   We could probably leverage this code to work also on top of NICs
>   connected through netmap, e.g. programming the NIC to use its own
>   native offloading, but i am skeptical about the usefulness and
>   concerned about the potential performance loss in *_txsync().

I totally see that, but maybe a compile-time option would work. There are 
several distinct use cases for netmap at the moment, and it's unlikely that the 
same system would need to support several of them, so compile-time 
specialization may be sufficient here.

> - Stefano Garzarella has some code to do software GSO (this is for FreeBSD,
>   linux already has something similar), which will be presented at
>   EuroBSDCon later this month in Sofia. This should address the
>   segmentation issue on the host stack.

Nice, I will take a look.

> - on the receive side, both FreeBSD and Linux have an efficient
>   LRO software fallback in case the NIC does not support it
>   natively, i think we do not need this at the NIC/switch level.

OK, I need to look into this.

Oh, and my list was prioritized - I think the checksum offload would be the
real winner when munging IP and transport packets.

Thanks,
Lars




Re: [PATCH] Fix integer truncation in systat -ifstat

2014-09-12 Thread Olivier Cochard-Labbé
On Fri, Sep 12, 2014 at 8:41 AM, Bruce Evans b...@optusnet.com.au wrote:


> Only differences in the counters are used except in 1 place that is
> broken in other ways, so overflow is only a large problem starting at
> about 40 Gbps.  At only 10 Gbps, 32-bit counters are enough with a
> refresh interval of 1 second but not quite enough with the default
> interval of 5 seconds (this default is not documented in the man
> page.  It seems to only be documented (with a grammar error -- comma
> splice) in the status message for mode switches).  5 seconds at
> nearly 1.125 GBps exceeds UINT32_MAX.  Packet counter overflow isn't
> a problem until about 600 Gbps with the default interval.  32-bit
> systems would have other problems supporting 600 GBps interfaces.


I can confirm this behavior; it is reported in this PR:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182448


help creating a send-receive test suite using netmap

2014-09-12 Thread David
Hi

I am trying to compare the performance of sending packets using netmap,
socket and packet mmap.

Right now I am working on top of pkt-gen and some other implementations for
socket and packet mmap. I'm interested in the relation between packet size
and packets I can send per second.

I was following the code to check all the steps, and I found it is going
through a memcpy to set the packet on the netmap buffer (as defined
on nm_pkt_copy inside netmap_user.h). I am afraid that the data I'm reading
is mostly driven by the time the memcpy takes.

Is there another way to measure send performance that avoids
this bottleneck?
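
One rough way to judge whether the copy dominates is to time memcpy
alone for each packet size and compare it with the per-packet time
seen in the send test.  The sketch below assumes a POSIX system with
clock_gettime(); memcpy_ns_per_pkt and SKETCH_BUF_SIZE are
illustrative names, and the 2048 byte buffer size is an assumption,
not a value taken from netmap.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define SKETCH_BUF_SIZE 2048   /* assumed buffer size */

/*
 * Average cost in nanoseconds of copying one packet of pkt_len bytes,
 * measured over iters copies between two static buffers.
 */
static double
memcpy_ns_per_pkt(size_t pkt_len, unsigned long iters)
{
    static unsigned char src[SKETCH_BUF_SIZE], dst[SKETCH_BUF_SIZE];
    volatile unsigned char sink = 0;
    struct timespec t0, t1;

    assert(pkt_len > 0 && pkt_len <= SKETCH_BUF_SIZE && iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++) {
        /* Vary the input and consume the output so the compiler
         * is less likely to drop or hoist the copy. */
        src[0] = (unsigned char)i;
        memcpy(dst, src, pkt_len);
        sink ^= dst[pkt_len - 1];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
        (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters);
}
```

Because both buffers stay hot in cache, this gives a lower bound on
the copy cost rather than an exact figure for a real send path.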

regards

-- 
David Díaz Barquero

Ingeniería en Computadores
Tecnológico de Costa Rica