Hello!

I have a network emulator module that acts a lot like an ethernet bridge.

It is implemented roughly like this:

It hooks into the rx path and steals packets in the rx-all logic, similar to
how sniffers work.

Then, it puts the packet onto a queue for transmit.

A kernel thread services this queue transmitting frames on a different NIC.

I am using spin-locks to protect this queue.
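
To make the design above concrete, here is a rough user-space sketch in Python.
It is not the module's actual code. A threading.Lock stands in for the
spin-lock, and the names FrameQueue, transmit_loop and run_demo are made up
for illustration. Draining the whole backlog under one lock hold is an
assumption of the sketch, not necessarily what the module does.

```python
import threading
from collections import deque


class FrameQueue:
    """Lock-protected FIFO standing in for the spin-locked packet queue."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the spin-lock
        self._frames = deque()

    def enqueue(self, frame):
        # Called from the (simulated) rx hook for every stolen packet.
        with self._lock:
            self._frames.append(frame)

    def dequeue_all(self):
        # Called from the transmit thread: take the whole backlog at once
        # so the lock is held only briefly per batch.
        with self._lock:
            batch = list(self._frames)
            self._frames.clear()
        return batch


def transmit_loop(queue, send, stop):
    # Hypothetical stand-in for the kernel thread that services the queue.
    while not stop.is_set():
        for frame in queue.dequeue_all():
            send(frame)
        stop.wait(0.001)  # brief pause instead of busy-spinning
    # Final drain so nothing queued before shutdown is lost.
    for frame in queue.dequeue_all():
        send(frame)


def run_demo(n=1000):
    # Push n fake frames through the queue and return them in transmit order.
    q = FrameQueue()
    sent = []
    stop = threading.Event()
    tx = threading.Thread(target=transmit_loop, args=(q, sent.append, stop))
    tx.start()
    for i in range(n):
        q.enqueue(i)
    stop.set()
    tx.join()
    return sent
```

With a single producer and a single consumer, the frames come out in the order
they went in, which is the bridge-like behavior described above.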

I am disabling LRO/GRO etc. on the ixgbe NICs so that I don't have
to deal with linearization when trying to do corruptions and such.  Re-enabling
LRO/GRO makes the transmit logic use less CPU, but the RX logic seems to be
the bottleneck anyway.

The code, which is GPL, is here, in case someone wants to take a look:

http://www.candelatech.com/downloads/wanlink/

What I see is that this is very sensitive to which CPU core does what.
If I run the transmitter thread on cpu-0, performance is awful.  If I run
it on cpu-1, then it is good.  Sometimes, though it is hard to reproduce, I
can run right at 10Gbps bi-directional throughput.  More often, it is stuck at
around 7Gbps bi-directional throughput.

I tried adding some prefetch logic, and that helped when emulating very long
latency (like, 10 seconds worth), but I am not sure I am really doing that
optimally either.

My basic question is:  Any suggestions for an optimal CPU core configuration
(most likely including binding a NIC's irqs to a particular core)?
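
For reference, the kind of binding meant here can be sketched from user space
as below. This is only an illustration under assumptions: the helper names are
made up, the IRQ number has to be looked up in /proc/interrupts for the NIC
queue in question, writing the mask needs root, and os.sched_setaffinity is
Linux-only. It pins a user-space thread, not the module's kernel thread.

```python
import os


def irq_affinity_mask(cpus):
    """Build the hex bitmask string used by /proc/irq/<n>/smp_affinity.

    Masks wider than 32 bits are written as comma-separated 32-bit words,
    most significant word first.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    words.reverse()
    # Leading word unpadded, remaining words zero-padded to 8 hex digits.
    parts = [format(words[0], "x")] + [format(w, "08x") for w in words[1:]]
    return ",".join(parts)


def bind_irq(irq, cpus):
    # Hypothetical helper: 'irq' must be a real IRQ number taken from
    # /proc/interrupts. Requires root.
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(irq_affinity_mask(cpus))


def pin_current_thread(cpu):
    # Restrict the calling thread to a single core (Linux only).
    os.sched_setaffinity(0, {cpu})
```

For example, irq_affinity_mask([1]) gives "2", which would steer an IRQ to
cpu-1 only, and irq_affinity_mask([0, 1]) gives "3".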

Any other suggestions for things to look for?

Thanks,
Ben


--
Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
