Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
As Martin said, in some cases cbench may significantly over-report numbers in throughput mode (of course it depends on the controller implementation, so not all the controllers might be affected). The cbench code sleeps for 100ms to clear out buffers after reading the switch counters

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread kk yap
Random curiosity: Why would jumbo frames increase replies per sec? Regards KK On 15 December 2010 11:45, Amin Tootoonchian a...@cs.toronto.edu wrote: I missed that. The single core throughput is ~250k replies/sec, two cores ~450k replies/sec, three cores ~650k replies/sec, four cores ~800

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I double checked. It does slightly improve the performance (on the order of a few thousand replies/sec). Larger MTUs decrease the CPU workload (by decreasing the number of transfers across the bus) and this means that more CPU cycles are available to the controller to process requests. However, I

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread kk yap
Hi Amin, Just to clarify, do your jumbo frames refer to the OpenFlow messages or the frames in the datapath? By OpenFlow messages, I am assuming you use a TCP connection between NOX and the switches, and you are batching the messages into jumbo frames of 9000 bytes before sending them out.

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread kk yap
Oh.. another point, if you are batching the frames, then what about delay? There seems to be a trade-off between delay and throughput, and we have gone for the former by disabling Nagle's algorithm. Regards KK On 15 December 2010 12:46, kk yap yap...@stanford.edu wrote: Hi Amin, Just to
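
For readers unfamiliar with the option mentioned above: Nagle's algorithm is normally disabled per socket with the standard TCP_NODELAY option. The snippet below is a minimal Python sketch of that setting only. It is not taken from NOX, and the helper names disable_nagle and nagle_disabled are hypothetical.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off Nagle's algorithm so small writes are sent without coalescing delay."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        disable_nagle(sock)
        print("Nagle disabled:", nagle_disabled(sock))
    finally:
        sock.close()
```

With TCP_NODELAY set, each small write may go out as its own segment, which favors delay over throughput, the trade-off raised in the message.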

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Martin Casado
I'll let Amin follow up, but from what I understand, the way he's doing batching doesn't introduce any additional delay. Rather, if he can write to the socket, he writes. However, if the socket is blocked for whatever reason (e.g. waiting for an ACK or send buffer is full) he buffers all of
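
The write-if-possible, buffer-if-blocked behaviour described above can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not Amin's actual NOX code: the class name BatchingWriter and its methods are hypothetical, and a non-blocking socket stands in for whatever mechanism the controller really uses to detect a blocked socket.

```python
import socket


class BatchingWriter:
    """Send immediately when the socket accepts data; otherwise queue and batch."""

    def __init__(self, sock: socket.socket) -> None:
        sock.setblocking(False)      # a full send buffer raises instead of stalling
        self.sock = sock
        self.pending = bytearray()   # messages that could not be written yet

    def send(self, message: bytes) -> None:
        """Queue the message, then try to write everything queued so far."""
        self.pending += message
        self.flush()

    def flush(self) -> int:
        """Write as much queued data as the socket accepts; return bytes written."""
        written = 0
        while self.pending:
            try:
                n = self.sock.send(self.pending)
            except BlockingIOError:
                break                # socket is blocked: keep the rest for later
            del self.pending[:n]
            written += n
        return written


if __name__ == "__main__":
    a, b = socket.socketpair()
    writer = BatchingWriter(a)
    writer.send(b"hello")
    print(b.recv(16), len(writer.pending))
    a.close()
    b.close()
```

In this sketch no timer is involved: a message is delayed only while the socket itself refuses data, and everything queued in the meantime goes out in a single write once flush() is called again (for example when the event loop reports the socket writable).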

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-15 Thread Amin Tootoonchian
I am talking about jumbo Ethernet frames here. By batching, I mean batching outgoing messages together and writing to the underlying layer which would be the TCP write buffer. The TCP buffer is not limited to MTU or anything like that, so in most cases my code flushes more than 64KB to the TCP
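
As a rough illustration of why jumbo Ethernet frames reduce per-frame work for a large flush, the sketch below counts how many frames a single 64 KiB write would need at two MTUs. The numbers are assumptions for the example only: a 40-byte IPv4 plus TCP header with no options, a flush of exactly 64 KiB, and no NIC segmentation offload. The function name frames_needed is hypothetical.

```python
IP_TCP_HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of Ethernet frames needed to carry payload_bytes of TCP data."""
    mss = mtu - IP_TCP_HEADER_BYTES      # TCP payload that fits in one frame
    return -(-payload_bytes // mss)      # ceiling division


if __name__ == "__main__":
    flush_bytes = 64 * 1024              # illustrative flush size
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(flush_bytes, mtu)} frames")
```

Under these assumptions the same flush takes 45 frames at an MTU of 1500 and 8 frames at an MTU of 9000, which is consistent with the modest CPU savings reported earlier in the thread.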

Re: [nox-dev] [openflow-discuss] NOX performance improvement by a factor 10

2010-12-14 Thread Martin Casado
This is awesome Amin, thanks for posting. It is also probably worth mentioning that cbench was broken and over-reporting numbers. Do you mind sending out a few details about that? I presume that will be helpful to those using cbench [cross-posting to nox-dev, openflow-discuss,