Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 10/12/2010 10:39:07 PM: Sorry for the delay, I was sick the last couple of days. The results with your patch are (%'s over original code): Code BW% CPU% RemoteCPU MQ (#txq=16) 31.4% 38.42% 6.41% MQ+MST

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Michael S. Tsirkin
On Thu, Oct 14, 2010 at 01:28:58PM +0530, Krishna Kumar2 wrote: Michael S. Tsirkin m...@redhat.com wrote on 10/12/2010 10:39:07 PM: Sorry for the delay, I was sick the last couple of days. The results with your patch are (%'s over original code): Code BW% CPU%

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com What other shared TX/RX locks are there? In your setup, is the same macvtap socket structure used for RX and TX? If yes, this will create cacheline bounces as sk_wmem_alloc/sk_rmem_alloc share a cache line; there might also be contention on the lock
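As an illustration of the cacheline bounce being described here, the sketch below uses a hypothetical structure (demo_sock, not the kernel's struct sock) with one counter written by the TX path and one by the RX path; forcing the second field onto its own cache line stops the two paths from invalidating each other's line.

#include <linux/types.h>	/* atomic_t */
#include <linux/cache.h>	/* ____cacheline_aligned_in_smp */

/* Hypothetical layout, for illustration only: if wmem_alloc (TX path) and
 * rmem_alloc (RX path) share a cache line, every update on one CPU
 * invalidates the line on the other; aligning the RX counter to its own
 * cache line removes the ping-pong.
 */
struct demo_sock {
	atomic_t wmem_alloc;					/* TX path */
	atomic_t rmem_alloc ____cacheline_aligned_in_smp;	/* RX path */
};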

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Krishna Kumar2/India/IBM wrote on 10/14/2010 02:34:01 PM: void vhost_poll_queue(struct vhost_poll *poll) { struct vhost_virtqueue *vq = vhost_find_vq(poll); vhost_work_queue(vq, &poll->work); } Since poll batches packets, find_vq does not seem to add much to the CPU
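Reformatted for readability, the routine quoted in this message is sketched below; note that vhost_find_vq() and this vhost_work_queue() signature come from the RFC patches under discussion, not from mainline vhost, so treat it as a sketch of the posted code rather than something that compiles against an unpatched tree.

/* With a worker per virtqueue, queueing a poll item means mapping the poll
 * back to its virtqueue first, then handing the embedded work item to that
 * vq's worker.
 */
void vhost_poll_queue(struct vhost_poll *poll)
{
	struct vhost_virtqueue *vq = vhost_find_vq(poll);

	vhost_work_queue(vq, &poll->work);
}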

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-14 Thread Krishna Kumar2
Krishna Kumar2/India/IBM wrote on 10/14/2010 05:47:54 PM: Sorry, it should read txq=8 below. - KK There's a significant reduction in CPU/SD utilization with your patch. Following is the performance of ORG vs MQ+mm patch: _ Org

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-11 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 10/06/2010 07:04:31 PM: On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote: For 1 TCP netperf, I ran 7 iterations and summed it. Explanation for degradation for 1 stream case: I thought about possible RX/TX contention reasons, and I

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Arnd Bergmann
On Tuesday 05 October 2010, Krishna Kumar2 wrote: After testing various combinations of #txqs, #vhosts, #netperf sessions, I think the drop for 1 stream is due to TX and RX for a flow being processed on different cpus. I did two more tests: 1. Pin vhosts to same CPU: - BW drop is

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Michael S. Tsirkin
On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote: For 1 TCP netperf, I ran 7 iterations and summed it. Explanation for degradation for 1 stream case: I thought about possible RX/TX contention reasons, and I realized that we get/put the mm counter all the time. So I write the
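A hypothetical sketch of the kind of change being described (illustrative demo_* names only, not the patch actually posted in this thread): the get/put of the mm reference happens once per worker thread instead of on every packet, so the hot path stops touching the shared counter.

#include <linux/kthread.h>	/* kthread_should_stop() */
#include <linux/mmu_context.h>	/* use_mm()/unuse_mm() */

struct demo_dev {			/* hypothetical device state  */
	struct mm_struct *mm;		/* owner's mm, saved at setup */
	/* ... queues of pending work ... */
};

static void run_pending_work(struct demo_dev *dev);	/* hypothetical */

/* Take the mm reference once when the worker starts and drop it once on
 * exit; the per-packet path no longer does a get/put of the mm counter.
 */
static int demo_worker(void *data)
{
	struct demo_dev *dev = data;

	use_mm(dev->mm);
	while (!kthread_should_stop())
		run_pending_work(dev);
	unuse_mm(dev->mm);
	return 0;
}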

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote: For 1 TCP netperf, I ran 7 iterations and summed it. Explanation for degradation for 1 stream case: I thought about possible RX/TX contention

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
Arnd Bergmann a...@arndb.de wrote on 10/06/2010 05:49:00 PM: I don't see any reasons mentioned above. However, for higher number of netperf sessions, I see a big increase in retransmissions: #netperf | ORG BW (#retr) | NEW BW (#retr)

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 10/05/2010 11:53:23 PM: Any idea where this comes from? Do you see more TX interrupts? RX interrupts? Exits? Do interrupts bounce more between guest CPUs? 4. Identify reasons for single netperf BW regression. After testing various

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Arnd Bergmann
On Wednesday 06 October 2010 19:14:42 Krishna Kumar2 wrote: Arnd Bergmann a...@arndb.de wrote on 10/06/2010 05:49:00 PM: I don't see any reasons mentioned above. However, for higher number of netperf sessions, I see a big increase in retransmissions:

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Michael S. Tsirkin
On Wed, Oct 06, 2010 at 11:13:31PM +0530, Krishna Kumar2 wrote: Michael S. Tsirkin m...@redhat.com wrote on 10/05/2010 11:53:23 PM: Any idea where this comes from? Do you see more TX interrupts? RX interrupts? Exits? Do interrupts bounce more between guest CPUs? 4. Identify

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-05 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 09/19/2010 06:14:43 PM: Could you document how exactly you measure multistream bandwidth: netperf flags, etc.? All results were without any netperf flags or system tuning: for i in $list; do netperf -c -C -l 60 -H 192.168.122.1

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-05 Thread Michael S. Tsirkin
On Tue, Oct 05, 2010 at 04:10:00PM +0530, Krishna Kumar2 wrote: Michael S. Tsirkin m...@redhat.com wrote on 09/19/2010 06:14:43 PM: Could you document how exactly you measure multistream bandwidth: netperf flags, etc.? All results were without any netperf flags or system tuning:

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-19 Thread Michael S. Tsirkin
On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote: For 1 TCP netperf, I ran 7 iterations and summed it. Explanation for degradation for 1 stream case: Could you document how exactly you measure multistream bandwidth: netperf flags, etc.? 1. Without any tuning, BW falls

[v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-09-17 Thread Krishna Kumar
The following patches implement transmit MQ in virtio-net. Also included are the corresponding qemu userspace changes. MQ is disabled by default unless qemu specifies it. 1. This feature was first implemented with a single vhost. Testing showed a 3-8% performance gain for up to 8 netperf sessions (and sometimes 16),
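To make the shape of the series concrete, here is a minimal, hypothetical sketch of per-TX-queue state in a multiqueue virtio-net design (demo_* names are illustrative, not the structures from the posted patches): each TX queue owns its own virtqueue and its own lock, so concurrent sessions mapped to different queues stop serialising on a single TX lock.

#include <linux/spinlock.h>
#include <linux/virtio.h>

/* Hypothetical per-queue TX state: one virtqueue and one lock per queue. */
struct demo_tx_queue {
	struct virtqueue *vq;		/* this queue's TX virtqueue  */
	spinlock_t lock;		/* serialises only this queue */
};

/* Hypothetical driver state: an array of TX queues.  With MQ disabled
 * (the default unless qemu asks for more), numtxqs is simply 1.
 */
struct demo_virtnet_info {
	struct demo_tx_queue *txq;
	unsigned int numtxqs;
};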