Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 03/02/2011 03:36:00 PM:

Sorry for the delayed response, I have been sick the last few days.
I am responding to both your posts here.

  Both virtio-net and vhost need some check to make sure very
  high values are not passed by userspace. Is this not required?

 Whatever we stick in the header is effectively part of
 host/guest interface. Are you sure we'll never want
 more than 16 VQs? This value does not seem that high.

OK, so even constants cannot change?  Given that, should I remove all
checks and use kcalloc?

  OK, so virtio_net_config has num_queue_pairs, and this gets converted to
  numtxqs in virtnet_info?

 Or put num_queue_pairs in virtnet_info too.

For virtnet_info, having numtxqs is easier since all code that loops needs
only 'numtxqs'.

  Also, vhost has some
  code that processes tx first before rx (e.g. vhost_net_stop/flush),

 No idea why I did it this way. I don't think it matters.

  so this approach seemed helpful.
  I am OK either way, what do you
  suggest?

 We get less code generated but also less flexibility.
 I am not sure, I'll play around with code, for now
 let's keep it as is.

OK.

  Yes, it is a waste to have these vectors for tx ints. I initially
  thought of adding a flag to virtio_device to pass to vp_find_vqs,
  but it won't work, so a new API is needed. I can work with you on
  this in the background if you like.

 OK. For starters, how about we change find_vqs to get a structure?  Then
 we can easily add flags that tell us that some interrupts are rare.

Yes. OK to work on this outside this patch series, I guess?
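For illustration, something along these lines could work; the struct and flag
names below are only a sketch, not the actual proposal:

	/* Sketch: per-vq request passed to find_vqs(), so flags such as
	 * "interrupt is rare" can be added later without changing the
	 * function signature again.
	 */
	struct virtqueue_info {
		const char *name;
		vq_callback_t *callback;	/* NULL: no callback */
		unsigned int flags;		/* e.g. rare interrupt for TX */
	};
	#define VIRTQUEUE_F_INTR_RARE	(1 << 0)

	/* find_vqs() would then take one array instead of the parallel
	 * callbacks[]/names[] arrays:
	 *
	 *	int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
	 *			struct virtqueue *vqs[],
	 *			const struct virtqueue_info info[]);
	 */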

  vq's are matched between qemu, virtio-net and vhost. Isn't some check
  required that userspace has not passed a bad value?


 For virtio, I'm not too concerned: qemu can already easily
 crash the guest :)
 For vhost yes, but I'm concerned that even with 16 VQs we are
 drinking a lot of resources already. I would be happier
 if we had a file descriptor per VQ pair in some way.
 Then the amount of memory userspace can use up is
 limited by the # of file descriptors.

I will start working on this approach this week and see how it goes.
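As a rough userspace-side illustration of the fd-per-pair idea (error handling
trimmed; this just reuses the existing single-queue vhost ioctls once per fd):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/vhost.h>

	/* Sketch: one vhost-net fd per TX/RX pair, so kernel-side resources
	 * scale with the number of fds userspace chooses to open. */
	static int setup_vhost_pairs(int nr_pairs, int *fds)
	{
		int i;

		for (i = 0; i < nr_pairs; i++) {
			fds[i] = open("/dev/vhost-net", O_RDWR);
			if (fds[i] < 0)
				return -1;
			/* each fd gets its own owner/worker context */
			if (ioctl(fds[i], VHOST_SET_OWNER) < 0)
				return -1;
		}
		return 0;
	}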

  OK, so define free_unused_bufs() as:
 
  static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue *svq,
                               struct virtqueue *rvq)
  {
  	/* Use svq and rvq with the remaining code unchanged */
  }

 Not sure I understand. I am just suggesting
 adding symmetrical functions like init/cleanup
 alloc/free etc instead of adding stuff in random
 functions that just happen to be called at the right time.

OK, I will clean up this part in the next revision.

  I was not sure what is the best way - a sysctl parameter? Or should the
  maximum depend on number of host cpus? But that results in too many
  threads, e.g. if I have 16 cpus and 16 txqs.

 I guess the question is, wouldn't # of threads == # of vqs work best?
 If we process stuff on a single CPU, let's make it pass through
 a single VQ.
 And to do this, we could simply open multiple vhost fds without
 changing vhost at all.

 Would this work well?

 -	enum vhost_net_poll_state tx_poll_state;
 +	enum vhost_net_poll_state *tx_poll_state;
  
   another array?
 
  Yes... I am also allocating twice the space that is required
  to keep its usage simple.

 Where's the allocation? Couldn't find it.

vhost_setup_vqs(net.c) allocates it based on nvqs, though numtxqs is
enough.

Thanks,

- KK



Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Michael S. Tsirkin
On Tue, Mar 08, 2011 at 09:02:38PM +0530, Krishna Kumar2 wrote:
 Michael S. Tsirkin m...@redhat.com wrote on 03/02/2011 03:36:00 PM:
 
 Sorry for the delayed response, I have been sick the last few days.
 I am responding to both your posts here.
 
   Both virtio-net and vhost need some check to make sure very
   high values are not passed by userspace. Is this not required?
 
  Whatever we stick in the header is effectively part of
  host/guest interface. Are you sure we'll never want
  more than 16 VQs? This value does not seem that high.
 
 OK, so even constants cannot change?

Parse error :).

 Given that, should I remove all
 checks and use kcalloc?

I think that it's not a bad idea to avoid crashing if
hardware (in our case, host) misbehaves.
But as long as the code is prepared to handle any # of vqs,
I don't see a point of limiting that arbitrarily,
and if the code to handle hardware errors is complex
it'll have bugs itself.

   OK, so virtio_net_config has num_queue_pairs, and this gets converted to
   numtxqs in virtnet_info?
 
  Or put num_queue_pairs in virtnet_info too.
 
 For virtnet_info, having numtxqs is easier since all code that loops needs
 only 'numtxqs'.

It seemed to me that the code used numtxqs for # of rx qs as well
sometimes.

   Also, vhost has some
   code that processes tx first before rx (e.g. vhost_net_stop/flush),
 
  No idea why I did it this way. I don't think it matters.
 
   so this approach seemed helpful.
   I am OK either way, what do you
   suggest?
 
  We get less code generated but also less flexibility.
  I am not sure, I'll play around with code, for now
  let's keep it as is.
 
 OK.
 
   Yes, it is a waste to have these vectors for tx ints. I initially
   thought of adding a flag to virtio_device to pass to vp_find_vqs,
   but it won't work, so a new API is needed. I can work with you on
   this in the background if you like.
 
  OK. For starters, how about we change find_vqs to get a structure?  Then
  we can easily add flags that tell us that some interrupts are rare.
 
 Yes. OK to work on this outside this patch series, I guess?

We could work on this in parallel. Changing APIs is not a problem,
I'm a bit concerned that this might affect the host/guest ABI as well.

   vq's are matched between qemu, virtio-net and vhost. Isn't some check
   required that userspace has not passed a bad value?
 
 
  For virtio, I'm not too concerned: qemu can already easily
  crash the guest :)
  For vhost yes, but I'm concerned that even with 16 VQs we are
  drinking a lot of resources already. I would be happier
  if we had a file descriptor per VQ pair in some way.
  Then the amount of memory userspace can use up is
  limited by the # of file descriptors.
 
 I will start working on this approach this week and see how it goes.

Also, could you post your current version of the qemu code pls?
It's useful for testing and to see the whole picture.

   OK, so define free_unused_bufs() as:
  
   static void free_unused_bufs(struct virtnet_info *vi, struct virtqueue *svq,
                                struct virtqueue *rvq)
   {
   	/* Use svq and rvq with the remaining code unchanged */
   }
 
  Not sure I understand. I am just suggesting
  adding symmetrical functions like init/cleanup
  alloc/free etc instead of adding stuff in random
  functions that just happen to be called at the right time.
 
 OK, I will clean up this part in the next revision.
 
   I was not sure what is the best way - a sysctl parameter? Or should the
   maximum depend on number of host cpus? But that results in too many
   threads, e.g. if I have 16 cpus and 16 txqs.
 
  I guess the question is, wouldn't # of threads == # of vqs work best?
  If we process stuff on a single CPU, let's make it pass through
  a single VQ.
  And to do this, we could simply open multiple vhost fds without
  changing vhost at all.
 
  Would this work well?
 
 -	enum vhost_net_poll_state tx_poll_state;
 +	enum vhost_net_poll_state *tx_poll_state;
   
another array?
  
   Yes... I am also allocating twice the space that is required
   to keep its usage simple.
 
  Where's the allocation? Couldn't find it.
 
 vhost_setup_vqs(net.c) allocates it based on nvqs, though numtxqs is
 enough.
 
 Thanks,
 
 - KK


Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-08 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 03/08/2011 09:11:04 PM:

 Also, could you post your current version of the qemu code pls?
 It's useful for testing and to see the whole picture.

Sorry for the delay on this.

I am attaching the qemu changes. Some parts of the patch are
completely redundant, e.g. MAX_TUN_DEVICES, and I will remove it
later.

It works with latest qemu and the kernel patch sent earlier.

Please let me know if there are any issues.

thanks,

- KK


(See attached file: qemu.patch)



Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-02 Thread Michael S. Tsirkin
On Tue, Mar 01, 2011 at 09:32:56PM +0530, Krishna Kumar2 wrote:
 Michael S. Tsirkin m...@redhat.com wrote on 02/28/2011 03:13:20 PM:
 
 Thank you once again for your feedback on both these patches.
 I will send the qemu patch tomorrow. I will also send the next
 version incorporating these suggestions once we finalize some
 minor points.
 
  Overall looks good.
  The use of numtxqs to also mean the number of rx queues needs some cleanup.
  init/cleanup routines need more symmetry.
  Error handling on setup also seems slightly buggy or at least
 asymmetrical.
  Finally, this will use up a large number of MSI vectors,
  while TX interrupts mostly stay unused.
 
  Some comments below.
 
   +/* Maximum number of individual RX/TX queues supported */
   +#define VIRTIO_MAX_TXQS   16
   +
 
  This also does not seem to belong in the header.
 
 Both virtio-net and vhost need some check to make sure very
 high values are not passed by userspace. Is this not required?

Whatever we stick in the header is effectively part of
host/guest interface. Are you sure we'll never want
more than 16 VQs? This value does not seem that high.

   +#define VIRTIO_NET_F_NUMTXQS	21	/* Device supports multiple TX queue */
 
  VIRTIO_NET_F_MULTIQUEUE ?
 
 Yes, that's a better name.
 
   @@ -34,6 +38,8 @@ struct virtio_net_config {
  __u8 mac[6];
  /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
  __u16 status;
   +  /* number of RX/TX queues */
   +  __u16 numtxqs;
 
  The interface here is a bit ugly:
  - this is really both # of tx and rx queues but called numtxqs
  - there's a hardcoded max value
  - 0 is assumed to be same as 1
  - assumptions above are undocumented.
 
  One way to address this could be num_queue_pairs, and something like
    /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
   __u16 num_queue_pairs;
  (and tweak code to match).
 
  Alternatively, have separate registers for the number of tx and rx
 queues.
 
 OK, so virtio_net_config has num_queue_pairs, and this gets converted to
 numtxqs in virtnet_info?

Or put num_queue_pairs in virtnet_info too.

   +struct virtnet_info {
   +  struct send_queue **sq;
   +  struct receive_queue **rq;
   +
   +  /* read-mostly variables */
    +	int numtxqs ____cacheline_aligned_in_smp;
 
  Why do you think this alignment is a win?
 
 Actually this code was from the earlier patchset (MQ TX only) where
 the layout was different. Now rq and sq are allocated as follows:
  	vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
  	for (i = 0; i < numtxqs; i++) {
  		vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
  Since the two pointers become read-only during use, there is no cache
  line dirtying.  I will remove this directive.
 
   +/*
   + * Note for 'qnum' below:
   + *first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
   + */
 
  Another option to consider is to have them RX,TX,RX,TX:
  this way vq->queue_index / 2 gives you the
  queue pair number, no need to read numtxqs. On the other hand, it makes
  the #RX==#TX assumption even more entrenched.
 
 OK. I was following how many drivers allocate RX and TX rings
 together - e.g. ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
 has rx_buf_ring and tx_buf_ring arrays, etc.

That's fine. I am only talking about the VQ numbers.

 Also, vhost has some
 code that processes tx first before rx (e.g. vhost_net_stop/flush),

No idea why I did it this way. I don't think it matters.

 so this approach seemed helpful.
 I am OK either way, what do you
 suggest?

We get less code generated but also less flexibility.
I am not sure, I'll play around with code, for now
let's keep it as is.


   +	err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs, callbacks,
   +					 (const char **)names);
   +	if (err)
   +		goto free_params;
   +
 
  This would use up quite a lot of vectors. However,
  tx interrupt is, in fact, slow path. So, assuming we don't have
  enough vectors to use per vq, I think it's a good idea to
  support reducing MSI vector usage by mapping all TX VQs to the same vector
  and separate vectors for RX.
  The hypervisor actually allows this, but we don't have an API at the virtio
  level to pass that info to virtio pci ATM.
  Any idea what a good API to use would be?
 
 Yes, it is a waste to have these vectors for tx ints. I initially
 thought of adding a flag to virtio_device to pass to vp_find_vqs,
 but it won't work, so a new API is needed. I can work with you on
 this in the background if you like.


OK. For starters, how about we change find_vqs to get a structure?  Then
we can easily add flags that tell us that some interrupts are rare.


diff --git a/include/linux/virtio_config.h 

Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-03-01 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 02/28/2011 03:13:20 PM:

Thank you once again for your feedback on both these patches.
I will send the qemu patch tomorrow. I will also send the next
version incorporating these suggestions once we finalize some
minor points.

 Overall looks good.
 The use of numtxqs to also mean the number of rx queues needs some cleanup.
 init/cleanup routines need more symmetry.
 Error handling on setup also seems slightly buggy or at least
asymmetrical.
 Finally, this will use up a large number of MSI vectors,
 while TX interrupts mostly stay unused.

 Some comments below.

  +/* Maximum number of individual RX/TX queues supported */
  +#define VIRTIO_MAX_TXQS 16
  +

 This also does not seem to belong in the header.

Both virtio-net and vhost need some check to make sure very
high values are not passed by userspace. Is this not required?

  +#define VIRTIO_NET_F_NUMTXQS	21	/* Device supports multiple TX queue */

 VIRTIO_NET_F_MULTIQUEUE ?

Yes, that's a better name.

  @@ -34,6 +38,8 @@ struct virtio_net_config {
   __u8 mac[6];
   /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
   __u16 status;
  +/* number of RX/TX queues */
  +__u16 numtxqs;

 The interface here is a bit ugly:
 - this is really both # of tx and rx queues but called numtxqs
 - there's a hardcoded max value
 - 0 is assumed to be same as 1
 - assumptions above are undocumented.

 One way to address this could be num_queue_pairs, and something like
 /* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
__u16 num_queue_pairs;
 (and tweak code to match).

 Alternatively, have separate registers for the number of tx and rx
queues.

OK, so virtio_net_config has num_queue_pairs, and this gets converted to
numtxqs in virtnet_info?

  +struct virtnet_info {
  +struct send_queue **sq;
  +struct receive_queue **rq;
  +
  +/* read-mostly variables */
  +	int numtxqs ____cacheline_aligned_in_smp;

 Why do you think this alignment is a win?

Actually this code was from the earlier patchset (MQ TX only) where
the layout was different. Now rq and sq are allocated as follows:
	vi->sq = kzalloc(numtxqs * sizeof(*vi->sq), GFP_KERNEL);
	for (i = 0; i < numtxqs; i++) {
		vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
Since the two pointers become read-only during use, there is no cache
line dirtying.  I will remove this directive.

  +/*
  + * Note for 'qnum' below:
  + *  first 'numtxqs' vqs are RX, next 'numtxqs' vqs are TX.
  + */

 Another option to consider is to have them RX,TX,RX,TX:
 this way vq->queue_index / 2 gives you the
 queue pair number, no need to read numtxqs. On the other hand, it makes
 the #RX==#TX assumption even more entrenched.

OK. I was following how many drivers allocate RX and TX rings
together - e.g. ixgbe_adapter has tx_ring and rx_ring arrays; bnx2
has rx_buf_ring and tx_buf_ring arrays, etc. Also, vhost has some
code that processes tx first before rx (e.g. vhost_net_stop/flush),
so this approach seemed helpful. I am OK either way, what do you
suggest?

  +	err = vi->vdev->config->find_vqs(vi->vdev, totalvqs, vqs, callbacks,
  +					 (const char **)names);
  +	if (err)
  +		goto free_params;
  +

 This would use up quite a lot of vectors. However,
 tx interrupt is, in fact, slow path. So, assuming we don't have
 enough vectors to use per vq, I think it's a good idea to
 support reducing MSI vector usage by mapping all TX VQs to the same vector
 and separate vectors for RX.
 The hypervisor actually allows this, but we don't have an API at the virtio
 level to pass that info to virtio pci ATM.
 Any idea what a good API to use would be?

Yes, it is a waste to have these vectors for tx ints. I initially
thought of adding a flag to virtio_device to pass to vp_find_vqs,
but it won't work, so a new API is needed. I can work with you on
this in the background if you like.

  +	for (i = 0; i < numtxqs; i++) {
  +		vi->rq[i]->rvq = vqs[i];
  +		vi->sq[i]->svq = vqs[i + numtxqs];

 This logic is spread all over. We need some kind of macro to
 get the queue number from the vq number and back.

Will add this.
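Something like the following, given the current "all RX first, then all TX"
layout (macro names are only indicative):

	/* vq index -> queue number and back, RX-block/TX-block layout */
	#define rxq_to_vq(vi, i)	(i)
	#define txq_to_vq(vi, i)	((i) + (vi)->numtxqs)
	#define vq_is_rx(vi, qnum)	((qnum) < (vi)->numtxqs)
	#define vq_to_rxq(vi, qnum)	(qnum)
	#define vq_to_txq(vi, qnum)	((qnum) - (vi)->numtxqs)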

  +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
  +		vi->cvq = vqs[i + numtxqs];
  +
  +		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
  +			vi->dev->features |= NETIF_F_HW_VLAN_FILTER;

 This bit does not seem to belong in initialize_vqs.

I will move it back to probe.

  +err = virtio_config_val(vdev, VIRTIO_NET_F_NUMTXQS,
  +

Re: [PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-02-28 Thread Michael S. Tsirkin
On Mon, Feb 28, 2011 at 12:04:37PM +0530, Krishna Kumar wrote:
 Implement mq virtio-net driver. 
 
 Though struct virtio_net_config changes, it works with the old
 qemu since the last element is not accessed unless qemu sets
 VIRTIO_NET_F_NUMTXQS.  Patch also adds a macro for the maximum
 number of TX vq's (VIRTIO_MAX_TXQS) that the user can specify.
 
 Signed-off-by: Krishna Kumar krkum...@in.ibm.com

Overall looks good.
The use of numtxqs to also mean the number of rx queues needs some cleanup.
init/cleanup routines need more symmetry.
Error handling on setup also seems slightly buggy or at least asymmetrical.
Finally, this will use up a large number of MSI vectors,
while TX interrupts mostly stay unused.

Some comments below.

 ---
  drivers/net/virtio_net.c   |  543 ---
  include/linux/virtio_net.h |6 
  2 files changed, 386 insertions(+), 163 deletions(-)
 
 diff -ruNp org/include/linux/virtio_net.h new/include/linux/virtio_net.h
 --- org/include/linux/virtio_net.h2010-10-11 10:20:22.0 +0530
 +++ new/include/linux/virtio_net.h2011-02-25 16:24:15.0 +0530
 @@ -7,6 +7,9 @@
  #include <linux/virtio_config.h>
  #include <linux/if_ether.h>
  
 +/* Maximum number of individual RX/TX queues supported */
 +#define VIRTIO_MAX_TXQS  16
 +

This also does not seem to belong in the header.

  /* The feature bitmap for virtio net */
  #define VIRTIO_NET_F_CSUM	0	/* Host handles pkts w/ partial csum */
  #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
 @@ -26,6 +29,7 @@
  #define VIRTIO_NET_F_CTRL_RX	18	/* Control channel RX mode support */
  #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
 +#define VIRTIO_NET_F_NUMTXQS	21	/* Device supports multiple TX queue */

VIRTIO_NET_F_MULTIQUEUE ?

  
  #define VIRTIO_NET_S_LINK_UP 1   /* Link is up */
  
 @@ -34,6 +38,8 @@ struct virtio_net_config {
   __u8 mac[6];
   /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
   __u16 status;
 + /* number of RX/TX queues */
 + __u16 numtxqs;


The interface here is a bit ugly:
- this is really both # of tx and rx queues but called numtxqs
- there's a hardcoded max value
- 0 is assumed to be same as 1
- assumptions above are undocumented.

One way to address this could be num_queue_pairs, and something like
/* The actual number of TX and RX queues is num_queue_pairs + 1 each. */
__u16 num_queue_pairs;
(and tweak code to match).

Alternatively, have separate registers for the number of tx and rx queues.
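For the num_queue_pairs option, a rough sketch of the driver side; field and
feature names follow the suggestion above and are not final, and
virtio_config_val() is what the patch already uses:

	u16 pairs = 0;

	/* 0 (or feature bit absent) means a single RX/TX pair */
	virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
			  offsetof(struct virtio_net_config, num_queue_pairs),
			  &pairs);
	vi->numtxqs = pairs + 1;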

  } __attribute__((packed));
  
  /* This is the first element of the scatter-gather list.  If you don't
 diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
 --- org/drivers/net/virtio_net.c  2011-02-21 17:55:42.0 +0530
 +++ new/drivers/net/virtio_net.c  2011-02-25 16:23:41.0 +0530
 @@ -40,31 +40,53 @@ module_param(gso, bool, 0444);
  
  #define VIRTNET_SEND_COMMAND_SG_MAX2
  
 -struct virtnet_info {
 - struct virtio_device *vdev;
 - struct virtqueue *rvq, *svq, *cvq;
 - struct net_device *dev;
 +/* Internal representation of a send virtqueue */
 +struct send_queue {
 + struct virtqueue *svq;
 +
 + /* TX: fragments + linear part + virtio header */
 + struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 +};
 +
 +/* Internal representation of a receive virtqueue */
 +struct receive_queue {
 + /* Virtqueue associated with this receive_queue */
 + struct virtqueue *rvq;
 +
 + /* Back pointer to the virtnet_info */
 + struct virtnet_info *vi;
 +
   struct napi_struct napi;
 - unsigned int status;
  
   /* Number of input buffers, and max we've ever had. */
   unsigned int num, max;
  
 - /* I like... big packets and I cannot lie! */
 - bool big_packets;
 -
 - /* Host will merge rx buffers for big packets (shake it! shake it!) */
 - bool mergeable_rx_bufs;
 -
   /* Work struct for refilling if we run low on memory. */
   struct delayed_work refill;
  
   /* Chain pages by the private ptr. */
   struct page *pages;
  
 - /* fragments + linear part + virtio header */
 + /* RX: fragments + linear part + virtio header */
   struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
 - struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 +};
 +
 +struct virtnet_info {
 + struct send_queue **sq;
 + struct receive_queue **rq;
 +
 + /* read-mostly variables */
 +	int numtxqs ____cacheline_aligned_in_smp;	/* # of rxqs/txqs */

Why do you think this alignment is a win?

 + struct virtio_device *vdev;
 + struct virtqueue *cvq;
 + struct net_device *dev;
 + unsigned int status;
 +
 + /* I like... big packets and I cannot lie! */
 + bool big_packets;
 +
 + /* Host will merge rx buffers for big packets (shake 

[PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-02-27 Thread Krishna Kumar
Implement mq virtio-net driver. 

Though struct virtio_net_config changes, it works with the old
qemu since the last element is not accessed unless qemu sets
VIRTIO_NET_F_NUMTXQS.  Patch also adds a macro for the maximum
number of TX vq's (VIRTIO_MAX_TXQS) that the user can specify.

Signed-off-by: Krishna Kumar krkum...@in.ibm.com
---
 drivers/net/virtio_net.c   |  543 ---
 include/linux/virtio_net.h |6 
 2 files changed, 386 insertions(+), 163 deletions(-)

diff -ruNp org/include/linux/virtio_net.h new/include/linux/virtio_net.h
--- org/include/linux/virtio_net.h  2010-10-11 10:20:22.0 +0530
+++ new/include/linux/virtio_net.h  2011-02-25 16:24:15.0 +0530
@@ -7,6 +7,9 @@
 #include <linux/virtio_config.h>
 #include <linux/if_ether.h>
 
+/* Maximum number of individual RX/TX queues supported */
+#define VIRTIO_MAX_TXQS	16
+
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM  0   /* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
@@ -26,6 +29,7 @@
 #define VIRTIO_NET_F_CTRL_RX   18  /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN 19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
+#define VIRTIO_NET_F_NUMTXQS   21  /* Device supports multiple TX queue */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 
@@ -34,6 +38,8 @@ struct virtio_net_config {
__u8 mac[6];
/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
__u16 status;
+   /* number of RX/TX queues */
+   __u16 numtxqs;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c2011-02-21 17:55:42.0 +0530
+++ new/drivers/net/virtio_net.c2011-02-25 16:23:41.0 +0530
@@ -40,31 +40,53 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX2
 
-struct virtnet_info {
-   struct virtio_device *vdev;
-   struct virtqueue *rvq, *svq, *cvq;
-   struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+   struct virtqueue *svq;
+
+   /* TX: fragments + linear part + virtio header */
+   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+   /* Virtqueue associated with this receive_queue */
+   struct virtqueue *rvq;
+
+   /* Back pointer to the virtnet_info */
+   struct virtnet_info *vi;
+
struct napi_struct napi;
-   unsigned int status;
 
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
 
-   /* I like... big packets and I cannot lie! */
-   bool big_packets;
-
-   /* Host will merge rx buffers for big packets (shake it! shake it!) */
-   bool mergeable_rx_bufs;
-
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
 
/* Chain pages by the private ptr. */
struct page *pages;
 
-   /* fragments + linear part + virtio header */
+   /* RX: fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+struct virtnet_info {
+   struct send_queue **sq;
+   struct receive_queue **rq;
+
+   /* read-mostly variables */
+	int numtxqs ____cacheline_aligned_in_smp;	/* # of rxqs/txqs */
+   struct virtio_device *vdev;
+   struct virtqueue *cvq;
+   struct net_device *dev;
+   unsigned int status;
+
+   /* I like... big packets and I cannot lie! */
+   bool big_packets;
+
+   /* Host will merge rx buffers for big packets (shake it! shake it!) */
+   bool mergeable_rx_bufs;
 };
 
 struct skb_vnet_hdr {
@@ -94,22 +116,22 @@ static inline struct skb_vnet_hdr *skb_v
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
  */
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
 {
struct page *end;
 
 	/* Find end of list, sew whole thing into vi->pages. */
 	for (end = page; end->private; end = (struct page *)end->private);
-	end->private = (unsigned long)vi->pages;
-	vi->pages = page;
+	end->private = (unsigned long)rq->pages;
+	rq->pages = page;
 }
 
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 {
-	struct page *p = vi->pages;
+	struct page *p = rq->pages;
 
if (p) {
-	vi->pages = (struct page *)p->private;
+