Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-15 Thread Michael S. Tsirkin
On Mon, Dec 14, 2009 at 02:08:38PM -0800, Shirley Ma wrote:
 On Sun, 2009-12-13 at 13:43 +0200, Michael S. Tsirkin wrote:
  Interesting. I think skb_goodcopy will sometimes
  set *page to NULL. Will the above crash then?
 
 Nope, when *page is NULL, *len is 0.

Hmm. Yes, I see, it is here:
+	if (*len) {
+		*len = skb_set_frag(skb, *page, offset, *len);
+		*page = (struct page *)(*page)->private;
+	} else {
+		give_pages(vi, *page);
+		*page = NULL;
+	}

So what I would suggest is, have function
that just copies part of skb, and have
caller open-code allocating the skb and free up
pages as necessary.

  don't put empty line here. if below is part of same logical block as
  skb_goodcopy.
 Ok.
 
  Local variable shadows a parameter.
  It seems gcc will let you get away with a warning,
  but this is not legal C.
 Ok.
 
   +
   +		i = skb_shinfo(skb)->nr_frags;
   +		if (i >= MAX_SKB_FRAGS) {
   +			pr_debug("%s: packet too long %d\n", skb->dev->name,
   +				 len);
  
  If this happens, we have corrupted memory already.
  We do need this check, but please put it before you increment
  nr_frags.
 
 It is before increase for mergeable buffer case. Only one page(one frag)
 per get_buf.
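
The ordering point discussed above (test the fragment count before it is incremented) can be illustrated with a small userspace sketch. The names frag_list, frag_append and MAX_FRAGS are hypothetical stand-ins and not the kernel's skb_shinfo() interface; the sketch only shows the check-then-append pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_FRAGS 4 /* hypothetical stand-in for MAX_SKB_FRAGS */

struct frag_list {
	size_t nr_frags;
	void *frags[MAX_FRAGS];
};

/*
 * Append one fragment. The bound is tested before the slot is written
 * and before nr_frags is incremented, so a full list is never overrun.
 * Returns 0 on success, -ENOSPC when the list is already full.
 */
static int frag_append(struct frag_list *fl, void *page)
{
	assert(fl != NULL);
	if (fl->nr_frags >= MAX_FRAGS)
		return -ENOSPC;
	fl->frags[fl->nr_frags++] = page;
	return 0;
}
```

With this ordering a caller that keeps appending sees an error on the first fragment that does not fit, and nothing past the end of the array is touched.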
 
   +			skb->dev->stats.rx_length_errors++;
   +			return skb;
  
  This will propagate the error up the stack and corrupt
  more memory.
 
 I just copied the code from original code. There might not be a problem
 for mergeable buffer. I will double check.
 
  sizeof hdr->hdr
 Ok.
 
   +
   +	skb_to_sgvec(skb, sg+1, 0, skb->len);
  
  space around +
 Ok.
 
   +
   +	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
   +	if (err < 0)
   +		kfree_skb(skb);
   +	else
   +		skb_queue_head(&vi->recv, skb);
  
  So why are we queueing this still?
 This is for small packet. I didn't change that code since it will
 involve extra copy by using page.

What I am asking is why do we add skb in vi->recv.
Can't we use the vq destroy hack here as well?

   +
   + return err;
   +}
   +
   +static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp, bool *oom)
   +{
   +	struct scatterlist sg[2 + MAX_SKB_FRAGS];
  
  MAX_SKB_FRAGS + 2 will be more readable.
  Also, create a macro for this constant and document
  why does +2 make sense?
 
 One is for big packet virtio_net_hdr, one is for goodcopy skb.


Maybe put this in a comment then.
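
A minimal sketch of what such a documented constant could look like. The macro name RECVBUF_BIG_SG_ENTRIES, the sg_stub type and the numeric value given to MAX_SKB_FRAGS are assumptions made for illustration; only the "+2" explanation comes from the reply above.

```c
#include <assert.h>

#define MAX_SKB_FRAGS 18 /* assumed value for illustration only */

/*
 * Scatterlist entries needed for one big receive buffer:
 *   1 entry for the big packet virtio_net_hdr,
 *   1 entry for the data that is copied into the skb (goodcopy),
 *   MAX_SKB_FRAGS entries, one per fragment page.
 */
#define RECVBUF_BIG_SG_ENTRIES (MAX_SKB_FRAGS + 2)

/* Hypothetical stand-in for struct scatterlist. */
struct sg_stub {
	void *addr;
	unsigned int len;
};

static struct sg_stub sg[RECVBUF_BIG_SG_ENTRIES];

static_assert(sizeof(sg) / sizeof(sg[0]) == MAX_SKB_FRAGS + 2,
	      "two extra entries: header and goodcopy data");
```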

  Again, pls explain *why* do we want 16 byte alignment.
  Also this code seems duplicated?
  Please put structs at top of file where they
  can be found.
 Ok.
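
The thread does not say why 16 byte alignment is wanted, but the arithmetic behind padding[6] can be checked with a userspace sketch. The struct names below are hypothetical mirrors; the field list follows the legacy virtio_net_hdr layout (two 8-bit and four 16-bit fields, 10 bytes in total).

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the legacy virtio_net_hdr layout (10 bytes). */
struct vnet_hdr_mirror {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Header plus 6 bytes of padding: data placed after it starts at offset 16. */
struct padded_hdr_mirror {
	struct vnet_hdr_mirror hdr;
	char padding[6];
};

static_assert(sizeof(struct vnet_hdr_mirror) == 10, "header is 10 bytes");
static_assert(sizeof(struct padded_hdr_mirror) % 16 == 0,
	      "data after the padded header is 16 byte aligned");
```

So offset = sizeof(struct padded_vnet_hdr) in the patch would put the packet data 16 bytes into the shared page, assuming the same 10 byte header.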
 
   +	};
   +
   +	offset = sizeof(struct padded_vnet_hdr);
   +
   +	for (i = total - 1; i > 0; i--) {
  
  I prefer --i.
 Ok.
 
  Also, total is just a constant.
  So simply MAX_SKB_FRAGS + 1 will be clearer.
 Ok.
 
  Why do we scan last to first?
  If there's reason, please add a comment.
 We use page private to maintain next page, here there is no scan last to
 first, just add the new page in list head instead of list tail, which
 will require scan the list.

I mean the for loop: can it be for(i = 0, ..., ++i) just as well?
Why do you start at the end of buffer and decrement?

  space around - .
 Ok.
 
  All the if (i == 1) handling on exit is really hard to grok.
  How about moving common code out of this loop
  into a function, and then you can
  for (i = total - 1; i > 1; i--) {
  	handle(i);
  }
  handle(1);
  handle(0);
  add_buf
 That works.
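
The restructuring suggested above can be sketched in userspace as follows. handle() here is a hypothetical helper that only records the order in which slots are visited, and TOTAL is an assumed small slot count; the real per-slot work from the patch is not reproduced.

```c
#include <assert.h>

#define TOTAL 5 /* hypothetical slot count, stands in for MAX_SKB_FRAGS + 2 */

static int order[TOTAL];
static int calls;

/* Hypothetical per-slot work; here it only records the visiting order. */
static void handle(int i)
{
	assert(calls < TOTAL);
	order[calls++] = i;
}

static void fill_all(void)
{
	int i;

	calls = 0;
	/* Common case: every slot above the two special ones. */
	for (i = TOTAL - 1; i > 1; --i)
		handle(i);
	/* The two special slots are handled after the loop, with no if (i == 1) inside it. */
	handle(1);
	handle(0);
}
```

The loop body stays free of special cases, and the two slots that need different treatment are visible as separate calls just before add_buf would be issued.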
 
  do we really need *oom here and below?
  We can just set err to ENOMEM, no?
 We could.
 
  Please do not return 0 on failure.
 
 Ok.
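
A minimal userspace sketch of the error convention asked for above: report allocation failure through a negative errno return value instead of a separate bool *oom out-parameter plus a 0 return. The function add_recvbuf, the struct recvbuf and the simulate_oom flag are hypothetical and exist only to make the sketch testable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer type, not the driver's. */
struct recvbuf {
	size_t len;
};

/*
 * Allocate one receive buffer.
 * Returns 0 on success and -ENOMEM on allocation failure, so the
 * caller never sees 0 for a buffer that was not added.
 */
static int add_recvbuf(struct recvbuf **out, size_t len, int simulate_oom)
{
	struct recvbuf *buf;

	assert(out != NULL);
	buf = simulate_oom ? NULL : malloc(sizeof(*buf));
	if (!buf)
		return -ENOMEM;
	buf->len = len;
	*out = buf;
	return 0;
}
```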
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-15 Thread Shirley Ma
On Tue, 2009-12-15 at 13:33 +0200, Michael S. Tsirkin wrote:
 So what I would suggest is, have function
 that just copies part of skb, and have
 caller open-code allocating the skb and free up
 pages as necessary.
Yes, the updated patch has changed the function.

 What I am asking is why do we add skb in vi->recv.
 Can't we use the vq destroy hack here as well?
Yes, I removed recv queue skb link totally in the updated patch.

  One is for big packet virtio_net_hdr, one is for goodcopy skb.
 
 
 Maybe put this in a comment then.
Ok, will do.

 
 I mean the for loop: can it be for(i = 0, ..., ++i) just as well?
 Why do you start at the end of buffer and decrement?

Are you asking why the new pages are added to sg in reverse order? The
reason is that we link each new page in first, and only maintain the first
pointer. So the most recent new page should be related to sg[0]; if we put
the new page in last, then we would need to traverse the page list to get
the last pointer. Am I missing your point here?

Thanks
Shirley 
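
The head-insertion scheme described in this reply can be sketched in userspace. struct fake_page, its next field (playing the role of page->private), NPAGES and build_chain are hypothetical; the sketch only shows why filling slots from last to first, while linking each new page in front, leaves the head of the chain matching the lowest slot without ever walking to the tail.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4 /* hypothetical number of pages per buffer */

struct fake_page {
	struct fake_page *next;	/* plays the role of page->private */
	int sg_index;		/* sg slot this page was attached to */
};

/*
 * Fill slots from last to first, pushing each new page on the head of
 * the chain. Only the head pointer is kept; no tail pointer or list
 * walk is needed.
 */
static struct fake_page *build_chain(struct fake_page *pool)
{
	struct fake_page *first = NULL;
	int i;

	assert(pool != NULL);
	for (i = NPAGES - 1; i >= 0; --i) {
		pool[i].sg_index = i;
		pool[i].next = first;	/* link in front of the current head */
		first = &pool[i];
	}
	return first;
}
```

Walking the chain from the returned head then visits slots in increasing order, which is the property the reverse loop is relied on to provide.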



Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-15 Thread Michael S. Tsirkin
On Tue, Dec 15, 2009 at 08:25:20AM -0800, Shirley Ma wrote:
 On Tue, 2009-12-15 at 13:33 +0200, Michael S. Tsirkin wrote:
  So what I would suggest is, have function
  that just copies part of skb, and have
  caller open-code allocating the skb and free up
  pages as necessary.
 Yes, the updated patch has changed the function.
 
  What I am asking is why do we add skb in vi->recv.
  Can't we use the vq destroy hack here as well?
 Yes, I removed recv queue skb link totally in the updated patch.
 
   One is for big packet virtio_net_hdr, one is for goodcopy skb.
  
  
  Maybe put this in a comment then.
 Ok, will do.
 
  
  I mean the for loop: can it be for(i = 0, ..., ++i) just as well?
  Why do you start at the end of buffer and decrement?
 
 Are you asking why the new pages are added to sg in reverse order? The
 reason is that we link each new page in first, and only maintain the first
 pointer. So the most recent new page should be related to sg[0]; if we put
 the new page in last, then we would need to traverse the page list to get
 the last pointer. Am I missing your point here?
 
 Thanks
 Shirley 


No, that was what I was looking for.



Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-14 Thread Shirley Ma
On Sun, 2009-12-13 at 13:43 +0200, Michael S. Tsirkin wrote:
 Interesting. I think skb_goodcopy will sometimes
 set *page to NULL. Will the above crash then?

Nope, when *page is NULL, *len is 0.

 don't put empty line here. if below is part of same logical block as
 skb_goodcopy.
Ok.

 Local variable shadows a parameter.
 It seems gcc will let you get away with a warning,
 but this is not legal C.
Ok.

  +
  +		i = skb_shinfo(skb)->nr_frags;
  +		if (i >= MAX_SKB_FRAGS) {
  +			pr_debug("%s: packet too long %d\n", skb->dev->name,
  +				 len);
 
 If this happens, we have corrupted memory already.
 We do need this check, but please put it before you increment
 nr_frags.

It is before increase for mergeable buffer case. Only one page(one frag)
per get_buf.

  +			skb->dev->stats.rx_length_errors++;
  +			return skb;
 
 This will propagate the error up the stack and corrupt
 more memory.

I just copied the code from original code. There might not be a problem
for mergeable buffer. I will double check.

 sizeof hdr->hdr
Ok.

  +
  +	skb_to_sgvec(skb, sg+1, 0, skb->len);
 
 space around +
Ok.

  +
  +	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
  +	if (err < 0)
  +		kfree_skb(skb);
  +	else
  +		skb_queue_head(&vi->recv, skb);
 
 So why are we queueing this still?
This is for small packet. I didn't change that code since it will
involve extra copy by using page.

  +
  + return err;
  +}
  +
  +static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp, bool *oom)
  +{
  +	struct scatterlist sg[2 + MAX_SKB_FRAGS];
 
 MAX_SKB_FRAGS + 2 will be more readable.
 Also, create a macro for this constant and document
 why does +2 make sense?

One is for big packet virtio_net_hdr, one is for goodcopy skb.

 Again, pls explain *why* do we want 16 byte alignment.
 Also this code seems duplicated?
 Please put structs at top of file where they
 can be found.
Ok.

  +	};
  +
  +	offset = sizeof(struct padded_vnet_hdr);
  +
  +	for (i = total - 1; i > 0; i--) {
 
 I prefer --i.
Ok.

 Also, total is just a constant.
 So simply MAX_SKB_FRAGS + 1 will be clearer.
Ok.

 Why do we scan last to first?
 If there's reason, please add a comment.
We use page private to maintain next page, here there is no scan last to
first, just add the new page in list head instead of list tail, which
will require scan the list.

 space around - .
Ok.

 All the if (i == 1) handling on exit is really hard to grok.
 How about moving common code out of this loop
 into a function, and then you can
 for (i = total - 1; i > 1; i--) {
 	handle(i);
 }
 handle(1);
 handle(0);
 add_buf
That works.

 do we really need *oom here and below?
 We can just set err to ENOMEM, no?
We could.

 Please do not return 0 on failure.

Ok.



Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-14 Thread Shirley Ma
Hello Michael,

On Mon, 2009-12-14 at 14:08 -0800, Shirley Ma wrote:
   +
   +	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
   +	if (err < 0)
   +		kfree_skb(skb);
   +	else
   +		skb_queue_head(&vi->recv, skb);
  
  So why are we queueing this still?
 This is for small packet. I didn't change that code since it will
 involve extra copy by using page.

I think I can remove skb link for small packet as well by adding
kfree_skb() in virtio_net_free_bufs for small packet.

Thanks
Shirley



Re: [PATCH v2 3/4] Defer skb allocation -- new recvbuf alloc receive calls

2009-12-13 Thread Michael S. Tsirkin
On Fri, Dec 11, 2009 at 04:46:53AM -0800, Shirley Ma wrote:
 Signed-off-by: Shirley Ma x...@us.ibm.com
 -
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 100b4b9..dde8060 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -203,6 +203,73 @@ static struct sk_buff *skb_goodcopy(struct virtnet_info *vi, struct page **page,
  	return skb;
  }
  
 +static struct sk_buff *receive_big(struct virtnet_info *vi, struct page *page,
 +				   unsigned int len)
 +{
 +	struct sk_buff *skb;
 +
 +	skb = skb_goodcopy(vi, &page, &len);
 +	if (unlikely(!skb))
 +		return NULL;
 +
 +	while (len > 0) {
 +		len = skb_set_frag(skb, page, 0, len);
 +		page = (struct page *)page->private;

Interesting. I think skb_goodcopy will sometimes
set *page to NULL. Will the above crash then?

 +	}
 +
 +	if (page)
 +		give_pages(vi, page);
 +
 +	return skb;
 +}
 +
 +static struct sk_buff *receive_mergeable(struct virtnet_info *vi,
 +					 struct page *page, unsigned int len)
 +{
 +	struct sk_buff *skb;
 +	struct skb_vnet_hdr *hdr;
 +	int num_buf, i;
 +
 +	if (len > PAGE_SIZE)
 +		len = PAGE_SIZE;
 +
 +	skb = skb_goodcopy(vi, &page, &len);
 +

don't put empty line here. if below is part of same logical block as
skb_goodcopy.

 +	if (unlikely(!skb))
 +		return NULL;

don't we care that *page might not be NULL? why not?

 +
 +	hdr = skb_vnet_hdr(skb);
 +	num_buf = hdr->mhdr.num_buffers;
 +	while (--num_buf) {
 +		struct page *page;

Local variable shadows a parameter.
It seems gcc will let you get away with a warning,
but this is not legal C.

 +
 +		i = skb_shinfo(skb)->nr_frags;
 +		if (i >= MAX_SKB_FRAGS) {
 +			pr_debug("%s: packet too long %d\n", skb->dev->name,
 +				 len);


If this happens, we have corrupted memory already.
We do need this check, but please put it before you increment
nr_frags.

 +			skb->dev->stats.rx_length_errors++;
 +			return skb;

This will propagate the error up the stack and corrupt
more memory.

 +		}
 +
 +		page = vi->rvq->vq_ops->get_buf(vi->rvq, &len);
 +		if (!page) {
 +			pr_debug("%s: rx error: %d buffers missing\n",
 +				 skb->dev->name, hdr->mhdr.num_buffers);
 +			skb->dev->stats.rx_length_errors++;
 +			return skb;

Here, skb is some random part of packet, don't propagate
it up the stack.

 +		}
 +
 +		if (len > PAGE_SIZE)
 +			len = PAGE_SIZE;
 +
 +		skb_set_frag(skb, page, 0, len);
 +
 +		vi->num--;
 +	}
 +
 +	return skb;
 +}
 +
  static void receive_skb(struct net_device *dev, struct sk_buff *skb,
  			 unsigned len)
  {
 @@ -356,6 +423,103 @@ drop:
  	dev_kfree_skb(skb);
  }
  
 +static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp, bool *oom)
 +{
 +	struct sk_buff *skb;
 +	struct skb_vnet_hdr *hdr;
 +	struct scatterlist sg[2];
 +	int err = 0;
 +
 +	skb = netdev_alloc_skb(vi->dev, MAX_PACKET_LEN + NET_IP_ALIGN);
 +	if (unlikely(!skb)) {
 +		*oom = true;
 +		return err;
 +	}
 +
 +	skb_reserve(skb, NET_IP_ALIGN);
 +	skb_put(skb, MAX_PACKET_LEN);
 +
 +	hdr = skb_vnet_hdr(skb);
 +	sg_set_buf(sg, &hdr->hdr, sizeof(hdr->hdr));

sizeof hdr->hdr

 +
 +	skb_to_sgvec(skb, sg+1, 0, skb->len);

space around +

 +
 +	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
 +	if (err < 0)
 +		kfree_skb(skb);
 +	else
 +		skb_queue_head(&vi->recv, skb);

So why are we queueing this still?

 +
 +	return err;
 +}
 +
 +static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp, bool *oom)
 +{
 +	struct scatterlist sg[2 + MAX_SKB_FRAGS];

MAX_SKB_FRAGS + 2 will be more readable.
Also, create a macro for this constant and document
why does +2 make sense?

 +	int total = MAX_SKB_FRAGS + 2;
 +	char *p;
 +	int err = 0;
 +	int i, offset;
 +	struct page *first = NULL;
 +	struct page *page;
 +	/* share one page between virtio_net header and data */
 +	struct padded_vnet_hdr {
 +		struct virtio_net_hdr hdr;
 +		/* This padding makes our data 16 byte aligned */
 +		char padding[6];

Again, pls explain *why* do we want 16 byte alignment.
Also this code seems duplicated?
Please put structs at top of file where they
can be found.

 +	};
 +
 +	offset = sizeof(struct padded_vnet_hdr);
 +
 +	for (i = total - 1; i > 0; i--) {

I prefer --i.
Also, total is just a constant.
So simply MAX_SKB_FRAGS + 1 will be clearer.
Why do we scan last to first?
If there's reason, please add a comment.

 + page =