Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()

2015-06-02 Thread Wei Liu
On Fri, May 22, 2015 at 10:26:48AM +, Joao Martins wrote:
[...]
 return IRQ_HANDLED;
  }
  @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
  cb = XENVIF_RX_CB(skb);
  cb->expires = jiffies + vif->drain_timeout;
   
  -  xenvif_rx_queue_tail(queue, skb);
  -  xenvif_kick_thread(queue);
  +  if (!queue->vif->persistent_grants) {
  +          xenvif_rx_queue_tail(queue, skb);
  +          xenvif_kick_thread(queue);
  +  } else if (xenvif_rx_map(queue, skb)) {
  +          return NETDEV_TX_BUSY;
  +  }
  
  
  We now have two different functions for guest RX, one is xenvif_rx_map,
  the other is xenvif_rx_action. They look very similar. Can we only have
  one?
 I think I can merge this into xenvif_rx_action, and I noticed that the stall
 detection is missing; I will also add that.
 Perhaps I could also disable the RX kthread, since it doesn't get used with
 persistent grants?
 

Disabling that kthread is fine. But we do need to make sure we can do
the same things in start_xmit as we do in the kthread, i.e. we need to
consider what context start_xmit runs in and what restrictions apply.

Wei.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()

2015-05-22 Thread Joao Martins

On 19 May 2015, at 17:35, Wei Liu wei.l...@citrix.com wrote:

 On Tue, May 12, 2015 at 07:18:30PM +0200, Joao Martins wrote:
 [...]
 @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
  cb = XENVIF_RX_CB(skb);
  cb->expires = jiffies + vif->drain_timeout;
 
 - xenvif_rx_queue_tail(queue, skb);
 - xenvif_kick_thread(queue);
 + if (!queue->vif->persistent_grants) {
 +         xenvif_rx_queue_tail(queue, skb);
 +         xenvif_kick_thread(queue);
 + } else if (xenvif_rx_map(queue, skb)) {
 +         return NETDEV_TX_BUSY;
 + }
 
 
 We now have two different functions for guest RX, one is xenvif_rx_map,
 the other is xenvif_rx_action. They look very similar. Can 

Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()

2015-05-19 Thread Wei Liu
On Tue, May 12, 2015 at 07:18:30PM +0200, Joao Martins wrote:
 [...]
 @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
  cb = XENVIF_RX_CB(skb);
  cb->expires = jiffies + vif->drain_timeout;
 
 - xenvif_rx_queue_tail(queue, skb);
 - xenvif_kick_thread(queue);
 + if (!queue->vif->persistent_grants) {
 +         xenvif_rx_queue_tail(queue, skb);
 +         xenvif_kick_thread(queue);
 + } else if (xenvif_rx_map(queue, skb)) {
 +         return NETDEV_TX_BUSY;
 + }
  

We now have two different functions for guest RX, one is xenvif_rx_map,
the other is xenvif_rx_action. They look very similar. Can we only have
one?

   return 

[RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()

2015-05-12 Thread Joao Martins
Introducing persistent grants speeds up the RX thread by decreasing the
copy cost, yet overall throughput decreases by 20%. It is observed that
the rx_queue stays mostly at 10% of its capacity, as opposed to full
capacity when using grant copy. A finer measurement with lock_stat
(below with pkt_size 64, burst 1) shows much higher wait-queue
contention on queue->wq, which hints that the RX kthread spends more
time waiting and being woken up than actually doing work.

Without persistent grants:

class name   con-bounces   contentions   waittime-min   waittime-max   waittime-total   waittime-avg   acq-bounces   acquisitions   holdtime-min   holdtime-max   holdtime-total   holdtime-avg
--
queue->wq:   792   792   0.36   24.36   1140.30   1.44   4208   1002671   0.00   46.75   538164.02   0.54
--
queue->wq   326   [8115949f] __wake_up+0x2f/0x80
queue->wq   410   [811592bf] finish_wait+0x4f/0xa0
queue->wq    56   [811593eb] prepare_to_wait+0x2b/0xb0
--
queue->wq   202   [811593eb] prepare_to_wait+0x2b/0xb0
queue->wq   467   [8115949f] __wake_up+0x2f/0x80
queue->wq   123   [811592bf] finish_wait+0x4f/0xa0

With persistent grants:

queue->wq:   61834   61836   0.32   30.12   99710.27   1.61   241400   1125308   0.00   75.61   1106578.82   0.98
--
queue->wq    5079   [8115949f] __wake_up+0x2f/0x80
queue->wq   56280   [811592bf] finish_wait+0x4f/0xa0
queue->wq     479   [811593eb] prepare_to_wait+0x2b/0xb0
--
queue->wq    1005   [811592bf] finish_wait+0x4f/0xa0
queue->wq   56761   [8115949f] __wake_up+0x2f/0x80
queue->wq    4072   [811593eb] prepare_to_wait+0x2b/0xb0

Also, with persistent grants, we don't require batching grant copy ops
(besides the initial copy+map), which makes me believe that deferring
the skb to the RX kthread just adds unnecessary overhead (for this
particular case). This patch proposes copying the buffer on
xenvif_start_xmit(), which lets us remove both the contention on
queue->wq and the lock on rx_queue. An alternative to the
xenvif_rx_action routine is added, namely xenvif_rx_map(), which maps
and copies the buffer to the guest. It is only used when persistent
grants are enabled, since it would otherwise mean a hypercall per
packet.

Improvements are up to a factor of 2.14 with a single queue getting us
from 1.04 Mpps to 1.7 Mpps (burst 1, pkt_size 64) and 1.5 to 2.6 Mpps
(burst 2, pkt_size 64) compared to using the kthread. Maximum with grant
copy is 1.2 Mpps, irrespective of the burst. All of this was measured on
an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz.

Signed-off-by: Joao Martins joao.mart...@neclab.eu
---
 drivers/net/xen-netback/common.h|  2 ++
 drivers/net/xen-netback/interface.c | 11 +---
 drivers/net/xen-netback/netback.c   | 52 +
 3 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 23deb6a..f3ece12 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -363,6 +363,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
 
 int xenvif_dealloc_kthread(void *data);
 
+int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb);
+
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
 /* Determine whether the needed number of slots (req) are available,
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1103568..dfe2b7b 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -109,7 +109,8 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
        struct xenvif_queue *queue = dev_id;
 
-       xenvif_kick_thread(queue);
+       if (!queue->vif->persistent_grants)
+               xenvif_kick_thread(queue);
 
        return IRQ_HANDLED;
 }
@@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
        cb = XENVIF_RX_CB(skb);
        cb->expires = jiffies + vif->drain_timeout;
 
-       xenvif_rx_queue_tail(queue, skb);
-       xenvif_kick_thread(queue);
+       if (!queue->vif->persistent_grants) {
+               xenvif_rx_queue_tail(queue, skb);
+               xenvif_kick_thread(queue);
+       } else if (xenvif_rx_map(queue, skb)) {
+               return NETDEV_TX_BUSY;
+       }
 
        return NETDEV_TX_OK;
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index c4f57d7..228df92 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -883,9 +883,48 @@ static bool xenvif_rx_submit(struct xenvif_queue *queue,