Re: [PATCH 00/12] xen: add common function for reading optional value

2016-11-07 Thread David Vrabel
On 31/10/16 16:48, Juergen Gross wrote:
> There are multiple instances of code reading an optional unsigned
> parameter from Xenstore via xenbus_scanf(). Instead of repeating the
> same code over and over add a service function doing the job and
> replace the call of xenbus_scanf() with the call of the new function
> where appropriate.

Acked-by: David Vrabel <david.vra...@citrix.com>

Please queue for the next release.

David


[PATCHv1 net] xen-netback: fix guest Rx stall detection (after guest Rx refactor)

2016-10-11 Thread David Vrabel
If a VIF has been ready for rx_stall_timeout (60s by default) and an
Rx ring is drained of all requests an Rx stall will be incorrectly
detected.  When this occurs and the guest Rx queue is empty, the Rx
ring's event index will not be set and the frontend will not raise an
event when new requests are placed on the ring, permanently stalling
the VIF.

This is a regression introduced by eb1723a29b9a7 (xen-netback:
refactor guest rx).

Fix this by reinstating the setting of queue->last_rx_time when
placing a packet onto the guest Rx ring.

Signed-off-by: David Vrabel <david.vra...@citrix.com>
---
 drivers/net/xen-netback/rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index 8e9ade6..d69f2a9 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -425,6 +425,8 @@ void xenvif_rx_skb(struct xenvif_queue *queue)
 
xenvif_rx_next_skb(queue, );
 
+   queue->last_rx_time = jiffies;
+
do {
struct xen_netif_rx_request *req;
struct xen_netif_rx_response *rsp;
-- 
2.1.4



Re: [Xen-devel] [PATCH v2 net-next 5/7] xen-netback: process guest rx packets in batches

2016-10-04 Thread David Vrabel
On 04/10/16 13:47, Konrad Rzeszutek Wilk wrote:
> On Tue, Oct 04, 2016 at 10:29:16AM +0100, Paul Durrant wrote:
>> From: David Vrabel <david.vra...@citrix.com>
>>
>> Instead of only placing one skb on the guest rx ring at a time, process
>> a batch of up-to 64.  This improves performance by ~10% in some tests.
> 
> And does it regress latency workloads?

No. Because the loop outside of these batches is only checking for a
fatal error condition or a disconnection.

> What are those 'some tests' you speak off?

I think it was aggregate intrahost, but I don't remember exactly.

David


Re: [Xen-devel] [PATCH v2 net-next 7/7] xen/netback: add fraglist support for to-guest rx

2016-10-04 Thread David Vrabel
On 04/10/16 10:29, Paul Durrant wrote:
> From: Ross Lagerwall <ross.lagerw...@citrix.com>
> 
> This allows full 64K skbuffs (with 1500 mtu ethernet, composed of 45
> fragments) to be handled by netback for to-guest rx.

Reviewed-by: David Vrabel <david.vra...@citrix.com>

David


Re: [Xen-devel] [PATCH v2 net-next 2/7] xen-netback: retire guest rx side prefix GSO feature

2016-10-04 Thread David Vrabel
On 04/10/16 10:29, Paul Durrant wrote:
> As far as I am aware only very old Windows network frontends make use of
> this style of passing GSO packets from backend to frontend. These
> frontends can easily be replaced by the freely available Xen Project
> Windows PV network frontend, which uses the 'default' mechanism for
> passing GSO packets, which is also used by all Linux frontends.
> 
> NOTE: Removal of this feature will not cause breakage in old Windows
>   frontends. They simply will no longer receive GSO packets - the
>   packets instead being fragmented in the backend.

Reviewed-by: David Vrabel <david.vra...@citrix.com>

David



[PATCHv2] sunrpc: fix write space race causing stalls

2016-09-19 Thread David Vrabel
Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
race does not work.

This (edited) partial trace illustrates the problem:

   [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
   [2] xs_write_space <-xs_tcp_write_space
   [3] xprt_write_space <-xs_write_space
   [4] rpc_task_sleep: task:43546@5 ...
   [5] xs_write_space <-xs_tcp_write_space

[1] Task 43546 runs but is out of write space.

[2] Space becomes available, xs_write_space() clears the
SOCKWQ_ASYNC_NOSPACE bit.

[3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
this has not yet been queued and the wake up is lost.

[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
which queues task 43546.

[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
is supposed to handle the above race) does not call
xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
thus the task is not woken.

Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
so the second call to sk->sk_write_space() calls xprt_write_space().

Suggested-by: Trond Myklebust <tron...@primarydata.com>
Signed-off-by: David Vrabel <david.vra...@citrix.com>
---
 net/sunrpc/xprtsock.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index bf16883..e72581d 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -473,7 +473,16 @@ static int xs_nospace(struct rpc_task *task)
spin_unlock_bh(>transport_lock);
 
/* Race breaker in case memory is freed before above code is called */
-   sk->sk_write_space(sk);
+   if (ret == -EAGAIN) {
+   struct socket_wq *wq;
+
+   rcu_read_lock();
+   wq = rcu_dereference(sk->sk_wq);
+   set_bit(SOCKWQ_ASYNC_NOSPACE, >flags);
+   rcu_read_unlock();
+
+   sk->sk_write_space(sk);
+   }
return ret;
 }
 
-- 
2.1.4



Re: [PATCH net-next RESEND] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-19 Thread David Vrabel
On 19/09/16 11:22, Vitaly Kuznetsov wrote:
> David Miller  writes:
> 
>> From: Vitaly Kuznetsov 
>> Date: Fri, 16 Sep 2016 12:59:14 +0200
>>
>>> @@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>>> struct net_device *dev)
>>> offset = offset_in_page(skb->data);
>>> len = skb_headlen(skb);
>>>  
>>> +   /* The first req should be at least ETH_HLEN size or the packet will be
>>> +* dropped by netback.
>>> +*/
>>> +   if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
>>> +   nskb = skb_copy(skb, GFP_ATOMIC);
>>> +   if (!nskb)
>>> +   goto drop;
>>> +   dev_kfree_skb_any(skb);
>>> +   skb = nskb;
>>> +   page = virt_to_page(skb->data);
>>> +   offset = offset_in_page(skb->data);
>>> +   }
>>> +
>>> spin_lock_irqsave(>tx_lock, flags);
>>
>> I think you also have to recalculate 'len' in this case too, as
>> skb_headlen() will definitely be different for nskb.
>>
>> In fact, I can't see how this code can work properly without that fix.
> 
> Thank you for your feedback David,
> 
> in my testing (even when I tried doing skb_copy() for all skbs
> unconditionally) skb_headlen(nskb) always equals 'len' so I was under an
> impression that both 'skb->len' and 'skb->data_len' remain the same when
> we do skb_copy(). However, in case you think there are cases when
> headlen changes, I see no problem with re-calculating 'len' as it won't
> bring any significant performace penalty compared to the already added
> skb_copy().

I think you can move the len = skb_headlen(skb) after the if, no need to
recalculate it.

David


Re: [PATCHv1] sunrpc: fix write space race causing stalls

2016-09-16 Thread David Vrabel
On 16/09/16 18:06, Trond Myklebust wrote:
> 
>> On Sep 16, 2016, at 12:41, David Vrabel <david.vra...@citrix.com> wrote:
>>
>> On 16/09/16 17:01, Trond Myklebust wrote:
>>>
>>>> On Sep 16, 2016, at 08:28, David Vrabel <david.vra...@citrix.com> wrote:
>>>>
>>>> Write space becoming available may race with putting the task to sleep
>>>> in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
>>>> race does not work.
>>>>
>>>> This (edited) partial trace illustrates the problem:
>>>>
>>>>  [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
>>>>  [2] xs_write_space <-xs_tcp_write_space
>>>>  [3] xprt_write_space <-xs_write_space
>>>>  [4] rpc_task_sleep: task:43546@5 ...
>>>>  [5] xs_write_space <-xs_tcp_write_space
>>>>
>>>> [1] Task 43546 runs but is out of write space.
>>>>
>>>> [2] Space becomes available, xs_write_space() clears the
>>>>   SOCKWQ_ASYNC_NOSPACE bit.
>>>>
>>>> [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
>>>>   this has not yet been queued and the wake up is lost.
>>>>
>>>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
>>>>   which queues task 43546.
>>>>
>>>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
>>>>   is supposed to handle the above race) does not call
>>>>   xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
>>>>   thus the task is not woken.
>>>>
>>>> Fix the race by have xprt_wait_for_buffer_space() check for write
>>>> space after putting the task to sleep.
>>>>
>>>> Signed-off-by: David Vrabel <david.vra...@citrix.com>
>>>> ---
>>>> include/linux/sunrpc/xprt.h |  1 +
>>>> net/sunrpc/xprt.c   |  4 
>>>> net/sunrpc/xprtsock.c   | 21 +++--
>>>> 3 files changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
>>>> index a16070d..621e74b 100644
>>>> --- a/include/linux/sunrpc/xprt.h
>>>> +++ b/include/linux/sunrpc/xprt.h
>>>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops {
>>>>void(*connect)(struct rpc_xprt *xprt, struct rpc_task 
>>>> *task);
>>>>void *  (*buf_alloc)(struct rpc_task *task, size_t size);
>>>>void(*buf_free)(void *buffer);
>>>> +  bool(*have_write_space)(struct rpc_xprt *task);
>>>>int (*send_request)(struct rpc_task *task);
>>>>void(*set_retrans_timeout)(struct rpc_task *task);
>>>>void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>>> index ea244b2..d3c1b1e 100644
>>>> --- a/net/sunrpc/xprt.c
>>>> +++ b/net/sunrpc/xprt.c
>>>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task 
>>>> *task, rpc_action action)
>>>>
>>>>task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
>>>>rpc_sleep_on(>pending, task, action);
>>>> +
>>>> +  /* Write space notification may race with putting task to sleep. */
>>>> +  if (xprt->ops->have_write_space(xprt))
>>>> +  rpc_wake_up_queued_task(>pending, task);
>>>> }
>>>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
>>>>
>>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>>> index bf16883..211de5b 100644
>>>> --- a/net/sunrpc/xprtsock.c
>>>> +++ b/net/sunrpc/xprtsock.c
>>>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task)
>>>>
>>>>spin_unlock_bh(>transport_lock);
>>>>
>>>> -  /* Race breaker in case memory is freed before above code is called */
>>>> -  sk->sk_write_space(sk);
>>>>return ret;
>>>> }
>>>
>>> Instead of these callbacks, why not just add a call to
>>> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in
>>> xs_nospace()? Won’t that fix the existing race breaker?
>>
>> I don't see how that would help.  If sk->sk_write_space was already
>> called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to
>> sk->sk_write_space will still be a nop.
> 
> Sorry. Copy+paste error. I meant SOCKWQ_ASYNC_NOSPACE.
> 
>>
>> Or did you mean SOCKWQ_ASYNC_NOSPACE here?  It doesn't seem right to set
>> this bit when we don't know if there's space or not.
> 
> Why not?

I prefer my solution because:

a) It obviously fixes the race (games with bits are less understandable).

b) It requires fewer atomic ops.

c) It doesn't require me to understand what the behaviour of the
socket-internal SOCKWQ_ASYNC_NOSPACE bit is or should be.

d) I'm not sure I understand the objection to the additional
have_write_space method -- it has simple, clear behaviour.

David



Re: [PATCHv1] sunrpc: fix write space race causing stalls

2016-09-16 Thread David Vrabel
On 16/09/16 17:01, Trond Myklebust wrote:
> 
>> On Sep 16, 2016, at 08:28, David Vrabel <david.vra...@citrix.com> wrote:
>>
>> Write space becoming available may race with putting the task to sleep
>> in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
>> race does not work.
>>
>> This (edited) partial trace illustrates the problem:
>>
>>   [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
>>   [2] xs_write_space <-xs_tcp_write_space
>>   [3] xprt_write_space <-xs_write_space
>>   [4] rpc_task_sleep: task:43546@5 ...
>>   [5] xs_write_space <-xs_tcp_write_space
>>
>> [1] Task 43546 runs but is out of write space.
>>
>> [2] Space becomes available, xs_write_space() clears the
>>SOCKWQ_ASYNC_NOSPACE bit.
>>
>> [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
>>this has not yet been queued and the wake up is lost.
>>
>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
>>which queues task 43546.
>>
>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
>>is supposed to handle the above race) does not call
>>xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
>>thus the task is not woken.
>>
>> Fix the race by have xprt_wait_for_buffer_space() check for write
>> space after putting the task to sleep.
>>
>> Signed-off-by: David Vrabel <david.vra...@citrix.com>
>> ---
>> include/linux/sunrpc/xprt.h |  1 +
>> net/sunrpc/xprt.c   |  4 
>> net/sunrpc/xprtsock.c   | 21 +++--
>> 3 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
>> index a16070d..621e74b 100644
>> --- a/include/linux/sunrpc/xprt.h
>> +++ b/include/linux/sunrpc/xprt.h
>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops {
>>  void(*connect)(struct rpc_xprt *xprt, struct rpc_task 
>> *task);
>>  void *  (*buf_alloc)(struct rpc_task *task, size_t size);
>>  void(*buf_free)(void *buffer);
>> +bool(*have_write_space)(struct rpc_xprt *task);
>>  int (*send_request)(struct rpc_task *task);
>>  void(*set_retrans_timeout)(struct rpc_task *task);
>>  void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>> index ea244b2..d3c1b1e 100644
>> --- a/net/sunrpc/xprt.c
>> +++ b/net/sunrpc/xprt.c
>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, 
>> rpc_action action)
>>
>>  task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
>>  rpc_sleep_on(>pending, task, action);
>> +
>> +/* Write space notification may race with putting task to sleep. */
>> +if (xprt->ops->have_write_space(xprt))
>> +rpc_wake_up_queued_task(>pending, task);
>> }
>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
>>
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index bf16883..211de5b 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task)
>>
>>  spin_unlock_bh(>transport_lock);
>>
>> -/* Race breaker in case memory is freed before above code is called */
>> -sk->sk_write_space(sk);
>>  return ret;
>> }
> 
> Instead of these callbacks, why not just add a call to
> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in
> xs_nospace()? Won’t that fix the existing race breaker?

I don't see how that would help.  If sk->sk_write_space was already
called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to
sk->sk_write_space will still be a nop.

Or did you mean SOCKWQ_ASYNC_NOSPACE here?  It doesn't seem right to set
this bit when we don't know if there's space or not.

David


[PATCHv1] sunrpc: fix write space race causing stalls

2016-09-16 Thread David Vrabel
Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
race does not work.

This (edited) partial trace illustrates the problem:

   [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
   [2] xs_write_space <-xs_tcp_write_space
   [3] xprt_write_space <-xs_write_space
   [4] rpc_task_sleep: task:43546@5 ...
   [5] xs_write_space <-xs_tcp_write_space

[1] Task 43546 runs but is out of write space.

[2] Space becomes available, xs_write_space() clears the
SOCKWQ_ASYNC_NOSPACE bit.

[3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
this has not yet been queued and the wake up is lost.

[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
which queues task 43546.

[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
is supposed to handle the above race) does not call
xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
thus the task is not woken.

Fix the race by have xprt_wait_for_buffer_space() check for write
space after putting the task to sleep.

Signed-off-by: David Vrabel <david.vra...@citrix.com>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/xprt.c   |  4 
 net/sunrpc/xprtsock.c   | 21 +++--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index a16070d..621e74b 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -129,6 +129,7 @@ struct rpc_xprt_ops {
void(*connect)(struct rpc_xprt *xprt, struct rpc_task 
*task);
void *  (*buf_alloc)(struct rpc_task *task, size_t size);
void(*buf_free)(void *buffer);
+   bool(*have_write_space)(struct rpc_xprt *task);
int (*send_request)(struct rpc_task *task);
void(*set_retrans_timeout)(struct rpc_task *task);
void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ea244b2..d3c1b1e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, 
rpc_action action)
 
task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
rpc_sleep_on(>pending, task, action);
+
+   /* Write space notification may race with putting task to sleep. */
+   if (xprt->ops->have_write_space(xprt))
+   rpc_wake_up_queued_task(>pending, task);
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index bf16883..211de5b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task)
 
spin_unlock_bh(>transport_lock);
 
-   /* Race breaker in case memory is freed before above code is called */
-   sk->sk_write_space(sk);
return ret;
 }
 
@@ -1679,6 +1677,22 @@ static void xs_tcp_write_space(struct sock *sk)
read_unlock_bh(>sk_callback_lock);
 }
 
+static bool xs_udp_have_write_space(struct rpc_xprt *xprt)
+{
+   struct sock_xprt *transport = container_of(xprt, struct sock_xprt, 
xprt);
+   struct sock *sk = transport->inet;
+
+   return sock_writeable(sk);
+}
+
+static bool xs_tcp_have_write_space(struct rpc_xprt *xprt)
+{
+   struct sock_xprt *transport = container_of(xprt, struct sock_xprt, 
xprt);
+   struct sock *sk = transport->inet;
+
+   return sk_stream_is_writeable(sk);
+}
+
 static void xs_udp_do_set_buffer_size(struct rpc_xprt *xprt)
 {
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, 
xprt);
@@ -2664,6 +2678,7 @@ static struct rpc_xprt_ops xs_local_ops = {
.connect= xs_local_connect,
.buf_alloc  = rpc_malloc,
.buf_free   = rpc_free,
+   .have_write_space   = xs_udp_have_write_space,
.send_request   = xs_local_send_request,
.set_retrans_timeout= xprt_set_retrans_timeout_def,
.close  = xs_close,
@@ -2683,6 +2698,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
.connect= xs_connect,
.buf_alloc  = rpc_malloc,
.buf_free   = rpc_free,
+   .have_write_space   = xs_udp_have_write_space,
.send_request   = xs_udp_send_request,
.set_retrans_timeout= xprt_set_retrans_timeout_rtt,
.timer  = xs_udp_timer,
@@ -2704,6 +2720,7 @@ static struct rpc_xprt_ops xs_tcp_ops = {
.connect= xs_connect,
.buf_alloc  = rpc_malloc,
.buf_free   = rpc_free,
+   .have_write_space   = xs_tcp_have_write_space,
.send_request

Re: [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-09 Thread David Vrabel
On 22/08/16 16:42, Vitaly Kuznetsov wrote:
> Small packet loss is reported on complex multi host network configurations
> including tunnels, NAT, ... My investigation led me to the following check
> in netback which drops packets:
> 
> if (unlikely(txreq.size < ETH_HLEN)) {
> netdev_err(queue->vif->dev,
>"Bad packet size: %d\n", txreq.size);
> xenvif_tx_err(queue, , extra_count, idx);
> break;
> }
> 
> But this check itself is legitimate. SKBs consist of a linear part (which
> has to have the ethernet header) and (optionally) a number of frags.
> Netfront transmits the head of the linear part up to the page boundary
> as the first request and all the rest becomes frags so when we're
> reconstructing the SKB in netback we can't distinguish between original
> frags and the 'tail' of the linear part. The first SKB needs to be at
> least ETH_HLEN size. So in case we have an SKB with its linear part
> starting too close to the page boundary the packet is lost.
> 
> I see two ways to fix the issue:
> - Change the 'wire' protocol between netfront and netback to start keeping
>   the original SKB structure. We'll have to add a flag indicating the fact
>   that the particular request is a part of the original linear part and not
>   a frag. We'll need to know the length of the linear part to pre-allocate
>   memory.
> - Avoid transmitting SKBs with linear parts starting too close to the page
>   boundary. That seems preferable short-term and shouldn't bring
>   significant performance degradation as such packets are rare. That's what
>   this patch is trying to achieve with skb_copy().
> 
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

We should probably fix the backend to handle this (by grant copying a
minimum amount in the linear area, but since netfront needs to work with
older netback.

Acked-by: David Vrabel <david.vra...@citrix.com>

David


Re: [Xen-devel] [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-08-22 Thread David Vrabel
On 22/08/16 16:42, Vitaly Kuznetsov wrote:
> 
> I see two ways to fix the issue:
> - Change the 'wire' protocol between netfront and netback to start keeping
>   the original SKB structure. We'll have to add a flag indicating the fact
>   that the particular request is a part of the original linear part and not
>   a frag. We'll need to know the length of the linear part to pre-allocate
>   memory.

I don't think there needs to be a protocol change.  I think the check in
netback is bogus -- it's the total packet length that must be >
HLEN_ETH.  The upper layers will pull any headers from the frags as
needed (or if necessary, netback could pull a minimum amount).

There's no need to preserve the skb layout (e.g., look how the to-guest
direction we do not do this).

David


Re: [Xen-devel] [PATCH] xen-netfront: prefer xenbus_write() over xenbus_printf() where possible

2016-07-07 Thread David Vrabel
On 07/07/16 09:00, Jan Beulich wrote:
> ... as being the simpler variant.

It's really annoying that all these related cleanups where not in the
same thread. Don't do this again, please.

The better clean-up is to remove xenbus_write() in favour of
xenbus_printf() everywhere (especially since one of your "cleanups" made
it worse).

xenbus_printf() does everything xenbus_write() can do with no loss of
readability.

David


Re: [Xen-devel] [PATCH] xen-netback: correct return value checks on xenbus_scanf()

2016-07-07 Thread David Vrabel
On 07/07/16 11:35, Wei Liu wrote:
> On Thu, Jul 07, 2016 at 10:58:16AM +0100, David Vrabel wrote:
>> On 07/07/16 08:57, Jan Beulich wrote:
>>> Only a positive return value indicates success.
>>
>> This is not correct.
>>
> 
> Do you mean the commit message is not correct or the code is not
> correct? If it is the formal, do you have any suggestion to fix it?

This code is correct as-is, thus the commit message is wrong or misleading.

David


Re: [Xen-devel] [PATCH] xen-netback: correct return value checks on xenbus_scanf()

2016-07-07 Thread David Vrabel
On 07/07/16 08:57, Jan Beulich wrote:
> Only a positive return value indicates success.

This is not correct.

David


Re: [Xen-devel] [PATCH] xen-netfront: correct return value checks on xenbus_scanf()

2016-07-07 Thread David Vrabel
On 07/07/16 08:59, Jan Beulich wrote:
> Only a positive return value indicates success.

This checks were already correct.

David


Re: [Xen-devel] [PATCH] xen-netback: prefer xenbus_scanf() over xenbus_gather()

2016-07-07 Thread David Vrabel
On 07/07/16 08:57, Jan Beulich wrote:
> ... for single items being collected: It is more typesafe (as the
> compiler can check format string and to-be-written-to variable match)
> and requires one less parameter to be passed.

Do not split commit messages between the subject and the body in this
way.  The body should be a complete description of the commit,
independent of the subject.

It's annoying to find only half a sentence in the body and have to go
re-read the subject again.

e.g.,

"xen-netback: prefer xenbus_scanf() over xenbus_gather()

xenbus_scanf() is preferred over xenbus_gather() because: a) the
compiler can check the types match the format; and b) one less parameter
is needed.

Use xenbus_scanf() instead of xenbus_gather() when reading only a single
key."

I think almost all of your commit messages suffer from this, please redo
them all.

> --- 4.7-rc6-prefer-xenbus_scanf.orig/drivers/net/xen-netback/xenbus.c
> +++ 4.7-rc6-prefer-xenbus_scanf/drivers/net/xen-netback/xenbus.c
> @@ -842,16 +842,16 @@ static int connect_ctrl_ring(struct back
>   unsigned int evtchn;
>   int err;
>  
> - err = xenbus_gather(XBT_NIL, dev->otherend,
> - "ctrl-ring-ref", "%u", , NULL);
> - if (err)
> + err = xenbus_scanf(XBT_NIL, dev->otherend,
> +"ctrl-ring-ref", "%u", );
> + if (err <= 0)

< 0

David


Re: [Xen-devel] [PATCH] xen-netfront: set real_num_tx_queues to zreo avoid to trigger BUG_ON

2016-02-22 Thread David Vrabel
On 20/02/16 01:27, Gonglei wrote:
> It's possible for a race condition to exist between xennet_open() and
> talk_to_netback(). After invoking netfront_probe() then other
> threads or processes invoke xennet_open (such as NetworkManager)
> immediately may trigger BUG_ON(). Besides, we also should reset
> real_num_tx_queues in xennet_destroy_queues().

I think you want something like the following.  We serialize the backend
changing and the netdev open with the device mutex.

(I've not even tried to compile this so it might not even build.)

David

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d6abf19..7e2791a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -340,22 +340,28 @@ static int xennet_open(struct net_device *dev)
unsigned int i = 0;
struct netfront_queue *queue = NULL;

+   device_lock(>xbdev->dev);
+
+   if (!netif_carrier_ok(dev))
+   goto unlock;
+
for (i = 0; i < num_queues; ++i) {
queue = >queues[i];
napi_enable(>napi);

spin_lock_bh(>rx_lock);
-   if (netif_carrier_ok(dev)) {
-   xennet_alloc_rx_buffers(queue);
-   queue->rx.sring->rsp_event = queue->rx.rsp_cons + 1;
-   if (RING_HAS_UNCONSUMED_RESPONSES(>rx))
-   napi_schedule(>napi);
-   }
+   xennet_alloc_rx_buffers(queue);
+   queue->rx.sring->rsp_event = queue->rx.rsp_cons + 1;
+   if (RING_HAS_UNCONSUMED_RESPONSES(>rx))
+   napi_schedule(>napi);
spin_unlock_bh(>rx_lock);
}

netif_tx_start_all_queues(dev);

+  unlock:
+   device_unlock(>xbdev->dev);
+
return 0;
 }

@@ -1983,8 +1989,8 @@ static int xennet_connect(struct net_device *dev)
num_queues = dev->real_num_tx_queues;

rtnl_lock();
+
netdev_update_features(dev);
-   rtnl_unlock();

/*
 * All public and private state should now be sane.  Get
@@ -1992,7 +1998,6 @@ static int xennet_connect(struct net_device *dev)
 * domain a kick because we've probably just requeued some
 * packets.
 */
-   netif_carrier_on(np->netdev);
for (j = 0; j < num_queues; ++j) {
queue = >queues[j];

@@ -2006,9 +2011,17 @@ static int xennet_connect(struct net_device *dev)

spin_lock_bh(>rx_lock);
xennet_alloc_rx_buffers(queue);
+   if (netif_running(dev)) {
+   if (RING_HAS_UNCONSUMED_RESPONSES(>rx))
+   napi_schedule(>napi);
+   }
spin_unlock_bh(>rx_lock);
}

+   netif_carrier_on(np->netdev);
+
+   rtnl_unlock();
+
return 0;
 }

diff --git a/drivers/xen/xenbus/xenbus_probe.c
b/drivers/xen/xenbus/xenbus_probe.c
index 33a31cf..57cd20d 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -204,8 +204,11 @@ void xenbus_otherend_changed(struct xenbus_watch
*watch,
return;
}

-   if (drv->otherend_changed)
+   if (drv->otherend_changed) {
+   device_lock(>dev);
drv->otherend_changed(dev, state);
+   device_unlock(>dev);
+   }
 }
 EXPORT_SYMBOL_GPL(xenbus_otherend_changed);



Re: [Xen-devel] [PATCH] xen-netfront: set real_num_tx_queues to zreo avoid to trigger BUG_ON

2016-02-22 Thread David Vrabel
On 20/02/16 06:00, Gonglei (Arei) wrote:
> Hi,
> 
> Thanks for rapid feedback :)
> 
>> From: David Miller [mailto:da...@davemloft.net]
>> Sent: Saturday, February 20, 2016 12:37 PM
>>
>> From: Gonglei 
>> Date: Sat, 20 Feb 2016 09:27:26 +0800
>>
>>> It's possible for a race condition to exist between xennet_open() and
>>> talk_to_netback(). After invoking netfront_probe() then other
>>> threads or processes invoke xennet_open (such as NetworkManager)
>>> immediately may trigger BUG_ON(). Besides, we also should reset
>>> real_num_tx_queues in xennet_destroy_queues().
>>
>> One should really never invoke register_netdev() until the device is
>> %100 fully initialized.
>>
>> This means you cannot call register_netdev() until it is completely
>> legal to invoke your ->open() method.
>>
>> And I think that is what the real problem is here.
>>
>> If you follow the correct rules for ordering wrt. register_netdev()
>> there are no "races".  Because ->open() must be legally invokable
>> from the exact moment you call register_netdev().
>>
> 
> Yes, I agree. Though that's the historic legacy problem. ;)
> 
>> I'm not applying this, as it really sounds like the fundamental issue
>> is the order in which the xen-netfront private data is initialized
>> or setup before being registered.
> 
> That means register_netdev() should be invoked after xennet_connect(), right?

No.  This would mean that the network device is removed and re-added
when a guest is migrated which at best would result in considerably more
downtime (e.g., the IP address has to be renegotiated with DHCP).

David


[PATCHv2 net] xen-netfront: request Tx response events more often

2016-01-26 Thread David Vrabel
From: Malcolm Crossley <malcolm.cross...@citrix.com>

Trying to batch Tx response events results in poor performance because
this delays freeing the transmitted skbs.

Instead use the standard RING_FINAL_CHECK_FOR_RESPONSES() macro to be
notified once the next Tx response is placed on the ring.

Signed-off-by: Malcolm Crossley <malcolm.cross...@citrix.com>
Signed-off-by: David Vrabel <david.vra...@citrix.com>
---
v2:
- Use bool.
---
 drivers/net/xen-netfront.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d6abf19..96ccd4e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -364,6 +364,7 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
RING_IDX cons, prod;
unsigned short id;
struct sk_buff *skb;
+   bool more_to_do;
 
BUG_ON(!netif_carrier_ok(queue->info->netdev));
 
@@ -398,18 +399,8 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 
queue->tx.rsp_cons = prod;
 
-   /*
-* Set a new event, then check for race with update of tx_cons.
-* Note that it is essential to schedule a callback, no matter
-* how few buffers are pending. Even if there is space in the
-* transmit ring, higher layers may be blocked because too much
-* data is outstanding: in such cases notification from Xen is
-* likely to be the only kick that we'll get.
-*/
-   queue->tx.sring->rsp_event =
-   prod + ((queue->tx.sring->req_prod - prod) >> 1) + 1;
-   mb();   /* update shared area */
-   } while ((cons == prod) && (prod != queue->tx.sring->rsp_prod));
+   RING_FINAL_CHECK_FOR_RESPONSES(>tx, more_to_do);
+   } while (more_to_do);
 
xennet_maybe_wake_tx(queue);
 }
-- 
2.1.4



Re: [Xen-devel] [PATCH 1/3] xen-netback: fix license ident used in MODULE_LICENSE

2016-01-22 Thread David Vrabel
On 22/01/16 12:34, Wei Liu wrote:
> The comment at the beginning of the file is the canonical source of
> licenses for this module. Currently it contains GPL and MIT license. Fix
> the code to reflect the reality.

"The MIT license" isn't really a thing.  The closest is the X11
license[1], but this not applicable here either since the text in the
drivers does not refer to X11 trademarks etc.

You can either use "GPL" which would be correct for a Linux kernel
module since the alternate only applies when distributed separately from
Linux ("or, when distributed separately from the Linux kernel or
incorporated into other software packages, subject to the following
license:"); or you can use "GPL and additional rights".

(Or you could just leave it as-is since "Dual BSD/GPL" is close enough.)

David

[1] http://www.gnu.org/licenses/license-list.html#X11License



Re: [Xen-devel] [PATCH 1/3] xen-netback: fix license ident used in MODULE_LICENSE

2016-01-22 Thread David Vrabel
On 22/01/16 14:15, Ian Campbell wrote:
> On Fri, 2016-01-22 at 13:49 +, Wei Liu wrote:
>> On Fri, Jan 22, 2016 at 01:14:24PM +0000, David Vrabel wrote:
>>> On 22/01/16 12:34, Wei Liu wrote:
>>>> The comment at the beginning of the file is the canonical source of
>>>> licenses for this module. Currently it contains GPL and MIT license.
>>>> Fix
>>>> the code to reflect the reality.
>>>
>>> "The MIT license" isn't really a thing.  The closest is the X11
>>> license[1], but this not applicable here either since the text in the
>>> drivers does not refer to X11 trademarks etc.
>>>
>>
>> That was referring to the license ident string in Linux.  If MIT license
>> isn't a thing, why would Linux have it at all?
> 
> The fact what include/linux/license.h:license_is_gpl_compatible includes
> "Dual MIT/GPL" as an option seems to suggest that it is enough of a thing
> to be validly used as the contents of a MODULE_LICENSE() thing.

"Dual MIT/GPL" is used exactly once in the source in a file that has no
license text and there is no other documentation.

David


Re: [Xen-devel] xen-netfront crash when detaching network while some network activity

2015-11-17 Thread David Vrabel
On 21/10/15 19:57, Marek Marczykowski-Górecki wrote:
> 
> Any ideas?

No, sorry.  Netfront looks correct to me.

We take an additional ref for the ref released by
gnttab_release_grant_reference().  The get_page() here is safe since we
haven't freed the page yet (this is done in the subsequent call to
skb_kfree_irq()).

get_page()/put_page() also look fine when used with tail pages.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH net-next 0/8] xen-netback/core: packet hashing

2015-10-26 Thread David Vrabel
On 24/10/15 12:55, David Miller wrote:
> From: Paul Durrant 
> Date: Wed, 21 Oct 2015 11:36:17 +0100
> 
>> This series adds xen-netback support for hash negotiation with a frontend
>> driver, and an implementation of toeplitz hashing as the initial negotiable
>> algorithm.
> 
> Ping, I want to see some review from some other xen networking folks.

There's been some review of the front/back protocol (on a different
thread) and some significant changes have been suggested.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] xen-netback: correctly check failed allocation

2015-10-16 Thread David Vrabel
On 16/10/15 10:05, Wei Liu wrote:
> On Thu, Oct 15, 2015 at 02:02:47PM -0400, Insu Yun wrote:
>> I changed patch with valid format.
>>
>> On Thu, Oct 15, 2015 at 2:02 PM, Insu Yun  wrote:
>>
>>> Since vzalloc can be failed in memory pressure,
>>> writes -ENOMEM to xenstore to indicate error.
>>>
>>> Signed-off-by: Insu Yun 
>>> ---
>>>  drivers/net/xen-netback/xenbus.c | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/net/xen-netback/xenbus.c
>>> b/drivers/net/xen-netback/xenbus.c
>>> index 929a6e7..56ebd82 100644
>>> --- a/drivers/net/xen-netback/xenbus.c
>>> +++ b/drivers/net/xen-netback/xenbus.c
>>> @@ -788,6 +788,12 @@ static void connect(struct backend_info *be)
>>> /* Use the number of queues requested by the frontend */
>>> be->vif->queues = vzalloc(requested_num_queues *
>>>   sizeof(struct xenvif_queue));
>>> +   if (!be->vif->queues) {
>>> +   xenbus_dev_fatal(dev, -ENOMEM,
>>> +"allocating queues");
>>> +   return;
>>>
>>
>> I didn't use goto err, because another error handling is not required
>>
> 
> It's recommended in kernel coding style to use "goto" style error
> handling. I personally prefer that to arbitrary return in function body,
> too.
> 
> It's not a matter of whether another error handling is required or not,
> it's about cleaner code that is easy to reason about and consistent
> coding style.

Using xenbus_dev_fatal(); return; throughout would be consistent and
easy to reason about.

Also, the goto err path should raise a fatal error (instead of
disconnecting).  I also note that failures in the xen_net_read_rate(),
xen_register_watchers() and read_xenbus_vif_flags() are also not handled.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH net-next] xen-netfront: always set num queues if possible

2015-09-15 Thread David Vrabel
On 14/09/15 22:28, Charles (Chas) Williams wrote:
> The xen store preserves this information across module invocations.
> If you insmod netfront with two queues and later insmod again with one
> queue, the backend will still believe you asked for two queues.

Can you rewrite the commit message to be clearer?

"If netfront connects with 2 (or more) queues and then reconnects with
only 1 queue it fails to delete or rewrite the multi-queue-num-queues
key and netback will try to use the wrong number of queues.

Always write the num-queues field if the backend has multi-queue support."

> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1819,11 +1819,7 @@ again:
>   goto destroy_ring;
>   }
>  
> - if (num_queues == 1) {
> - err = write_queue_xenstore_keys(>queues[0], , 0); /* 
> flat */
> - if (err)
> - goto abort_transaction_no_dev_fatal;
> - } else {
> + if (xenbus_exists(xbt, dev->nodename, "multi-queue-num-queues")) {
>   /* Write the number of queues */
>   err = xenbus_printf(xbt, dev->nodename, 
> "multi-queue-num-queues",
>   "%u", num_queues);

Isn't this broken?  It looks like it won't write the
multi-queue-num-queues key the first time around.

It think this should be conditional on multi-queue-max-queues existing
(which is written by the backend if multi-queue is supported).

David

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCHv1 net] xen-netback: require fewer guest Rx slots when not using GSO

2015-09-15 Thread David Vrabel
On 09/09/15 20:34, David Miller wrote:
> From: David Vrabel <david.vra...@citrix.com>
> Date: Tue, 8 Sep 2015 14:25:14 +0100
> 
>> Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
>> unlimited guest Rx internal queue and carrier flapping) introduced a
>> regression.
>>
>> The PV frontend in IPXE only places 4 requests on the guest Rx ring.
>> Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
>> not receive any packets.
>>
>> a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
>>for the largest possible packet.  Calculate the required slots
>>based on the maximum GSO size or the MTU.
>>
>>This calculation of the number of required slots relies on
>>1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
>>which present in 4.0-rc1 and later.
>>
>> b) Reduce the Rx stall detection to checking for at least one
>>available Rx request.  This is fine since we're predominately
>>    concerned with detecting interfaces which are down and thus have
>>zero available Rx requests.
>>
>> Signed-off-by: David Vrabel <david.vra...@citrix.com>
> 
> Applied, thanks David.

Hi David,

This is causing regressions with certain frontend drivers.  Can you drop
it, please?

Apologies for posting the patch before running it through the full set
of internal testing.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCHv1 net] xen-netback: require fewer guest Rx slots when not using GSO

2015-09-10 Thread David Vrabel
On 09/09/15 20:34, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 08, 2015 at 02:25:14PM +0100, David Vrabel wrote:
>> Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
>> unlimited guest Rx internal queue and carrier flapping) introduced a
>> regression.
>>
>> The PV frontend in IPXE only places 4 requests on the guest Rx ring.
> 
> in IPXE ? What is that?

ipxe.org

>> Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
>> not receive any packets.
>>
>> a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
>>for the largest possible packet.  Calculate the required slots
> 
> s/slots/slot number/
>>based on the maximum GSO size or the MTU.
>>
>>This calculation of the number of required slots relies on
>>1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
>>which present in 4.0-rc1 and later.
>>
>> b) Reduce the Rx stall detection to checking for at least one
> 
> s/to/by/
>>available Rx request.  This is fine since we're predominately
>>concerned with detecting interfaces which are down and thus have
>>zero available Rx requests.
> 
> s/have zero available Rx requests/no available Rx requests/?

If the only review comments you have are unnecessary changes to the
grammar in the commit message, it's probably best if you don't post them.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH net v3 2/2] xen-netfront: respect user provided max_queues

2015-09-10 Thread David Vrabel
On 10/09/15 11:18, Wei Liu wrote:
> Originally that parameter was always reset to num_online_cpus during
> module initialisation, which renders it useless.
> 
> The fix is to only set max_queues to num_online_cpus when user has not
> provided a value.

Reviewed-by: David Vrabel <david.vra...@citrix.com>
Tested-by: David Vrabel <david.vra...@citrix.com>

David

> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -2132,8 +2132,11 @@ static int __init netif_init(void)
>  
>   pr_info("Initialising Xen virtual ethernet driver\n");
>  
> - /* Allow as many queues as there are CPUs, by default */
> - xennet_max_queues = num_online_cpus();
> + /* Allow as many queues as there are CPUs if user has not
> +  * specified a value.
> +  */
> + if (xennet_max_queues == 0)
> + xennet_max_queues = num_online_cpus();
>  
>   return xenbus_register_frontend(_driver);
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net v2 2/2] xen-netfront: respect user provided max_queues

2015-09-09 Thread David Vrabel
On 09/09/15 11:23, Wei Liu wrote:
> Originally that parameter was always reset to num_online_cpus during
> module initialisation, which renders it useless.
> 
> The fix is to only set max_queues to num_online_cpus when user has not
> provided a value.

Reviewed-by: David Vrabel <david.vra...@citrix.com>

Thanks.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] xen-netback: respect user provided max_queues

2015-09-09 Thread David Vrabel
On 09/09/15 11:09, Wei Liu wrote:
> Originally that parameter was always reset to num_online_cpus during
> module initialisation, which renders it useless.
> 
> The fix is to only set max_queues to num_online_cpus when user has not
> provided a value.
[...]
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -67,7 +67,7 @@ module_param(rx_drain_timeout_msecs, uint, 0444);
>  unsigned int rx_stall_timeout_msecs = 6;
>  module_param(rx_stall_timeout_msecs, uint, 0444);
>  
> -unsigned int xenvif_max_queues;
> +unsigned int xenvif_max_queues = 0;

You don't need this.

Otherwise,

Reviewed-by: David Vrabel <david.vra...@citrix.com>

Is an equivalent fix needed in netfront?

David

>  module_param_named(max_queues, xenvif_max_queues, uint, 0644);
>  MODULE_PARM_DESC(max_queues,
>"Maximum number of queues per virtual interface");
> @@ -2105,8 +2105,11 @@ static int __init netback_init(void)
>   if (!xen_domain())
>   return -ENODEV;
>  
> - /* Allow as many queues as there are CPUs, by default */
> - xenvif_max_queues = num_online_cpus();
> + /* Allow as many queues as there are CPUs if user has not
> +  * specified a value.
> +  */
> + if (xenvif_max_queues == 0)
> + xenvif_max_queues = num_online_cpus();
>  
>   if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) {
>   pr_info("fatal_skb_slots too small (%d), bump it to 
> XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n",
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv1 net] xen-netback: require fewer guest Rx slots when not using GSO

2015-09-08 Thread David Vrabel
Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
   which present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.

Signed-off-by: David Vrabel <david.vra...@citrix.com>
---
 drivers/net/xen-netback/common.h  | 10 --
 drivers/net/xen-netback/netback.c | 23 ---
 2 files changed, 16 insertions(+), 17 deletions(-)

Note that this can only be backported as-is on top of
1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b (xen-netback: always fully
coalesce guest Rx packets) which is in 4.0 and later.

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..25990b1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -200,11 +200,6 @@ struct xenvif_queue { /* Per-queue data for xenvif */
struct xenvif_stats stats;
 };
 
-/* Maximum number of Rx slots a to-guest packet may use, including the
- * slot needed for GSO meta-data.
- */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
-
 enum state_bit_shift {
/* This bit marks that the vif is connected */
VIF_STATUS_CONNECTED,
@@ -306,11 +301,6 @@ int xenvif_dealloc_kthread(void *data);
 
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
-/* Determine whether the needed number of slots (req) are available,
- * and set req_event if not.
- */
-bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);
-
 void xenvif_carrier_on(struct xenvif *vif);
 
 /* Callback from stack when TX packet can be released */
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 3f44b52..e791930 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -149,9 +149,20 @@ static inline pending_ring_idx_t pending_index(unsigned i)
return i & (MAX_PENDING_REQS-1);
 }
 
-bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
+static int xenvif_rx_ring_slots_needed(struct xenvif *vif)
+{
+   if (vif->gso_mask)
+   return DIV_ROUND_UP(vif->dev->gso_max_size, PAGE_SIZE) + 1;
+   else
+   return DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
+}
+
+static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue)
 {
RING_IDX prod, cons;
+   int needed;
+
+   needed = xenvif_rx_ring_slots_needed(queue->vif);
 
do {
prod = queue->rx.sring->req_prod;
@@ -513,7 +524,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
skb_queue_head_init();
 
-   while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
+   while (xenvif_rx_ring_slots_available(queue)
   && (skb = xenvif_rx_dequeue(queue)) != NULL) {
queue->last_rx_time = jiffies;
 
@@ -1839,8 +1850,7 @@ static bool xenvif_rx_queue_stalled(struct xenvif_queue 
*queue)
prod = queue->rx.sring->req_prod;
cons = queue->rx.req_cons;
 
-   return !queue->stalled
-   && prod - cons < XEN_NETBK_RX_SLOTS_MAX
+   return !queue->stalled && prod - cons < 1
&& time_after(jiffies,
  queue->last_rx_time + queue->vif->stall_timeout);
 }
@@ -1852,14 +1862,13 @@ static bool xenvif_rx_queue_ready(struct xenvif_queue 
*queue)
prod = queue->rx.sring->req_prod;
cons = queue->rx.req_cons;
 
-   return queue->stalled
-   && prod - cons >= XEN_NETBK_RX_SLOTS_MAX;
+   return queue->stalled && prod - cons >= 1;
 }
 
 static bool xenvif_have_rx_work(struct xenvif_queue *queue)
 {
return (!skb_queue_empty(>rx_queue)
-   && xenvif_rx_ring_slots_available(queue, 
XEN_NETBK_RX_SLOTS_MAX))
+   && xenvif_rx_ring_slots_available(queue))
|| (queue->vif->stall_timeout &&
(xenvif_rx_queue_stalled(queue)
 || xenvif_rx_queue_ready(queue)))
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH v3 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-08-20 Thread David Vrabel
On 07/08/15 17:46, Julien Grall wrote:
 The PV network protocol is using 4KB page granularity. The goal of this
 patch is to allow a Linux using 64KB page granularity using network
 device on a non-modified Xen.
 
 It's only necessary to adapt the ring size and break skb data in small
 chunk of 4KB. The rest of the code is relying on the grant table code.
 
 Note that we allocate a Linux page for each rx skb but only the first
 4KB is used. We may improve the memory usage by extending the size of
 the rx skb.

Reviewed-by: David Vrabel david.vra...@citrix.com

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well

2015-08-11 Thread David Vrabel
On 10/08/15 14:40, Jason A. Donenfeld wrote:
 It turns out that domU also requires the Xen APIC driver. Otherwise we
 get stuck in busy loops that never exit, such as in this stack trace:

Applied to for-linus-4.2 and tagged for stable, thanks.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH v3 0/9] Use correctly the Xen memory terminologies

2015-08-11 Thread David Vrabel
On 07/08/15 17:34, Julien Grall wrote:
 Hi all,
 
 This patch series aims to use the memory terminologies described in
 include/xen/mm.h [1] for Linux xen code.

Applied to for-linus-4.3, thanks.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] xen-apic: Enable on domU as well

2015-08-10 Thread David Vrabel
On 10/08/15 14:40, Jason A. Donenfeld wrote:
 It turns out that domU also requires the Xen APIC driver. Otherwise we
 get stuck in busy loops that never exit, such as in this stack trace:

What's the difference between v3 and v2?

David

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] printk from softirq on xen: hard lockup

2015-08-06 Thread David Vrabel
On 05/08/15 00:01, Jason A. Donenfeld wrote:
 Hey David,
 
 Sorry for the premature response on my phone earlier. Real reply follows.
 rcu_read_lock, when using Xen PV. Relevant excerpts of the
 ^^  PV guest?
 
 Yes. The lockup occurs on a PV guest. Nothing special at all about the
 configuration. Vanilla upstream 4.1.3 kernel.
 
 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 56  while (native_apic_mem_read(APIC_ICR)  APIC_ICR_BUSY)
  ^^  = HVM guest

 Which is it?
 
 That's odd. It's a PV guest, not an HVM nor PVH guest.

Linux PV guests must use the Xen PV APIC driver.  You need to
investigate why your PV guest is not using this (although I'm surprised
it works at all with the wrong one).

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] printk from softirq on xen: hard lockup

2015-08-06 Thread David Vrabel
On 06/08/15 16:58, Jason A. Donenfeld wrote:
 On Thu, Aug 6, 2015 at 12:02 PM, David Vrabel david.vra...@citrix.com wrote:
 Linux PV guests must use the Xen PV APIC driver.  You need to
 investigate why your PV guest is not using this (although I'm surprised
 it works at all with the wrong one).
 
 Actually it appears this PV Guest is using the flat APIC driver
 instead of the Xen APIC driver.
 
 But upon further investigation into why:
 
 arch/x86/xen/Makefile:
 obj-$(CONFIG_XEN_DOM0)  += apic.o vga.o
 
 It would appear that only dom0 gets to use the Xen APIC driver.
 
 What gives?

Looks like the Makefile is wrong.

Try this:

8--
x86/xen: build Xen PV APIC driver for domU as well

A PV domU also needs the Xen PV APIC driver but it was only built for
CONFIG_XEN_DOM0=y.

Signed-off-by: David Vrabel david.vra...@citrix.com
Cc: sta...@vger.kernel.org
---
 arch/x86/xen/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o:= $(nostackp)
 obj-y  := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
-   p2m.o
+   p2m.o apic.o

 obj-$(CONFIG_EVENT_TRACING) += trace.o

 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
 obj-$(CONFIG_SWIOTLB_XEN)  += pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] printk from softirq on xen: hard lockup

2015-08-04 Thread David Vrabel
On 04/08/15 17:41, Jason A. Donenfeld wrote:
 Hi folks,
 
 Paul McKenney and I had an offline discussion about some rcu questions
 that eventually lead into me investigating a strange full lock-up I'm
 experiencing as a consequence of a printk in softirq inside of an
 rcu_read_lock, when using Xen PV. Relevant excerpts of the
^^  PV guest?

 (gdb) target remote localhost:
 Remote debugging using localhost:
 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 56  while (native_apic_mem_read(APIC_ICR)  APIC_ICR_BUSY)
 ^^  = HVM guest

Which is it?

A HVM guest's serial console may be particularly slow (since it's
emulated by qemu).  Try using the PV console?

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/8] xen: Use the correctly the Xen memory terminologies

2015-07-29 Thread David Vrabel
On 29/07/15 12:35, Julien Grall wrote:
 Hi Wei,
 
 On 29/07/15 11:13, Wei Liu wrote:
 On Tue, Jul 28, 2015 at 04:02:45PM +0100, Julien Grall wrote:
 [...]
 diff --git a/drivers/net/xen-netback/netback.c 
 b/drivers/net/xen-netback/netback.c
 index 7d50711..3b7b7c3 100644
 --- a/drivers/net/xen-netback/netback.c
 +++ b/drivers/net/xen-netback/netback.c
 @@ -314,7 +314,7 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
 *queue, struct sk_buff *skb
 } else {
 copy_gop-source.domid = DOMID_SELF;
 copy_gop-source.u.gmfn =
 -   virt_to_mfn(page_address(page));
 +   virt_to_gfn(page_address(page));
 }
 copy_gop-source.offset = offset;
  
 @@ -1284,7 +1284,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue 
 *queue,
 queue-tx_copy_ops[*copy_ops].source.offset = txreq.offset;
  
 queue-tx_copy_ops[*copy_ops].dest.u.gmfn =
 -   virt_to_mfn(skb-data);
 +   virt_to_gfn(skb-data);
 queue-tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
 queue-tx_copy_ops[*copy_ops].dest.offset =
 offset_in_page(skb-data);

 Reviewed-by: Wei Liu wei.l...@citrix.com

 One possible improvement is to change gmfn in copy_gop to gfn as well.
 But that's outside of netback code.
 
 The structure gnttab_copy is part of the hypervisor interface. Is it
 fine to differ on the naming between Xen and Linux?
 
 Or maybe we could do the change in the public headers in Xen repo too.
 Is it fine to do field renaming in public headers?

I think this series should not alter than Xen API.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH 4/8] xen: Use the correctly the Xen memory terminologies

2015-07-28 Thread David Vrabel
On 28/07/15 16:02, Julien Grall wrote:
 Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
 is meant, I suspect this is because the first support for Xen was for
 PV. This brough some misimplementation of helpers on ARM and make the
 developper confused the expected behavior.

For the benefit of other subsystem maintainers, this is a purely
mechanical change in Xen-specific terminology.  It doesn't need reviews
or acks from non-Xen people (IMO).

 For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
 Although, if we look at the implementation on x86, it's returning a GFN.
 
 For clarity and avoid new confusion, replace any reference of mfn into
 gnf in any helpers used by PV drivers.
 
 Take also the opportunity to simplify simple construction such
 as pfn_to_mfn(page_to_pfn(page)) into page_to_gfn. More complex clean up
 will come in follow-up patches.
 
 I think it may be possible to do further clean up in the x86 code to
 ensure that helpers returning machine address (such as virt_address) is
 not used by no auto-translated guests. I will let x86 xen expert doing
 it.

Reviewed-by: David Vrabel david.vra...@citrix.com

It looks a bit odd to use GFN in some of the PV code where the
hypervisor API uses MFN but overall I think using the correct
terminology where possible is best.  But I'd like to have Boris's or
Konrad's opinion on this.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xen-netfront: Remove the meaningless code

2015-06-26 Thread David Vrabel
On 27/06/15 00:17, Liang Li wrote:
 The function netif_set_real_num_tx_queues() will return -EINVAL if
 the second parameter  1, so call this function with the second
 parameter set to 0 is meaningless.

Reviewed-by: David Vrabel david.vra...@citrix.com

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH v2 0/7] xen: Clean up

2015-06-17 Thread David Vrabel
On 17/06/15 15:28, Julien Grall wrote:
 Hi,
 
 Thoses patches was originally part of the Xen 64KB series [1]. Although,
 I think they can go without waiting the rest of the 64KB series.
 
 Patch #1-#4 should go through the Xen tree, even though patch #1 touches
 multiple part.
 
 Patch #5-#7 should go through the Block tree.

I've applied all of these to for-linus-4.2, thanks.  Including the block
ones since they were trivial and not block-related.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] net/xen-netfront: Correct printf format in xennet_get_responses

2015-06-04 Thread David Vrabel
On 04/06/15 13:45, Julien Grall wrote:
 On 03/06/15 18:06, Joe Perches wrote:
 On Wed, 2015-06-03 at 17:55 +0100, Julien Grall wrote:
 rx-status is an int16_t, print it using %d rather than %u in order to
 have a meaningful value when the field is negative.
 []
 diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
 []
 @@ -733,7 +733,7 @@ static int xennet_get_responses(struct netfront_queue 
 *queue,
 if (unlikely(rx-status  0 ||
  rx-offset + rx-status  PAGE_SIZE)) {
 if (net_ratelimit())
 -   dev_warn(dev, rx-offset: %x, size: %u\n,
 +   dev_warn(dev, rx-offset: %x, size: %d\n,

 If you're going to do this, perhaps it'd be sensible to
 also change the %x to %#x or 0x%x so that people don't
 mistake offset without an [a-f] for decimal.
 
 Good idea. I will resend a version of this series.
 
 David, can I keep you Reviewed-by for this change?#

Can you make the offset %d instead?

You can keep the reviewed-by if you do this.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] xen: netback: read hotplug script once at start of day.

2015-05-29 Thread David Vrabel
On 29/05/15 17:24, Ian Campbell wrote:
 --- a/drivers/net/xen-netback/xenbus.c
 +++ b/drivers/net/xen-netback/xenbus.c
 @@ -235,6 +235,7 @@ static int netback_remove(struct xenbus_device *dev)
   kobject_uevent(dev-dev.kobj, KOBJ_OFFLINE);
   xen_unregister_watchers(be-vif);
   xenbus_rm(XBT_NIL, dev-nodename, hotplug-status);
 + kfree(be-vif-hotplug_script);

Should this kfree() be in xenvif_free()?

   xenvif_free(be-vif);
   be-vif = NULL;
   }

David

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv1] xen-netfront: properly destroy queues when removing device

2015-05-27 Thread David Vrabel
xennet_remove() freed the queues before freeing the netdevice which
results in a use-after-free when free_netdev() tries to delete the
napi instances that have already been freed.

Fix this by fully destroy the queues (which includes deleting the napi
instances) before freeing the netdevice.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 drivers/net/xen-netfront.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3f45afd..e031c94 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1698,6 +1698,7 @@ static void xennet_destroy_queues(struct netfront_info 
*info)
 
if (netif_running(info-netdev))
napi_disable(queue-napi);
+   del_timer_sync(queue-rx_refill_timer);
netif_napi_del(queue-napi);
}
 
@@ -2102,9 +2103,6 @@ static const struct attribute_group xennet_dev_group = {
 static int xennet_remove(struct xenbus_device *dev)
 {
struct netfront_info *info = dev_get_drvdata(dev-dev);
-   unsigned int num_queues = info-netdev-real_num_tx_queues;
-   struct netfront_queue *queue = NULL;
-   unsigned int i = 0;
 
dev_dbg(dev-dev, %s\n, dev-nodename);
 
@@ -2112,16 +2110,7 @@ static int xennet_remove(struct xenbus_device *dev)
 
unregister_netdev(info-netdev);
 
-   for (i = 0; i  num_queues; ++i) {
-   queue = info-queues[i];
-   del_timer_sync(queue-rx_refill_timer);
-   }
-
-   if (num_queues) {
-   kfree(info-queues);
-   info-queues = NULL;
-   }
-
+   xennet_destroy_queues(info);
xennet_free_netdev(info-netdev);
 
return 0;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xen-netfront crash when detaching network while some network activity

2015-05-26 Thread David Vrabel
On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
 Hi all,
 
 I'm experiencing xen-netfront crash when doing xl network-detach while
 some network activity is going on at the same time. It happens only when
 domU has more than one vcpu. Not sure if this matters, but the backend
 is in another domU (not dom0). I'm using Xen 4.2.2. It happens on kernel
 3.9.4 and 4.1-rc1 as well.
 
 Steps to reproduce:
 1. Start the domU with some network interface
 2. Call there 'ping -f some-IP'
 3. Call 'xl network-detach NAME 0'

There's a use-after-free in xennet_remove().  Does this patch fix it?

8
xen-netfront: properly destroy queues when removing device

xennet_remove() freed the queues before freeing the netdevice which
results in a use-after-free when free_netdev() tries to delete the
napi instances that have already been freed.

Fix this by fully destroy the queues (which includes deleting the napi
instances) before freeing the netdevice.

Reported-by: Marek Marczykowski marma...@invisiblethingslab.com
Signed-off-by: David Vrabel david.vra...@citrix.com
---
 drivers/net/xen-netfront.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3f45afd..e031c94 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1698,6 +1698,7 @@ static void xennet_destroy_queues(struct netfront_info 
*info)
 
if (netif_running(info-netdev))
napi_disable(queue-napi);
+   del_timer_sync(queue-rx_refill_timer);
netif_napi_del(queue-napi);
}
 
@@ -2102,9 +2103,6 @@ static const struct attribute_group xennet_dev_group = {
 static int xennet_remove(struct xenbus_device *dev)
 {
struct netfront_info *info = dev_get_drvdata(dev-dev);
-   unsigned int num_queues = info-netdev-real_num_tx_queues;
-   struct netfront_queue *queue = NULL;
-   unsigned int i = 0;
 
dev_dbg(dev-dev, %s\n, dev-nodename);
 
@@ -2112,16 +2110,7 @@ static int xennet_remove(struct xenbus_device *dev)
 
unregister_netdev(info-netdev);
 
-   for (i = 0; i  num_queues; ++i) {
-   queue = info-queues[i];
-   del_timer_sync(queue-rx_refill_timer);
-   }
-
-   if (num_queues) {
-   kfree(info-queues);
-   info-queues = NULL;
-   }
-
+   xennet_destroy_queues(info);
xennet_free_netdev(info-netdev);
 
return 0;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] xen-netfront crash when detaching network while some network activity

2015-05-22 Thread David Vrabel
On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
 Hi all,
 
 I'm experiencing xen-netfront crash when doing xl network-detach while
 some network activity is going on at the same time. It happens only when
 domU has more than one vcpu. Not sure if this matters, but the backend
 is in another domU (not dom0). I'm using Xen 4.2.2. It happens on kernel
 3.9.4 and 4.1-rc1 as well.
 
 Steps to reproduce:
 1. Start the domU with some network interface
 2. Call there 'ping -f some-IP'
 3. Call 'xl network-detach NAME 0'

I tried this about 10 times without a crash.  How reproducible is it?

I used a 4.1-rc4 frontend and a 4.0 backend.

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] xen-netfront crash when detaching network while some network activity

2015-05-22 Thread David Vrabel
On 22/05/15 17:42, Marek Marczykowski-Górecki wrote:
 On Fri, May 22, 2015 at 05:25:44PM +0100, David Vrabel wrote:
 On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
 Hi all,

 I'm experiencing xen-netfront crash when doing xl network-detach while
 some network activity is going on at the same time. It happens only when
 domU has more than one vcpu. Not sure if this matters, but the backend
 is in another domU (not dom0). I'm using Xen 4.2.2. It happens on kernel
 3.9.4 and 4.1-rc1 as well.

 Steps to reproduce:
 1. Start the domU with some network interface
 2. Call there 'ping -f some-IP'
 3. Call 'xl network-detach NAME 0'

 I tried this about 10 times without a crash.  How reproducible is it?

 I used a 4.1-rc4 frontend and a 4.0 backend.
 
 It happens every time for me... Do you have at least two vcpus in that
 domU? With one vcpu it doesn't crash. The IP for ping I've used one in
 backend domU, but it shouldn't matter.
 
 Backend is 3.19.6 here. I don't see any changes there between rc1 and
 rc4, so stayed with rc1. With 4.1-rc1 backend it also crashes for me.

Doesn't repro for me with 4 VCPU PV or PVHVM guests.  Is your guest
kernel vanilla or does it have some qubes specific patches on top?

David
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ipg: redundancy with mii.h

2006-05-20 Thread David Vrabel

Apologies for the (very) late response.

0007-ipg-plug-leaks-in-the-error-path-of-ipg_nic_open.txt broke receive 
since it was skipping the initialization of the Rx buffers.  Patch attached.


Did anyone manage to get a response from IC Plus regarding the required 
Signed-off-by line?


David Vrabel
ipg: initialize Rx buffers correctly

A previous patch resulted in the initialization of the Rx buffers
being skipped.

Signed-off-by: David Vrabel [EMAIL PROTECTED]

--- linux-source-2.6.16.orig/drivers/net/ipg.c  2006-05-20 21:38:44.604788258 
+0100
+++ linux-source-2.6.16/drivers/net/ipg.c   2006-05-20 21:39:14.298898552 
+0100
@@ -1298,18 +1298,16 @@
sp-RxBuffNotReady = 0;
 
for (i = 0; i  IPG_RFDLIST_LENGTH; i++) {
-   if (!sp-RxBuff[i])
-   continue;
-
-   /* Free any allocated receive buffers. */
-   pci_unmap_single(sp-pdev,
-sp-RxBuffDMAhandle[i].dmahandle,
-sp-RxBuffDMAhandle[i].len,
-PCI_DMA_FROMDEVICE);
-
-   IPG_DEV_KFREE_SKB(sp-RxBuff[i]);
-   sp-RxBuff[i] = NULL;
+   if (sp-RxBuff[i]) {
+   /* Free any allocated receive buffers. */
+   pci_unmap_single(sp-pdev,
+sp-RxBuffDMAhandle[i].dmahandle,
+sp-RxBuffDMAhandle[i].len,
+PCI_DMA_FROMDEVICE);
 
+   IPG_DEV_KFREE_SKB(sp-RxBuff[i]);
+   sp-RxBuff[i] = NULL;
+   }
/* Clear out the RFS field. */
sp-RFDList[i].RFS = 0x;
 


Re: [PATCH 2/2] ipg: redundancy with mii.h

2006-05-04 Thread David Vrabel

Francois Romieu wrote:


ipg: remove forward declarations

It makes no sense in a new driver.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]


Ack.


ipg: replace #define with enum

Added some underscores to improve readability.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]


Nack.  Register names in code should match those used in the 
documentation (even if they are a bit unreadable).  Though I will 
conceed that the available datasheet doesn't actually describe the 
majority of the registers.



ipg: removal of useless #defines

IPG_TX_NOTBUSY apart (one occurence in ipg.c), the #defines appear

nowhere in the sources.


Ack.


ipg: redundancy with mii.h - take II

Replace a bunch of #define with their counterpart from mii.h

It is applied to the usual MII registers this time.


Ack.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IP1000 gigabit nic driver

2006-05-03 Thread David Vrabel
Pekka J Enberg wrote:
 On Wed, 3 May 2006, Andrew Morton wrote:
 Please remember that to merge this we'll need a signed-off-by from the
 original developers.  (That's not very gplish, but such is life).
 
 OK. Lets see if we can track one of them developers down. I see Craig 
 Rich's email (only email found in the original mail) is out of date so if 
 anyone knows how to reach him, please let me know. Thanks!

I think/guess that IC Plus bought out Sundance so you'd need to contact
someone at IC Plus.

David Vrabel
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IP1000 gigabit nic driver

2006-05-01 Thread David Vrabel

Lennert Buytenhek wrote:

On Mon, May 01, 2006 at 10:38:47PM +0200, Francois Romieu wrote:



-/* Minimum number of miliseconds used to toggle MDC clock during
+/* Minimum number of nanoseconds used to toggle MDC clock during
 * MII/GMII register access.
 */
-#define IPG_PC_PHYCTRLWAIT   0x01
+#defineIPG_PC_PHYCTRLWAIT_NS   200


I would have expected a cycle of 400 ns (p.72/77 of the datasheet)
for a 2.5 MHz clock. Why is it cut by a two factor ?



200 ns high + 200 ns low = 400 ns clock period?


Yes.

David
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IP1000 gigabit nic driver

2006-04-30 Thread David Vrabel

Pekka Enberg wrote:

On Sat, 2006-04-29 at 14:21 +0200, David Gómez wrote:

I already had it modified, just needed to create the patch... Anyway,
have you submitted it to netdev?


On Sat, 2006-04-29 at 23:35 +0300, Pekka Enberg wrote:

No, I haven't. I don't have the hardware, so I can't test the driver.
Furthermore, there's plenty of stuff to fix before it's in any shape for
submission. If someone wants to give this patch a spin, I would love to
hear the results.


Thanks for doing this Pekka.  I've fixed up some stuff and given it some 
brief testing on a 100BaseT network and it seems to work now.



Subject: [PATCH] IP1000 Gigabit Ethernet device driver

This is a cleaned up fork of the IP1000A device driver:

  http://www.icplus.com.tw/driver-pp-IP1000A.html

Open issues include but are not limited to:

  - ipg_probe() looks really fishy and doesn't handle all errors
(e.g. ioremap failing).
  - ipg_nic_do_ioctl() is playing games with user-space pointer.
We should use ethtool ioctl instead as suggested by Arjan.


Still pending.  Also:

- something (PHY reset/auto negotiation?) takes 2-3 seconds and
  appears to be done with interrupts disabled.


  - For multiple devices, the driver uses a global root_dev and
ipg_remove() play some tricks which look fishy.


Killed this.  It was broke and ugly as hell.

Attached is patch with some more changes:

- Remove changelogs
- Remove ether_crc_le() -- use crc32_le() instead.
- No more nonsense with root_dev -- ipg_remove() now works.
- Move PHY and MAC address initialization into the ipg_probe().  It was
  previously filling in the MAC address on open which breaks some user
  space.
- Folded ipg_nic_init into ipg_probe since it was broke otherwise.

Signed-off-by: David Vrabel [EMAIL PROTECTED]
Index: linux-source-2.6.16/drivers/net/ipg.c
===
--- linux-source-2.6.16.orig/drivers/net/ipg.c  2006-04-30 22:26:05.788013667 
+0100
+++ linux-source-2.6.16/drivers/net/ipg.c   2006-05-01 00:23:28.358641581 
+0100
@@ -13,149 +13,14 @@
  * 408 873 4117
  * www.sundanceti.com
  * [EMAIL PROTECTED]
- *
- * Rev  Date Description
- * --
- * 0.1  11/8/99  Initial revision work begins.
- *
- * 0.2  11/12/99  Basic operation achieved, continuing work.
- *
- * 0.3  11/19/99  MAC Loop Back for sync problem testing.
- *
- * 0.4  12/22/99  ioctl for diagnotic program 'hunter' support.
- *
- * 0.5  4/13/00   Updates to
- *
- * 0.6  6/14/00   Slight correction to handling TFDDONE, and
- *preservation of PHYCTRL polarity bits.
- *
- * 0.7  7/27/00   Modifications to accomodate triple speed
- *autonegotiation. Also change to ioctl routine
- *to handle unknown PHY address.
- *
- * 0.8  8/11/00   Added change_mtu function.
- *
- * 0.9  8/15/00   Corrected autonegotiation resolution.
- *
- * 0.10 8/30/00   Changed constants to use IPG in place
- *of RIO. Also, removed most of debug
- *code in preparation for production release.
- *
- * 0.11 8/31/00   Utilize 64 bit data types where appropriate.
- *
- * 0.12 9/1/00Move some constants to include file and utilize
- *RxDMAInt register.
- *
- * 0.13 10/31/00  Several minor modifications to improve stability.
- *
- * 0.14 11/28/00  Added call to nic_tx_free if TFD not available.
- *
- * 0.15 12/5/00   Corrected problem with receive errors, always set
- *receive buffer address to NULL. Release RX buffers
- *on errors.
- *
- * 0.16 12/20/00  Corrected autoneg resolution issue, must detect
- *speed via PHYCTRL register. Also, perform only 1
- *loop in the nic_txcleanup routine.
- *
- * 0.17 2/7/01Changed all references of ST2021 to IPG.
- *When next TFD not available, return -ENOMEM instead
- *of 0. Removed references to RUBICON.
- *
- * 0.18 2/14/01   Corrected problem when unexpected jumbo frames are
- *received (now dropped properly.) Changed
- *DROP_ON_ERRORS breaking out Ethernet errors and
- *TCP/IP errors serparately. Corrected Gigabit
- *copper PAUSE autonegotiation.
- *
- * 0.19 2/22/01   Changed interrupt handling of RFD_LIST_END,
- *INT_REQUESTED, and RX_DMA_COMPLETE. Masked off
- *RMON statistics and unused MIB statistics.
- *Make sure *all* statistics are accounted for
- *(either masked or read in get_stats) to avoid
- *perpetual UpdateStats interrupt from causing
- *driver to crash.
- *
- * 0.20 3/2/01Corrected error in nic_stop. Need to set
- *TxBuff[] = NULL after freeing TxBuff and
- *RxBuff[] = NULL after freeing RxBuff.
- *
- * 0.21 3/5/01Correct 10/100Mbit PAUSE autonegotiation

Re: IP1000 gigabit nic driver

2006-04-29 Thread David Vrabel

Pekka Enberg wrote:


I ended up doing most of them myself [1]. Sorry :-) Are the datasheets
public by the way?


http://www.icplus.com.tw/Data/Datasheet/IP1000A-DS-R08-07052005.pdf


  1. http://www.cs.helsinki.fi/u/penberg/linux/ip1000-driver.patch

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IP1000 gigabit nic driver

2006-04-27 Thread David Vrabel

Francois Romieu wrote:

David Gómez [EMAIL PROTECTED] :
[...]

Does anybody in this list know why the IP1000 driver is not
included in the kernel ?


Afaik the driver has never been submitted for inclusion.
At least not on netdev@vger.kernel.org (hint, hint).

[...]

The card in question is:

Sundance Technology Inc IC Plus IP1000

and the driver can be found in sundance web, sources 


URL please ?


included. I tried to contact the author but my email
bounced.

There's no LICENSE in the source, just copyrigth
sentences in the .c files, so i'm not sure under
which license it's distributed :-?.


/me goes to http://www.icplus.com.tw/driver-pp-IP1000A.html

$ unzip -c IP1000A-Linux-driver-v2.09f.zip | grep MODULE_LICENSE
MODULE_LICENSE(GPL);

It's a bit bloaty but it does not seem too bad (not mergeable as
is though). Do you volunteer to test random cra^W^W carefully
engineered code on your computer to help the rework/merging process ?


I finally got around to putting a 2nd NIC in my box that has one of this 
chips and was going to start fixing the driver up and preparing it for 
submission this weekend.  Or I might try rewriting from scratch based on 
the datasheet depending on how horrific the code looks on closer inspection.


Not got a whole lot of time to do this so no timescale for completion...

David Vrabel

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integrating IXP4xx NPE support in kernel (with IXP4xx accesslibrary)

2005-12-22 Thread David Vrabel
Lennert Buytenhek wrote:
 
 That sounds very fishy -- are you 100% sure that the person doing the
 s/Proprietary/BSD/ was in fact authorized to do so?  (This is one reason
 why I'd love to see someone @intel.com submit this code upstream, BTW.)

FWIW, the parts of the IXP400 Access Library relevant to ethernet
(except the microcode which could be made into a binary blob for loading
by the firmware loader) are available (from Intel) under the eCos
variant of the GPL.

The IXP400 Access Library is a complete horror.  It'll need completely
rewriting to even have a chance at getting into the kernel.  The above
mentioned bits for eCos/RedBoot would probably be a good starting point.

David Vrabel

ps. It's bad form to CC a subscription-only list. I've trimmed
linux-arm-kernel from the CCs.
-- 
David Vrabel, Design Engineer

Arcom, Clifton Road   Tel: +44 (0)1223 411200 ext. 3233
Cambridge CB1 7EA, UK Web: http://www.arcom.com/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html