Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-25 Thread Jan Beulich
>>> On 23.01.16 at 01:29,  wrote:

> On 01/22/2016 07:02 PM, Jan Beulich wrote:
> On 22.01.16 at 11:40,  wrote:
>>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>> On 22.01.16 at 04:36,  wrote:
> By the way, do you think it's possible to make the grant table support a
> bigger page size, e.g. 64K?
> One grant-ref per 64KB instead of per 4KB should reduce grant
> entry consumption significantly.

 How would that work with an underlying page size of 4k, and pages
 potentially being non-contiguous in machine address space? Besides
 that the grant table hypercall interface isn't prepared to support
 64k page size, due to its use of uint16_t for the length of copy ops.
>>>
>>> Right, and I meant to ask whether we should consider addressing all the
>>> places you mentioned.
>> 
>> Just from an abstract perspective: How would you envision avoiding
>> machine address discontiguity? Or would you want to limit such an
> 
> E.g. reserve a page pool of contiguous 64KB pages, or make grant-map
> support huge pages (2MB)?
> To be honest, I haven't thought much about the details.
> 
> Do you think that's unlikely to be implemented?

Contiguous memory (of whatever granularity above 4k) is quite
difficult to _guarantee_ in PV guests, so yes, unless you or
someone else has a fantastic new idea on how to achieve
this, I indeed see it as pretty unlikely to come true.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-22 Thread Jan Beulich
>>> On 22.01.16 at 11:40,  wrote:
> On 01/22/2016 03:53 PM, Jan Beulich wrote:
> On 22.01.16 at 04:36,  wrote:
>>> By the way, do you think it's possible to make the grant table support a
>>> bigger page size, e.g. 64K?
>>> One grant-ref per 64KB instead of per 4KB should reduce grant
>>> entry consumption significantly.
>> 
>> How would that work with an underlying page size of 4k, and pages
>> potentially being non-contiguous in machine address space? Besides
>> that the grant table hypercall interface isn't prepared to support
>> 64k page size, due to its use of uint16_t for the length of copy ops.
> 
> Right, and I meant to ask whether we should consider addressing all the
> places you mentioned.

Just from an abstract perspective: How would you envision avoiding
machine address discontiguity? Or would you want to limit such an
improvement to only HVM/PVH/HVMlite guests?

Jan




Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-22 Thread Bob Liu


On 01/22/2016 03:53 PM, Jan Beulich wrote:
 On 22.01.16 at 04:36,  wrote:
>> By the way, do you think it's possible to make the grant table support a
>> bigger page size, e.g. 64K?
>> One grant-ref per 64KB instead of per 4KB should reduce grant
>> entry consumption significantly.
> 
> How would that work with an underlying page size of 4k, and pages
> potentially being non-contiguous in machine address space? Besides
> that the grant table hypercall interface isn't prepared to support
> 64k page size, due to its use of uint16_t for the length of copy ops.
> 

Right, and I meant to ask whether we should consider addressing all the
places you mentioned.
With multi-queue xen-block and xen-network, we have received more reports of
grants being exhausted.

-- 
Regards,
-Bob



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-22 Thread Bob Liu

On 01/22/2016 07:02 PM, Jan Beulich wrote:
 On 22.01.16 at 11:40,  wrote:
>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>> On 22.01.16 at 04:36,  wrote:
 By the way, do you think it's possible to make the grant table support a
 bigger page size, e.g. 64K?
 One grant-ref per 64KB instead of per 4KB should reduce grant
 entry consumption significantly.
>>>
>>> How would that work with an underlying page size of 4k, and pages
>>> potentially being non-contiguous in machine address space? Besides
>>> that the grant table hypercall interface isn't prepared to support
>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>
>> Right, and I meant to ask whether we should consider addressing all the
>> places you mentioned.
> 
> Just from an abstract perspective: How would you envision avoiding
> machine address discontiguity? Or would you want to limit such an

E.g. reserve a page pool of contiguous 64KB pages, or make grant-map support
huge pages (2MB)?
To be honest, I haven't thought much about the details.

Do you think that's unlikely to be implemented?
If yes, we have to limit the number of queues, VMs, and vdisks/vifs
appropriately, to make sure the guests won't run out of grants.

> improvement to only HVM/PVH/HVMlite guests?
> 
> Jan
> 

-- 
Regards,
-Bob



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 10:37:51AM +, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> > On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> > [...]
> > > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > > 
> > > User confirmed[0] that 4.4 is actually OK.
> > > 
> > > Did someone request stable backports yet, or shall I do so?
> > > 
> > 
> > I vaguely remember we requested backports for the relevant patches a long
> > time ago, but I admit I have lost track. So it wouldn't hurt if you do it
> > again.
> 
> So I think we'd be looking for:
> 
> 32a8440 xen-netfront: respect user provided max_queues
> 4c82ac3 xen-netback: respect user provided max_queues
> ca88ea1 xen-netfront: update num_queues to real created
> 
> which certainly resolves things such that the workarounds work, and I think
> will also fix the default case such that it works with up to 32 vcpus
> (although it will consume all the grants and only get 31/32 queues).
> 
> Does that sound correct?
> 

Yes, it does.

> As Annie said, we may still want to consider what a sensible default max
> queues would be.
> 

Maybe we should set a cap at 8 or 16 by default.
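For reference, the knobs such a cap would interact with look roughly like this (values are illustrative, not recommendations; the option names are the ones discussed in this thread and in the referenced commits, and exact defaults may differ by version):

```shell
# 1) Xen hypervisor command line: raise the grant-table size
#    (workaround for grant exhaustion):
#        gnttab_max_frames=64

# 2) Guest kernel command line: cap netfront queues:
#        xen_netfront.max_queues=4

# 3) Dom0 modprobe config: cap the queues netback offers per vif:
#        options xen-netback max_queues=4
```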

Wei.

> Ian.



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
> On 20/01/16 12:23, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> > 
> > [0.533589] xen_netfront: can't alloc rx grant refs
> > [0.533612] net eth0: only created 31 queues
> > 
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted
> > by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> > 
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends
> > cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> > 
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues
> > to real created" which was in 4.3.
> 
> I think the correct solution is to increase the default maximum grant
> table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
 * performance, i.e. the ability for N queues to saturate whatever sort of
   link contemporary Linux can saturate these days, plus some headroom (or
   whatever other ceiling seems sensible)
 * grant table resource consumption, i.e. (sensible max number of blks * nr
   gnts per blk + sensible max number of vifs * nr gnts per vif + other
   devs' needs) < per-guest grant limit, to pick both the default gnttab
   size and the default max queues.

(or s/sensible/supportable/g etc).

> Although, unless you're using the not-yet-applied per-cpu rwlock patches
> multiqueue is terrible on many (multisocket) systems and the number of
> queue should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Ian.



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread David Vrabel
On 20/01/16 12:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
> 
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
> 
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
> 
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
> 
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

Although, unless you're using the not-yet-applied per-cpu rwlock patches
multiqueue is terrible on many (multisocket) systems and the number of
queue should be limited in netback to 4 or even just 2.

David



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> [...]
> > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > 
> > User confirmed[0] that 4.4 is actually OK.
> > 
> > Did someone request stable backports yet, or shall I do so?
> > 
> 
> I vaguely remember we requested backports for the relevant patches a long
> time ago, but I admit I have lost track. So it wouldn't hurt if you do it
> again.

So I think we'd be looking for:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

which certainly resolves things such that the workarounds work, and I think
will also fix the default case such that it works with up to 32 vcpus
(although it will consume all the grants and only get 31/32 queues).

Does that sound correct?

As Annie said, we may still want to consider what a sensible default max
queues would be.

Ian.



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Wei Liu
On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
[...]
> > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> 
> User confirmed[0] that 4.4 is actually OK.
> 
> Did someone request stable backports yet, or shall I do so?
> 

I vaguely remember we requested backports for the relevant patches a long
time ago, but I admit I have lost track. So it wouldn't hurt if you do it
again.

Wei.

> Ian.
> 
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Ian Campbell
On Wed, 2016-01-20 at 15:16 +, Ian Campbell wrote:
> On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> > On 01/20/2016 10:02 AM, David Vrabel wrote:
> > > On 20/01/16 14:52, Ian Campbell wrote:
> > > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > > There have been a few reports recently[0] which relate to a
> > > > > > failure of
> > > > > > netfront to allocate sufficient grant refs for all the queues:
> > > > > > 
> > > > > > [0.533589] xen_netfront: can't alloc rx grant refs
> > > > > > [0.533612] net eth0: only created 31 queues
> > > > > > 
> > > > > > Which can be worked around by increasing the number of grants
> > > > > > on
> > > > > > the
> > > > > > hypervisor command line or by limiting the number of queues
> > > > > > permitted
> > > > > > by
> > > > > > either back or front using a module param (which was broken but
> > > > > > is now
> > > > > > fixed on both sides, but I'm not sure it has been backported
> > > > > > everywhere
> > > > > > such that it is a reliable thing to always tell users as a
> > > > > > workaround).
> > > > > > 
> > > > > > Is there any plan to do anything about the default/out of the
> > > > > > box
> > > > > > experience? Either limiting the number of queues or making both
> > > > > > ends
> > > > > > cope
> > > > > > more gracefully with failure to create some queues (or both)
> > > > > > might be
> > > > > > sufficient?
> > > > > > 
> > > > > > I think the crash after the above in the first link at [0] is
> > > > > > fixed? I
> > > > > > think that was the purpose of ca88ea1247df "xen-netfront:
> > > > > > update
> > > > > > num_queues
> > > > > > to real created" which was in 4.3.
> > > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > > of
> > > > > queues.
> > > > That's in 4.4, which the first link at [0] claimed to have tested.
> > > > I
> > > > can
> > > > see this fixing the crash, but does it really fix the "actually
> > > > works
> > > > with
> > > > fewer queues than it tried to get" issue?
> > 
> > That's what I thought it did too. I didn't notice that 4.4 was tested 
> > as well, so maybe not.
> 
> I've asked the reporter to send logs for the 4.4 case to xen-devel.

User confirmed[0] that 4.4 is actually OK.

Did someone request stable backports yet, or shall I do so?

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread David Vrabel
On 21/01/16 12:19, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
> 
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
> 
> i.e. factoring in:
>  * performance i.e. ability for N queues to saturate whatever sort of link
>contemporary Linux can saturate these days, plus some headroom, or
>whatever other ceiling seems sensible)
>  * grant table resource consumption i.e. (sensible max number of blks * nr
>gnts per blk + sensible max number of vifs * nr gnts per vif + other
>devs needs) < per guest grant limit) to pick both the default gnttab
>size and the default max queues.

Yes.

>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queue should be limited in netback to 4 or even just 2.
> 
> Presumably the guest can't tell, so it can't do this.
> 
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?

Malcolm did the analysis but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread annie li


On 2016/1/21 9:17, David Vrabel wrote:

On 21/01/16 12:19, Ian Campbell wrote:

On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:

On 20/01/16 12:23, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted
by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends
cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update
num_queues
to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
  * performance i.e. ability for N queues to saturate whatever sort of link
contemporary Linux can saturate these days, plus some headroom, or
whatever other ceiling seems sensible)
  * grant table resource consumption i.e. (sensible max number of blks * nr
gnts per blk + sensible max number of vifs * nr gnts per vif + other
devs needs) < per guest grant limit) to pick both the default gnttab
 size and the default max queues.

Yes.


Would it waste a lot of resources in the case where a guest vif has many
queues but no network load? Here is an example of the grant refs consumed by
a vif:

Dom0 has 20 vcpus and domU has 20 vcpus;
one vif would consume 20*256*2 = 10240 grant refs.
If the maximum grant table size is set to 64 pages (the default value in Xen
is 32 pages now?), then only 3 vifs can be supported in the guest. And that
is before blk is taken into account, not to mention the blk multi-page ring
feature.


Thanks
Annie



Although, unless you're using the not-yet-applied per-cpu rwlock patches
multiqueue is terrible on many (multisocket) systems and the number of
queue should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Malcolm did the analysis but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David






Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Bob Liu

On 01/21/2016 08:19 PM, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
> 
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
> 
> i.e. factoring in:
>  * performance i.e. ability for N queues to saturate whatever sort of link
>contemporary Linux can saturate these days, plus some headroom, or
>whatever other ceiling seems sensible)
>  * grant table resource consumption i.e. (sensible max number of blks * nr
>gnts per blk + sensible max number of vifs * nr gnts per vif + other
>devs needs) < per guest grant limit) to pick both the default gnttab
>size and the default max queues.
> 

Agreed.
By the way, do you think it's possible to make the grant table support a
bigger page size, e.g. 64K?
One grant-ref per 64KB instead of per 4KB should reduce grant
entry consumption significantly.

Bob

> (or s/sensible/supportable/g etc).
> 
>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queue should be limited in netback to 4 or even just 2.
> 
> Presumably the guest can't tell, so it can't do this.
> 
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?
> 
> Ian.



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread Jan Beulich
>>> On 22.01.16 at 04:36,  wrote:
> By the way, do you think it's possible to make the grant table support a
> bigger page size, e.g. 64K?
> One grant-ref per 64KB instead of per 4KB should reduce grant
> entry consumption significantly.

How would that work with an underlying page size of 4k, and pages
potentially being non-contiguous in machine address space? Besides
that the grant table hypercall interface isn't prepared to support
64k page size, due to its use of uint16_t for the length of copy ops.

Jan




Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread Ian Campbell
On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> > 
> > [0.533589] xen_netfront: can't alloc rx grant refs
> > [0.533612] net eth0: only created 31 queues
> > 
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted
> > by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> > 
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends
> > cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> > 
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues
> > to real created" which was in 4.3.
> 
> I think ca88ea1247df is the solution --- it will limit the number of 
> queues.

That's in 4.4, which the first link at [0] claimed to have tested. I can
see this fixing the crash, but does it really fix the "actually works with
fewer queues than it tried to get" issue?

In any case, having exhausted the grant entries creating queues, there aren't
any left to shuffle actual data around, are there? (or are those
preallocated too?)

Ian.



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread David Vrabel
On 20/01/16 14:52, Ian Campbell wrote:
> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>> On 01/20/2016 07:23 AM, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think ca88ea1247df is the solution --- it will limit the number of 
>> queues.
> 
> That's in 4.4, which the first link at [0] claimed to have tested. I can
> see this fixing the crash, but does it really fix the "actually works with
> fewer queues than it tried to get" issue?
> 
> In any case, having exhausted the grant entries creating queues, there aren't
> any left to shuffle actual data around, are there? (or are those
> preallocated too?)

All grants refs for Tx and Rx are preallocated (this is the allocation
that is failing above).

David



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread annie li


On 2016/1/20 7:23, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Following are the patches that fix the module params; they have existed since v4.3.
xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d
xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3


Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?
We ran into a similar issue recently, and guess it would be better to suggest
that users set the netback module parameter, with a default value of 8? See
this link:
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing


Probably more testing is needed to find the default number that gives the best experience.



I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.

Correct.

Thanks
Annie


Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
 http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
 some before hte xmas break too IIRC






Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread Boris Ostrovsky

On 01/20/2016 10:02 AM, David Vrabel wrote:

On 20/01/16 14:52, Ian Campbell wrote:

On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:

On 01/20/2016 07:23 AM, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted
by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends
cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update
num_queues
to real created" which was in 4.3.

I think ca88ea1247df is the solution --- it will limit the number of
queues.

That's in 4.4, which the first link at [0] claimed to have tested. I can
see this fixing the crash, but does it really fix the "actually works with
fewer queues than it tried to get" issue?


That's what I thought it did too. I didn't notice that 4.4 was tested 
as well, so maybe not.


-boris



In any case, having exhausted the grant entries creating queues, there aren't
any left to shuffle actual data around, are there? (or are those
preallocated too?)

All grants refs for Tx and Rx are preallocated (this is the allocation
that is failing above).

David





Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread Ian Campbell
On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 10:02 AM, David Vrabel wrote:
> > On 20/01/16 14:52, Ian Campbell wrote:
> > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > There have been a few reports recently[0] which relate to a
> > > > > failure of
> > > > > netfront to allocate sufficient grant refs for all the queues:
> > > > > 
> > > > > [0.533589] xen_netfront: can't alloc rx grant refs
> > > > > [0.533612] net eth0: only created 31 queues
> > > > > 
> > > > > Which can be worked around by increasing the number of grants on
> > > > > the
> > > > > hypervisor command line or by limiting the number of queues
> > > > > permitted
> > > > > by
> > > > > either back or front using a module param (which was broken but
> > > > > is now
> > > > > fixed on both sides, but I'm not sure it has been backported
> > > > > everywhere
> > > > > such that it is a reliable thing to always tell users as a
> > > > > workaround).
> > > > > 
> > > > > Is there any plan to do anything about the default/out of the box
> > > > > experience? Either limiting the number of queues or making both
> > > > > ends
> > > > > cope
> > > > > more gracefully with failure to create some queues (or both)
> > > > > might be
> > > > > sufficient?
> > > > > 
> > > > > I think the crash after the above in the first link at [0] is
> > > > > fixed? I
> > > > > think that was the purpose of ca88ea1247df "xen-netfront: update
> > > > > num_queues
> > > > > to real created" which was in 4.3.
> > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > of
> > > > queues.
> > > That's in 4.4, which the first link at [0] claimed to have tested. I
> > > can
> > > see this fixing the crash, but does it really fix the "actually works
> > > with
> > > less queues than it tried to get" issue?
> 
> That's what I thought it does too. I didn't notice that 4.4 was tested 
> as well, so maybe not.

I've asked the reporter to send logs for the 4.4 case to xen-devel.

Ian.



[Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread Ian Campbell
There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).
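
The two workarounds mentioned above look roughly as follows (option names as I recall them; verify against your Xen and kernel versions before relying on them):

```shell
# 1) Raise the grant-table limit on the Xen hypervisor command line
#    (append to the xen.gz line in grub; the default is 32 frames):
#      gnttab_max_frames=64

# 2) Or cap the number of queues via module parameter on either end:
modprobe xen-netfront max_queues=4
modprobe xen-netback max_queues=4

# Persistently, via modprobe configuration:
echo "options xen-netfront max_queues=4" > /etc/modprobe.d/xen-netfront.conf
```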

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.
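
The fallback that ca88ea1247df introduces can be sketched as follows. This is a schematic in Python, not the kernel code: `try_alloc_queue` is a stand-in for the per-queue grant-ref allocation, here rigged to succeed only 31 times to mirror the report above.

```python
def try_alloc_queue(i, budget=[31]):
    """Stand-in for per-queue setup; only the first 31 allocations succeed."""
    if budget[0] == 0:
        return None        # models "can't alloc rx grant refs"
    budget[0] -= 1
    return f"queue{i}"

def create_queues(requested):
    """Try to set up `requested` queues; return those actually created."""
    queues = []
    for i in range(requested):
        q = try_alloc_queue(i)
        if q is None:
            break          # stop at the first failure instead of crashing
        queues.append(q)
    return queues

queues = create_queues(32)
num_queues = len(queues)   # the fix: carry on with the real count (31),
print(num_queues)          # not the requested 32
```

The crash the thread discusses came from the rest of the driver still believing the requested count; clamping `num_queues` to what was actually created is what lets the device come up degraded rather than fail.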

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
    http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
    some before the xmas break too IIRC



Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread Boris Ostrovsky

On 01/20/2016 07:23 AM, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.


I think ca88ea1247df is the solution --- it will limit the number of 
queues.


And apparently it's not in stable trees. At least not in 4.1.15, which 
is what the first reporter is running:


https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/net/xen-netfront.c?id=refs/tags/v4.1.15

-boris




Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
 http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
 some before the xmas break too IIRC


