Re: [Xen-devel] netfront/netback multiqueue exhausting grants
>>> On 23.01.16 at 01:29, wrote:
> On 01/22/2016 07:02 PM, Jan Beulich wrote:
>> On 22.01.16 at 11:40, wrote:
>>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:36, wrote:
>>>>> By the way, do you think it's possible to make the grant table support
>>>>> bigger pages, e.g. 64K?
>>>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>>>> entry consumption significantly.
>>>>
>>>> How would that work with an underlying page size of 4k, and pages
>>>> potentially being non-contiguous in machine address space? Besides
>>>> that the grant table hypercall interface isn't prepared to support
>>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>>
>>> Right, and I mean whether we should consider addressing all the places
>>> you mentioned.
>>
>> Just from an abstract perspective: How would you envision avoiding
>> machine address discontiguity? Or would you want to limit such an
>
> E.g. reserve a page pool with contiguous 64KB pages, or make grant-map
> support huge pages (2MB)?
> To be honest, I haven't thought much about the details.
>
> Do you think that's unlikely to be implemented?

Contiguous memory (of whatever granularity above 4k) is quite difficult
to _guarantee_ in PV guests, so yes, without you or someone else having
a fantastic new idea on how to achieve this I indeed see this as pretty
unlikely to come true.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
>>> On 22.01.16 at 11:40, wrote:
> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>> On 22.01.16 at 04:36, wrote:
>>> By the way, do you think it's possible to make the grant table support
>>> bigger pages, e.g. 64K?
>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>> entry consumption significantly.
>>
>> How would that work with an underlying page size of 4k, and pages
>> potentially being non-contiguous in machine address space? Besides
>> that the grant table hypercall interface isn't prepared to support
>> 64k page size, due to its use of uint16_t for the length of copy ops.
>
> Right, and I mean whether we should consider addressing all the places
> you mentioned.

Just from an abstract perspective: How would you envision avoiding
machine address discontiguity? Or would you want to limit such an
improvement to only HVM/PVH/HVMlite guests?

Jan
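A minimal sketch of the constraint Jan raises, assuming a hypothetical 64 KiB grant that would have to name 16 machine-contiguous 4 KiB frames. A guest's pseudo-physical-to-machine (p2m) mapping gives no such guarantee, so the check can fail even when the guest-side frame numbers are consecutive. The `p2m` list and `backs_contiguous_64k` are illustrative names only, not a Xen or Linux interface.

```python
FRAME_SIZE = 4096
FRAMES_PER_64K = 65536 // FRAME_SIZE  # 16 frames of 4 KiB


def backs_contiguous_64k(p2m, first_pfn):
    """Return True if guest frames first_pfn .. first_pfn+15 map to
    16 consecutive machine frames.

    p2m is a hypothetical list indexed by guest frame number whose
    entries are machine frame numbers (MFNs).
    """
    first_mfn = p2m[first_pfn]
    return all(p2m[first_pfn + i] == first_mfn + i
               for i in range(FRAMES_PER_64K))


# A layout where consecutive guest frames happen to be machine-contiguous.
contiguous = list(range(1000, 1016))
# The same machine frames, but two of them swapped by the allocator.
scattered = contiguous.copy()
scattered[3], scattered[9] = scattered[9], scattered[3]

print(backs_contiguous_64k(contiguous, 0))  # True
print(backs_contiguous_64k(scattered, 0))   # False
```

A single out-of-place frame is enough to rule out one 64 KiB grant for the whole range, which is why a guarantee (rather than a best effort) is the hard part.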
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 01/22/2016 03:53 PM, Jan Beulich wrote:
> On 22.01.16 at 04:36, wrote:
>> By the way, do you think it's possible to make the grant table support
>> bigger pages, e.g. 64K?
>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>> entry consumption significantly.
>
> How would that work with an underlying page size of 4k, and pages
> potentially being non-contiguous in machine address space? Besides
> that the grant table hypercall interface isn't prepared to support
> 64k page size, due to its use of uint16_t for the length of copy ops.
>

Right, and I mean whether we should consider addressing all the places
you mentioned.
With multi-queue xen-block and xen-network, we have been getting more
reports of grant exhaustion.

-- 
Regards,
-Bob
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 01/22/2016 07:02 PM, Jan Beulich wrote:
> On 22.01.16 at 11:40, wrote:
>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>> On 22.01.16 at 04:36, wrote:
>>>> By the way, do you think it's possible to make the grant table support
>>>> bigger pages, e.g. 64K?
>>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>>> entry consumption significantly.
>>>
>>> How would that work with an underlying page size of 4k, and pages
>>> potentially being non-contiguous in machine address space? Besides
>>> that the grant table hypercall interface isn't prepared to support
>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>
>> Right, and I mean whether we should consider addressing all the places
>> you mentioned.
>
> Just from an abstract perspective: How would you envision avoiding
> machine address discontiguity? Or would you want to limit such an

E.g. reserve a page pool with contiguous 64KB pages, or make grant-map
support huge pages (2MB)?
To be honest, I haven't thought much about the details.

Do you think that's unlikely to be implemented?
If so, we have to limit the number of queues, VMs and vdisks/vifs
appropriately to make sure the guests won't enter a grant-exhausted state.

> improvement to only HVM/PVH/HVMlite guests?
>
> Jan
>

-- 
Regards,
-Bob
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Thu, Jan 21, 2016 at 10:37:51AM +, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> > On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> > [...]
> > > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > >
> > > User confirmed[0] that 4.4 is actually OK.
> > >
> > > Did someone request stable backports yet, or shall I do so?
> > >
> >
> > I vaguely remember we requested backports for the relevant patches a
> > long time ago, but I admit I have lost track. So it wouldn't hurt if you
> > do it again.
>
> So I think we'd be looking for:
>
> 32a8440 xen-netfront: respect user provided max_queues
> 4c82ac3 xen-netback: respect user provided max_queues
> ca88ea1 xen-netfront: update num_queues to real created
>
> which certainly resolves things such that the workarounds work, and I think
> will also fix the default case such that it works with up to 32 vcpus
> (although it will consume all the grants and only get 31/32 queues).
>
> Does that sound correct?
>

Yes, it does.

> As Annie said, we may still want to consider what a sensible default max
> queues would be.
>

Maybe we should set a cap of 8 or 16 by default.

Wei.

> Ian.
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
> On 20/01/16 12:23, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> >
> > [0.533589] xen_netfront: can't alloc rx grant refs
> > [0.533612] net eth0: only created 31 queues
> >
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> >
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> >
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> > to real created" which was in 4.3.
>
> I think the correct solution is to increase the default maximum grant
> table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
 * performance, i.e. the ability for N queues to saturate whatever sort of
   link contemporary Linux can saturate these days, plus some headroom (or
   whatever other ceiling seems sensible)
 * grant table resource consumption, i.e. (sensible max number of blks *
   nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
   devs' needs) < per-guest grant limit

to pick both the default gnttab size and the default max queues.

(or s/sensible/supportable/g etc).

> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
> multiqueue is terrible on many (multisocket) systems and the number of
> queues should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Ian.
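Ian's two constraints can be turned into a rough picker for a default queue cap. This is a minimal sketch under assumptions that are not from the thread: the per-device grant costs are plain inputs, the budget check is the inequality Ian wrote, and `cap` stands in for the performance ceiling (such as the 8 or 16 floated elsewhere in the thread). The function names and the example numbers are hypothetical.

```python
def fits_budget(limit, nr_blks, grefs_per_blk, nr_vifs, grefs_per_vif, other):
    """Ian's inequality: total grant demand must stay below the limit."""
    demand = nr_blks * grefs_per_blk + nr_vifs * grefs_per_vif + other
    return demand < limit


def pick_max_queues(limit, nr_blks, grefs_per_blk, nr_vifs,
                    grefs_per_queue, other, cap):
    """Largest per-vif queue count <= cap that keeps demand within budget.

    Returns 0 if not even one queue per vif fits.
    """
    for queues in range(cap, 0, -1):
        if fits_budget(limit, nr_blks, grefs_per_blk, nr_vifs,
                       queues * grefs_per_queue, other):
            return queues
    return 0


# Hypothetical guest: 4 disks at 1000 refs each, 4 vifs at 512 refs per
# queue (256-slot Tx and Rx rings), 64 refs for everything else.
print(pick_max_queues(16384, 4, 1000, 4, 512, 64, cap=16))  # 6
print(pick_max_queues(16384, 4, 1000, 4, 512, 64, cap=4))   # 4
print(pick_max_queues(32768, 4, 1000, 4, 512, 64, cap=16))  # 14
```

Doing both, as suggested above, shows up directly: a larger grant table raises the budget-limited answer (6 to 14 here), while `cap` keeps it from growing past the performance ceiling.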
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 20/01/16 12:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
>
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

Although, unless you're using the not-yet-applied per-cpu rwlock patches,
multiqueue is terrible on many (multisocket) systems and the number of
queues should be limited in netback to 4 or even just 2.

David
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Thu, 2016-01-21 at 10:25 +, Wei Liu wrote:
> On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
> [...]
> > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> >
> > User confirmed[0] that 4.4 is actually OK.
> >
> > Did someone request stable backports yet, or shall I do so?
> >
>
> I vaguely remember we requested backports for the relevant patches a long
> time ago, but I admit I have lost track. So it wouldn't hurt if you do it
> again.

So I think we'd be looking for:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

which certainly resolves things such that the workarounds work, and I think
will also fix the default case such that it works with up to 32 vcpus
(although it will consume all the grants and only get 31/32 queues).

Does that sound correct?

As Annie said, we may still want to consider what a sensible default max
queues would be.

Ian.
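The "consume all the grants and only get 31/32 queues" outcome can be reproduced with back-of-the-envelope arithmetic. The figures are assumptions rather than values quoted in the thread: 8-byte v1 grant entries (so 512 per 4 KiB table frame), a 32-frame default table, 8 entries held back as reserved, and one preallocated ref per slot of a 256-slot Tx ring and a 256-slot Rx ring. Other devices and the ring pages themselves are ignored.

```python
# Assumed values, not taken from the thread.
PAGE_SIZE = 4096
GRANT_ENTRY_V1_SIZE = 8
DEFAULT_MAX_FRAMES = 32
RESERVED_ENTRIES = 8
RING_SLOTS = 256


def usable_grant_entries(frames=DEFAULT_MAX_FRAMES):
    """Entries a guest can hand out from a table of `frames` frames."""
    return frames * (PAGE_SIZE // GRANT_ENTRY_V1_SIZE) - RESERVED_ENTRIES


def refs_per_queue(ring_slots=RING_SLOTS):
    """Preallocated refs for one queue: a Tx ring plus an Rx ring."""
    return 2 * ring_slots


def queues_that_fit(frames=DEFAULT_MAX_FRAMES):
    """How many complete queues the table can back."""
    return usable_grant_entries(frames) // refs_per_queue()


print(usable_grant_entries())  # 16376
print(refs_per_queue())        # 512
print(queues_that_fit())       # 31
```

Under these assumptions 32 queues would need exactly 16384 refs, the whole table, so the reserved entries alone are enough to push the 32nd queue over the edge.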
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Thu, Jan 21, 2016 at 10:12:27AM +, Ian Campbell wrote:
[...]
> > I've asked the reporter to send logs for the 4.4 case to xen-devel.
>
> User confirmed[0] that 4.4 is actually OK.
>
> Did someone request stable backports yet, or shall I do so?
>

I vaguely remember we requested backports for the relevant patches a long
time ago, but I admit I have lost track. So it wouldn't hurt if you do it
again.

Wei.

> Ian.
>
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Wed, 2016-01-20 at 15:16 +, Ian Campbell wrote:
> On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> > On 01/20/2016 10:02 AM, David Vrabel wrote:
> > > On 20/01/16 14:52, Ian Campbell wrote:
> > > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > > There have been a few reports recently[0] which relate to a
> > > > > > failure of netfront to allocate sufficient grant refs for all
> > > > > > the queues:
> > > > > >
> > > > > > [0.533589] xen_netfront: can't alloc rx grant refs
> > > > > > [0.533612] net eth0: only created 31 queues
> > > > > >
> > > > > > Which can be worked around by increasing the number of grants
> > > > > > on the hypervisor command line or by limiting the number of
> > > > > > queues permitted by either back or front using a module param
> > > > > > (which was broken but is now fixed on both sides, but I'm not
> > > > > > sure it has been backported everywhere such that it is a
> > > > > > reliable thing to always tell users as a workaround).
> > > > > >
> > > > > > Is there any plan to do anything about the default/out of the
> > > > > > box experience? Either limiting the number of queues or making
> > > > > > both ends cope more gracefully with failure to create some
> > > > > > queues (or both) might be sufficient?
> > > > > >
> > > > > > I think the crash after the above in the first link at [0] is
> > > > > > fixed? I think that was the purpose of ca88ea1247df
> > > > > > "xen-netfront: update num_queues to real created" which was in
> > > > > > 4.3.
> > > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > > of queues.
> > > > That's in 4.4, which the first link at [0] claimed to have tested. I
> > > > can see this fixing the crash, but does it really fix the "actually
> > > > works with fewer queues than it tried to get" issue?
> >
> > That's what I thought it does too. I didn't notice that 4.4 was tested
> > as well, so maybe not.
>
> I've asked the reporter to send logs for the 4.4 case to xen-devel.

User confirmed[0] that 4.4 is actually OK.

Did someone request stable backports yet, or shall I do so?

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 21/01/16 12:19, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
>
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
>
> i.e. factoring in:
>  * performance, i.e. the ability for N queues to saturate whatever sort of
>    link contemporary Linux can saturate these days, plus some headroom (or
>    whatever other ceiling seems sensible)
>  * grant table resource consumption, i.e. (sensible max number of blks *
>    nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
>    devs' needs) < per-guest grant limit
> to pick both the default gnttab size and the default max queues.

Yes.
>> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
>
> Presumably the guest can't tell, so it can't do this.
>
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?

Malcolm did the analysis but, if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 2016/1/21 9:17, David Vrabel wrote:
> On 21/01/16 12:19, Ian Campbell wrote:
>> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>>> On 20/01/16 12:23, Ian Campbell wrote:
>>>> There have been a few reports recently[0] which relate to a failure of
>>>> netfront to allocate sufficient grant refs for all the queues:
>>>>
>>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>>> [0.533612] net eth0: only created 31 queues
>>>>
>>>> Which can be worked around by increasing the number of grants on the
>>>> hypervisor command line or by limiting the number of queues permitted by
>>>> either back or front using a module param (which was broken but is now
>>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>>> such that it is a reliable thing to always tell users as a workaround).
>>>>
>>>> Is there any plan to do anything about the default/out of the box
>>>> experience? Either limiting the number of queues or making both ends cope
>>>> more gracefully with failure to create some queues (or both) might be
>>>> sufficient?
>>>>
>>>> I think the crash after the above in the first link at [0] is fixed? I
>>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>>> to real created" which was in 4.3.
>>>
>>> I think the correct solution is to increase the default maximum grant
>>> table size.
>>
>> That could well make sense, but then there will just be another higher
>> limit, so we should perhaps do both.
>>
>> i.e. factoring in:
>>  * performance, i.e. the ability for N queues to saturate whatever sort of
>>    link contemporary Linux can saturate these days, plus some headroom (or
>>    whatever other ceiling seems sensible)
>>  * grant table resource consumption, i.e. (sensible max number of blks *
>>    nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
>>    devs' needs) < per-guest grant limit
>> to pick both the default gnttab size and the default max queues.
>
> Yes.

Would it waste lots of resources in the case where a guest vif has lots of
queues but no network load?
Here is an example of the gntrefs consumed by a vif: with 20 vcpus in Dom0
and 20 vcpus in the domU, one vif would consume 20*256*2=10240 gntrefs. If
the maximum grant table size is set to 64 pages (the default value in Xen
is 32 pages now?), then only 3 vifs are supported in the guest. And that
is without even taking blk into account, nor the blk multi-page ring
feature.

Thanks
Annie

>>> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
>>> multiqueue is terrible on many (multisocket) systems and the number of
>>> queues should be limited in netback to 4 or even just 2.
>>
>> Presumably the guest can't tell, so it can't do this.
>>
>> I think when you say "terrible" you don't mean "worse than without mq" but
>> rather "not realising the expected gains from a larger number of queues",
>> right?
>
> Malcolm did the analysis but, if I remember correctly, 8 queues performed
> about the same as 1 queue and 16 were worse than 1 queue.
>
> David
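Annie's arithmetic, written out as a small calculator. Assumptions beyond her message: one queue per vcpu (20 here), one preallocated ref per slot of a 256-slot Tx ring and a 256-slot Rx ring, and 512 v1 grant entries per 4 KiB table frame; blkfront and reserved entries are ignored, as in her example.

```python
ENTRIES_PER_FRAME = 4096 // 8  # assumed: 8-byte v1 grant entries


def grefs_per_vif(nr_queues, ring_slots=256):
    """One Tx ring and one Rx ring per queue, one preallocated ref per slot."""
    return nr_queues * ring_slots * 2


def vifs_that_fit(max_frames, nr_queues):
    """Whole vifs a grant table of max_frames frames can back."""
    return (max_frames * ENTRIES_PER_FRAME) // grefs_per_vif(nr_queues)


print(grefs_per_vif(20))      # 10240
print(vifs_that_fit(64, 20))  # 3
print(vifs_that_fit(32, 20))  # 1
print(vifs_that_fit(64, 8))   # 8
```

The last line shows the trade-off discussed in the thread: capping vifs at 8 queues instead of 20 lets a 64-frame table back 8 vifs rather than 3.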
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 01/21/2016 08:19 PM, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
>
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
>
> i.e. factoring in:
>  * performance, i.e. the ability for N queues to saturate whatever sort of
>    link contemporary Linux can saturate these days, plus some headroom (or
>    whatever other ceiling seems sensible)
>  * grant table resource consumption, i.e. (sensible max number of blks *
>    nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
>    devs' needs) < per-guest grant limit
> to pick both the default gnttab size and the default max queues.
>

Agree.
By the way, do you think it's possible to make the grant table support
bigger pages, e.g. 64K?
One grant-ref per 64KB instead of 4KB should be able to reduce the grant
entry consumption significantly.

Bob

> (or s/sensible/supportable/g etc).
>
>> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
>
> Presumably the guest can't tell, so it can't do this.
>
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?
>
> Ian.
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
>>> On 22.01.16 at 04:36, wrote:
> By the way, do you think it's possible to make the grant table support
> bigger pages, e.g. 64K?
> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
> entry consumption significantly.

How would that work with an underlying page size of 4k, and pages
potentially being non-contiguous in machine address space? Besides
that the grant table hypercall interface isn't prepared to support
64k page size, due to its use of uint16_t for the length of copy ops.

Jan
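Jan's point about the copy length can be shown directly. In Xen's public `grant_table.h`, the `len` field of `struct gnttab_copy` is a `uint16_t`, so its largest value is 65535 and a full 64 KiB (65536-byte) copy cannot be expressed in one op. The snippet below only models that one field with `ctypes.c_uint16`; it does not call any grant-table interface, and `stored_copy_len` is an illustrative name.

```python
import ctypes

MAX_COPY_LEN = 2**16 - 1  # largest value a uint16_t length can hold


def stored_copy_len(nbytes):
    """Value a uint16_t length field ends up holding for nbytes.

    ctypes performs no overflow checking, so out-of-range values wrap
    modulo 2**16, as an assignment to a uint16_t does in C.
    """
    return ctypes.c_uint16(nbytes).value


print(MAX_COPY_LEN)            # 65535
print(stored_copy_len(4096))   # 4096: a 4 KiB copy is representable
print(stored_copy_len(65536))  # 0: a 64 KiB copy wraps to zero
```

Supporting 64 KiB grants would therefore need an interface change (a wider length field or a new op), on top of the contiguity problem.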
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> >
> > [0.533589] xen_netfront: can't alloc rx grant refs
> > [0.533612] net eth0: only created 31 queues
> >
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> >
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> >
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> > to real created" which was in 4.3.
>
> I think ca88ea1247df is the solution --- it will limit the number of
> queues.

That's in 4.4, which the first link at [0] claimed to have tested. I can
see this fixing the crash, but does it really fix the "actually works with
fewer queues than it tried to get" issue?

In any case, having exhausted the grant entries creating queues, there
aren't any left to shuffle actual data around, are there? (Or are those
preallocated too?)

Ian.
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 20/01/16 14:52, Ian Campbell wrote:
> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>> On 01/20/2016 07:23 AM, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>> [0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>> to real created" which was in 4.3.
>>
>> I think ca88ea1247df is the solution --- it will limit the number of
>> queues.
>
> That's in 4.4, which the first link at [0] claimed to have tested. I can
> see this fixing the crash, but does it really fix the "actually works with
> fewer queues than it tried to get" issue?
>
> In any case, having exhausted the grant entries creating queues, there
> aren't any left to shuffle actual data around, are there? (Or are those
> preallocated too?)

All grant refs for Tx and Rx are preallocated (this is the allocation that
is failing above).

David
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 2016/1/20 7:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).

The following are the patches that fix the module param; they have been in
since v4.3.

xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d

xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3

> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?

We ran into a similar issue recently, and I guess it is better to suggest
that users set the netback module parameter to a default value of 8? See
this link:
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

Probably more tests are needed to find the default number that gives the
best experience.

> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

Correct.

Thanks
Annie

> Ian.
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
> http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
> some before the xmas break too IIRC
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 01/20/2016 10:02 AM, David Vrabel wrote:
> On 20/01/16 14:52, Ian Campbell wrote:
>> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>>> On 01/20/2016 07:23 AM, Ian Campbell wrote:
>>>> There have been a few reports recently[0] which relate to a failure of
>>>> netfront to allocate sufficient grant refs for all the queues:
>>>>
>>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>>> [0.533612] net eth0: only created 31 queues
>>>>
>>>> Which can be worked around by increasing the number of grants on the
>>>> hypervisor command line or by limiting the number of queues permitted by
>>>> either back or front using a module param (which was broken but is now
>>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>>> such that it is a reliable thing to always tell users as a workaround).
>>>>
>>>> Is there any plan to do anything about the default/out of the box
>>>> experience? Either limiting the number of queues or making both ends cope
>>>> more gracefully with failure to create some queues (or both) might be
>>>> sufficient?
>>>>
>>>> I think the crash after the above in the first link at [0] is fixed? I
>>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>>> to real created" which was in 4.3.
>>>
>>> I think ca88ea1247df is the solution --- it will limit the number of
>>> queues.
>>
>> That's in 4.4, which the first link at [0] claimed to have tested. I can
>> see this fixing the crash, but does it really fix the "actually works with
>> fewer queues than it tried to get" issue?

That's what I thought it does too. I didn't notice that 4.4 was tested as
well, so maybe not.

-boris

>> In any case, having exhausted the grant entries creating queues, there
>> aren't any left to shuffle actual data around, are there? (Or are those
>> preallocated too?)
>
> All grant refs for Tx and Rx are preallocated (this is the allocation that
> is failing above).
>
> David
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 10:02 AM, David Vrabel wrote:
> > On 20/01/16 14:52, Ian Campbell wrote:
> > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > There have been a few reports recently[0] which relate to a
> > > > > failure of netfront to allocate sufficient grant refs for all the
> > > > > queues:
> > > > >
> > > > > [0.533589] xen_netfront: can't alloc rx grant refs
> > > > > [0.533612] net eth0: only created 31 queues
> > > > >
> > > > > Which can be worked around by increasing the number of grants on
> > > > > the hypervisor command line or by limiting the number of queues
> > > > > permitted by either back or front using a module param (which was
> > > > > broken but is now fixed on both sides, but I'm not sure it has been
> > > > > backported everywhere such that it is a reliable thing to always
> > > > > tell users as a workaround).
> > > > >
> > > > > Is there any plan to do anything about the default/out of the box
> > > > > experience? Either limiting the number of queues or making both
> > > > > ends cope more gracefully with failure to create some queues (or
> > > > > both) might be sufficient?
> > > > >
> > > > > I think the crash after the above in the first link at [0] is
> > > > > fixed? I think that was the purpose of ca88ea1247df "xen-netfront:
> > > > > update num_queues to real created" which was in 4.3.
> > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > of queues.
> > > That's in 4.4, which the first link at [0] claimed to have tested. I
> > > can see this fixing the crash, but does it really fix the "actually
> > > works with fewer queues than it tried to get" issue?
>
> That's what I thought it does too. I didn't notice that 4.4 was tested
> as well, so maybe not.

I've asked the reporter to send logs for the 4.4 case to xen-devel.

Ian.
[Xen-devel] netfront/netback multiqueue exhausting grants
There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update
num_queues to real created" which was in 4.3.

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
    http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
    some before the xmas break too IIRC
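To see why the failure lands at around 31 queues, here is a back-of-the-envelope sketch. Every constant in it is an assumption rather than something taken from the reports: 4 KiB grant frames holding 512 eight-byte v1 entries, a limit of 32 grant frames (believed to be the hypervisor default at the time; check gnttab_max_frames on your build), 8 reserved entries, and, per queue, one grant ref for each slot of a 256-slot tx ring and a 256-slot rx ring plus one for each ring page. Grants used by other frontends such as blkfront are ignored, and max_queues is a hypothetical helper.

```python
# Back-of-the-envelope estimate of how many netfront queues fit in a
# guest's grant table. All constants are ASSUMPTIONS for illustration,
# not values read from the reporters' systems.

PAGE_SIZE = 4096
GRANT_ENTRY_V1_SIZE = 8                      # assumed: flags(2) + domid(2) + frame(4)
ENTRIES_PER_FRAME = PAGE_SIZE // GRANT_ENTRY_V1_SIZE   # 512 entries per frame
RESERVED_ENTRIES = 8                         # assumed reserved entries at the table start

RING_SLOTS = 256                             # assumed tx and rx ring size with 4 KiB rings
REFS_PER_QUEUE = 2 * RING_SLOTS + 2          # tx + rx slot refs, plus one ref per ring page


def max_queues(grant_frames, refs_used_elsewhere=0):
    """Hypothetical helper: how many whole queues fit in the free grant refs."""
    free = grant_frames * ENTRIES_PER_FRAME - RESERVED_ENTRIES - refs_used_elsewhere
    return max(free, 0) // REFS_PER_QUEUE


print(max_queues(32))   # -> 31 under these assumptions
print(max_queues(64))   # -> 63 under these assumptions
```

Under these assumptions 31 queues consume 15934 refs and leave 442: enough for a 32nd queue's 256 tx refs but not its 256 rx refs, which would be consistent with the "can't alloc rx grant refs" message, although the real numbers on the reporters' systems are unknown. Raising the frame limit or capping the queue count both just move this boundary.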
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 01/20/2016 07:23 AM, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
>
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update
> num_queues to real created" which was in 4.3.

I think ca88ea1247df is the solution --- it will limit the number of
queues.

And apparently it's not in stable trees. At least not in 4.1.15, which is
what the first reporter is running:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/net/xen-netfront.c?id=refs/tags/v4.1.15

-boris

>
> Ian.
>
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
>     http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
>     some before the xmas break too IIRC
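The "cope more gracefully" option discussed in this thread can be modelled as: try to set up each queue, stop at the first one whose grant refs cannot be allocated, undo that queue's partial allocation, and carry on with the number actually created. This is only a toy model of the idea behind the commit title "xen-netfront: update num_queues to real created", not the kernel code. GrantPool and setup_queues are hypothetical names, the all-or-nothing batch allocation is an assumption, ring-page grants are ignored for brevity, and the starting pool of 16376 free refs (32 frames of 512 entries minus 8 reserved) is assumed.

```python
# Hypothetical model of "set up as many queues as the grant pool allows,
# then run with the number actually created". Names are illustrative only;
# this is not the xen-netfront implementation.


class GrantPool:
    """Toy grant-ref pool with all-or-nothing batch allocation (assumed)."""

    def __init__(self, free):
        self.free = free

    def alloc(self, count):
        # Refuse the whole batch if it does not fit, rather than handing out part of it.
        if count > self.free:
            return False
        self.free -= count
        return True

    def release(self, count):
        self.free += count


def setup_queues(pool, requested, ring_slots=256):
    """Create up to `requested` queues and return how many were really created."""
    created = 0
    for _ in range(requested):
        if not pool.alloc(ring_slots):        # tx refs for this queue
            break
        if not pool.alloc(ring_slots):        # rx refs for this queue
            pool.release(ring_slots)          # hand back this queue's tx refs
            break
        created += 1
    return created


pool = GrantPool(free=16376)                  # assumed: 32 frames * 512 entries - 8 reserved
num_queues = setup_queues(pool, requested=32)
print(num_queues, pool.free)                  # prints: 31 504
```

With those numbers the sketch creates 31 of the 32 requested queues and leaves 504 refs free, because the 32nd queue's tx refs are handed back. Returning the real count, rather than failing the whole device, is what lets the caller keep running with fewer queues.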