Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
On 8/23/2017 5:20 PM, Konrad Rzeszutek Wilk wrote:
..snip..
> Acked-by: Roger Pau Monné
>
> Forgot to add, this needs to be backported to stable branches, so:
>
> Cc: sta...@vger.kernel.org
>
> Annie, could you resend the patch with the tags and an update to the
> description to me please?

Done.

Thanks
Annie

> Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
In xen_blkif_disconnect, before checking inflight I/O, the following code stops the blkback thread:

	if (ring->xenblkd) {
		kthread_stop(ring->xenblkd);
		wake_up(&ring->shutdown_wq);
	}

If there is inflight I/O in any non-last queue, blkback returns -EBUSY directly, and the code above is never run for the remaining queues, so their threads are never stopped and their requests never processed. When removing a vbd device under heavy disk I/O load, some queues with inflight I/O still have their blkback thread running even though the corresponding vbd device or guest is gone. This can cause problems: for example, if the backend device type is file, some loop devices and blkback threads linger forever after the guest is destroyed, and unmounting the repositories fails until dom0 is rebooted.

This patch gives the thread of every queue the chance to be stopped. Otherwise, only the threads of the queues up to (and including) the first busy one get stopped, while the blkback threads of the remaining queues keep running. So stop all threads properly and return -EBUSY if any queue has inflight I/O.
Signed-off-by: Annie Li <annie...@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
Acked-by: Roger Pau Monné <roger@citrix.com>
---
v2: enhance patch description and add Acked-by
---
 drivers/block/xen-blkback/xenbus.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	unsigned int j, r;
+	bool busy = false;
 
 	for (r = 0; r < blkif->nr_rings; r++) {
 		struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		 * don't have any discard_io or other_io requests. So, checking
 		 * for inflight IO is enough.
 		 */
-		if (atomic_read(&ring->inflight) > 0)
-			return -EBUSY;
+		if (atomic_read(&ring->inflight) > 0) {
+			busy = true;
+			continue;
+		}
 
 		if (ring->irq) {
 			unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
 		ring->active = false;
 	}
+	if (busy)
+		return -EBUSY;
+
 	blkif->nr_ring_pages = 0;
 	/*
 	 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3
Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
On 8/18/2017 5:14 AM, Roger Pau Monné wrote:
> On Thu, Aug 17, 2017 at 06:43:46PM -0400, Annie Li wrote:
>> If there is inflight I/O in any non-last queue, blkback returns -EBUSY
>> directly, and never stops the threads of the remaining queues to
>> process them. When removing a vbd device under heavy disk I/O load,
>> some queues with inflight I/O still have their blkback thread running
>> even though the corresponding vbd device or guest is gone. This can
>> cause problems: for example, if the backend device type is file, some
>> loop devices and blkback threads linger forever after the guest is
>> destroyed, and unmounting the repositories fails until dom0 is
>> rebooted.
>>
>> So stop all threads properly and return -EBUSY if any queue has
>> inflight I/O.
>>
>> Signed-off-by: Annie Li <annie...@oracle.com>
>> Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
>> Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
>> Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
>> ---
>>  drivers/block/xen-blkback/xenbus.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index 792da68..2adb859 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
>>  {
>>  	struct pending_req *req, *n;
>>  	unsigned int j, r;
>> +	bool busy = false;
>>
>>  	for (r = 0; r < blkif->nr_rings; r++) {
>>  		struct xen_blkif_ring *ring = &blkif->rings[r];
>> @@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
>>  		 * don't have any discard_io or other_io requests. So, checking
>>  		 * for inflight IO is enough.
>>  		 */
>> -		if (atomic_read(&ring->inflight) > 0)
>> -			return -EBUSY;
>> +		if (atomic_read(&ring->inflight) > 0) {
>> +			busy = true;
>> +			continue;
>> +		}
>
> I guess I'm missing something, but I don't see how this is solving the
> problem described in the description.
> If the problem is that xen_blkif_disconnect returns without cleaning all
> the queues, this patch keeps the current behavior, just that it will try
> to remove more queues before returning, as opposed to returning when
> finding the first busy queue.

Before checking inflight, the following code stops the blkback thread:

	if (ring->xenblkd) {
		kthread_stop(ring->xenblkd);
		wake_up(&ring->shutdown_wq);
	}

This patch gives the thread of every queue the chance to be stopped. Otherwise, only the threads of the queues before (and including) the first busy one get stopped; the threads of the remaining queues keep running, and these blkback threads and the corresponding loop devices linger forever even after the guest is destroyed.

Thanks
Annie
[Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
If there is inflight I/O in any non-last queue, blkback returns -EBUSY directly, and never stops the threads of the remaining queues to process them. When removing a vbd device under heavy disk I/O load, some queues with inflight I/O still have their blkback thread running even though the corresponding vbd device or guest is gone. This can cause problems: for example, if the backend device type is file, some loop devices and blkback threads linger forever after the guest is destroyed, and unmounting the repositories fails until dom0 is rebooted.

So stop all threads properly and return -EBUSY if any queue has inflight I/O.

Signed-off-by: Annie Li <annie...@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
---
 drivers/block/xen-blkback/xenbus.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	unsigned int j, r;
+	bool busy = false;
 
 	for (r = 0; r < blkif->nr_rings; r++) {
 		struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		 * don't have any discard_io or other_io requests. So, checking
 		 * for inflight IO is enough.
 		 */
-		if (atomic_read(&ring->inflight) > 0)
-			return -EBUSY;
+		if (atomic_read(&ring->inflight) > 0) {
+			busy = true;
+			continue;
+		}
 
 		if (ring->irq) {
 			unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
 		ring->active = false;
 	}
+	if (busy)
+		return -EBUSY;
+
 	blkif->nr_ring_pages = 0;
 	/*
 	 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3
Re: [Xen-devel] [PATCH 04/10] acpi: Power and Sleep ACPI buttons are not emulated
On 11/7/2016 10:38 AM, Konrad Rzeszutek Wilk wrote:
> On Sun, Nov 06, 2016 at 04:42:37PM -0500, Boris Ostrovsky wrote:
>> .. for PVH guests. However, since emulating them for HVM guests also
>> doesn't seem useful we can have FADT disable those buttons for both
>> types of guests.
>
> Wait, we need S3 suspend for HVM guests (think Windows without PV
> drivers)! And I believe this is how we "inject" the event to the guest.
>
> CC-ing Annie and Adnan who I believe had to do something like this in an
> old version of OVS, maybe they can recall more details.

Our winpv drivers support S4 (hibernation), which is tested by WHQL; S3 (sleep) is optional (no real process for it yet).

Thanks
Annie

>> Signed-off-by: Boris Ostrovsky
>> ---
>>  tools/libacpi/static_tables.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libacpi/static_tables.c b/tools/libacpi/static_tables.c
>> index 413abcc..ebe8ffe 100644
>> --- a/tools/libacpi/static_tables.c
>> +++ b/tools/libacpi/static_tables.c
>> @@ -61,7 +61,8 @@ struct acpi_20_fadt Fadt = {
>>      .flags = (ACPI_PROC_C1 |
>>                ACPI_WBINVD |
>>                ACPI_FIX_RTC | ACPI_TMR_VAL_EXT |
>> -              ACPI_USE_PLATFORM_CLOCK),
>> +              ACPI_USE_PLATFORM_CLOCK |
>> +              ACPI_PWR_BUTTON | ACPI_SLP_BUTTON),
>>
>>      .reset_reg = {
>>          .address_space_id    = ACPI_SYSTEM_IO,
>> -- 
>> 2.7.4
Re: [Xen-devel] xl migrate regression in staging
On 6/2/2016 11:21 AM, Wei Liu wrote:
> On Thu, Jun 02, 2016 at 11:18:36AM -0400, annie li wrote:
>> On 4/14/2016 11:36 AM, Olaf Hering wrote:
>>> On Thu, Apr 14, Wei Liu wrote:
>>>> Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try
>>>> again? Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that
>>>> I can know if it is related to the COLO series.
>>>>
>>>> No, don't try to bisect that because it's broken in the middle.
>>>
>>> I think I took enough snapshots for my rpm packages to get close to
>>> the above commits. It will take some time to cycle through them.
>>>
>>>> Also as Andrew suggested, if you can reproduce it with a smaller
>>>> domain or share the guest image with us, that would be helpful.
>>>
>>> I have already sent him the link.
>>
>> Any update, guys? I hit a similar problem recently too.
>
> Yes, this has been fixed in the master branch. See
> de28b189bd329dd20a245a318be8feea2c4cc60a.
>
> Wei.

Nice! Tested with a newer version containing this patch, and it works.

Thanks
Annie
Re: [Xen-devel] xl migrate regression in staging
On 4/14/2016 11:36 AM, Olaf Hering wrote:
> On Thu, Apr 14, Wei Liu wrote:
>> Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try
>> again? Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I
>> can know if it is related to the COLO series.
>>
>> No, don't try to bisect that because it's broken in the middle.
>
> I think I took enough snapshots for my rpm packages to get close to the
> above commits. It will take some time to cycle through them.
>
>> Also as Andrew suggested, if you can reproduce it with a smaller domain
>> or share the guest image with us, that would be helpful.
>
> I have already sent him the link.

Any update, guys? I hit a similar problem recently too.

Thanks
Annie
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 2016/1/21 9:17, David Vrabel wrote:
> On 21/01/16 12:19, Ian Campbell wrote:
>> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>>> On 20/01/16 12:23, Ian Campbell wrote:
>>>> There have been a few reports recently[0] which relate to a failure
>>>> of netfront to allocate sufficient grant refs for all the queues:
>>>>
>>>> [0.533589] xen_netfront: can't alloc rx grant refs
>>>> [0.533612] net eth0: only created 31 queues
>>>>
>>>> Which can be worked around by increasing the number of grants on the
>>>> hypervisor command line or by limiting the number of queues permitted
>>>> by either back or front using a module param (which was broken but is
>>>> now fixed on both sides, but I'm not sure it has been backported
>>>> everywhere such that it is a reliable thing to always tell users as a
>>>> workaround).
>>>>
>>>> Is there any plan to do anything about the default/out of the box
>>>> experience? Either limiting the number of queues or making both ends
>>>> cope more gracefully with failure to create some queues (or both)
>>>> might be sufficient?
>>>>
>>>> I think the crash after the above in the first link at [0] is fixed?
>>>> I think that was the purpose of ca88ea1247df "xen-netfront: update
>>>> num_queues to real created" which was in 4.3.
>>>
>>> I think the correct solution is to increase the default maximum grant
>>> table size.
>>
>> That could well make sense, but then there will just be another higher
>> limit, so we should perhaps do both, i.e. factoring in:
>>
>> * performance, i.e. the ability for N queues to saturate whatever sort
>>   of link contemporary Linux can saturate these days, plus some
>>   headroom, or whatever other ceiling seems sensible;
>>
>> * grant table resource consumption, i.e. (sensible max number of blks *
>>   nr gnts per blk + sensible max number of vifs * nr gnts per vif +
>>   other devs' needs) < per-guest grant limit;
>>
>> to pick both the default gnttab size and the default max queues.
>
> Yes.

Would it waste lots of resources in the case where a guest vif has lots of queues but no network load?
Here is an example of the grant refs consumed by a vif: with dom0 having 20 vcpus and domU having 20 vcpus, one vif consumes 20*256*2 = 10240 grant refs. If the maximum grant table size is set to 64 pages (the default in Xen is 32 pages now?), then only 3 vifs can be supported in a guest. And that is before blk is taken into account, not to mention the blk multi-page ring feature.

Thanks
Annie

>>> Although, unless you're using the not-yet-applied per-cpu rwlock
>>> patches, multiqueue is terrible on many (multisocket) systems and the
>>> number of queues should be limited in netback to 4 or even just 2.
>>
>> Presumably the guest can't tell, so it can't do this. I think when you
>> say "terrible" you don't mean "worse than without mq" but rather "not
>> realising the expected gains from a larger number of queues", right?
>
> Malcolm did the analysis but if I remember correctly, 8 queues performed
> about the same as 1 queue and 16 were worse than 1 queue.
>
> David
Re: [Xen-devel] netfront/netback multiqueue exhausting grants
On 2016/1/20 7:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [0.533589] xen_netfront: can't alloc rx grant refs
> [0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).

Following are the patches that fixed the module param; they exist since v4.3:

xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d

xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3

> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends
> cope more gracefully with failure to create some queues (or both) might
> be sufficient?

We ran into a similar issue recently, and guess it would be better to suggest that users set the netback module parameter, with a default value of 8? See this link:
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
Probably more tests are needed to settle on the default number for the best experience.

> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update
> num_queues to real created" which was in 4.3.

Correct.

Thanks
Annie

> Ian.
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
> http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
>
> some before the xmas break too IIRC