Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-24 Thread annie li


On 8/23/2017 5:20 PM, Konrad Rzeszutek Wilk wrote:

..snip..

Acked-by: Roger Pau Monné 

Forgot to add, this needs to be backported to stable branches, so:

Annie, could you resend the patch with the tags and an update
to the description to me please?

Done

Thanks
Annie

Cc: sta...@vger.kernel.org

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel





[Xen-devel] [PATCH v2 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-24 Thread Annie Li
In xen_blkif_disconnect, before checking for inflight I/O, the following code
stops the blkback thread:
	if (ring->xenblkd) {
		kthread_stop(ring->xenblkd);
		wake_up(&ring->shutdown_wq);
	}
If there is inflight I/O in any queue other than the last one, blkback returns
-EBUSY directly, and the code above is never reached for the remaining queues,
so their threads are not stopped and those queues are not processed. When
removing a vbd device under heavy disk I/O load, some queues with inflight I/O
are then left with their blkback threads running even though the corresponding
vbd device or guest is gone.
This can cause problems; for example, if the backend device type is file, loop
devices and blkback threads linger forever after the guest is destroyed, which
makes unmounting the repositories fail unless dom0 is rebooted.
This patch gives the thread of every queue a chance to be stopped. Without it,
only the threads of queues up to and including the first busy one are stopped,
while the threads of the remaining queues keep running. So stop all threads
properly and return -EBUSY if any queue has inflight I/O.

Signed-off-by: Annie Li <annie...@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
Acked-by: Roger Pau Monné <roger@citrix.com>
---
  v2: enhance patch description and add Acked-by
---
 drivers/block/xen-blkback/xenbus.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
 
for (r = 0; r < blkif->nr_rings; r++) {
struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }
 
if (ring->irq) {
unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
ring->active = false;
}
+   if (busy)
+   return -EBUSY;
+
blkif->nr_ring_pages = 0;
/*
 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3
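
For readers skimming the thread, the control flow established by the patch
above boils down to the simplified skeleton below. It is condensed from the
diff (ring teardown details and the ring->active bookkeeping are elided), so
treat it as a sketch rather than the literal upstream function:

static int xen_blkif_disconnect(struct xen_blkif *blkif)
{
	bool busy = false;
	unsigned int r;

	for (r = 0; r < blkif->nr_rings; r++) {
		struct xen_blkif_ring *ring = &blkif->rings[r];

		/* Always stop this queue's kthread first... */
		if (ring->xenblkd) {
			kthread_stop(ring->xenblkd);
			wake_up(&ring->shutdown_wq);
		}

		/* ...then skip (rather than return) if it still has inflight I/O. */
		if (atomic_read(&ring->inflight) > 0) {
			busy = true;
			continue;
		}

		/* irq unbind, grant unmap, pending_req freeing, ring->active = false */
	}

	/* Report -EBUSY only once every queue's thread has been stopped. */
	if (busy)
		return -EBUSY;

	blkif->nr_ring_pages = 0;
	return 0;
}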


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-18 Thread annie li


On 8/18/2017 5:14 AM, Roger Pau Monné wrote:

On Thu, Aug 17, 2017 at 06:43:46PM -0400, Annie Li wrote:

If there is inflight I/O in any queue other than the last one, blkback returns
-EBUSY directly, and never stops the threads of the remaining queues to process
them. When removing a vbd device under heavy disk I/O load, some queues with
inflight I/O are left with their blkback threads running even though the
corresponding vbd device or guest is gone.
This can cause problems; for example, if the backend device type is file, loop
devices and blkback threads linger forever after the guest is destroyed, which
makes unmounting the repositories fail unless dom0 is rebooted. So stop all
threads properly and return -EBUSY if any queue has inflight I/O.

Signed-off-by: Annie Li <annie...@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
---
  drivers/block/xen-blkback/xenbus.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
  {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
  
  	for (r = 0; r < blkif->nr_rings; r++) {

struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }

I guess I'm missing something, but I don't see how this is solving the
problem described in the description.

If the problem is that xen_blkif_disconnect returns without cleaning
all the queues, this patch keeps the current behavior, just that it
will try to remove more queues before returning, as opposed to
returning when finding the first busy queue.

Before checking inflight I/O, the following code stops the blkback thread:
	if (ring->xenblkd) {
		kthread_stop(ring->xenblkd);
		wake_up(&ring->shutdown_wq);
	}
This patch gives the thread of every queue a chance to be stopped. Otherwise,
only the threads of queues up to and including the first busy one are stopped;
the threads of the remaining queues keep running, and those blkback threads and
their corresponding loop devices linger forever even after the guest is
destroyed.


Thanks
Annie


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

2017-08-17 Thread Annie Li
If there is inflight I/O in any queue other than the last one, blkback returns
-EBUSY directly, and never stops the threads of the remaining queues to process
them. When removing a vbd device under heavy disk I/O load, some queues with
inflight I/O are left with their blkback threads running even though the
corresponding vbd device or guest is gone.
This can cause problems; for example, if the backend device type is file, loop
devices and blkback threads linger forever after the guest is destroyed, which
makes unmounting the repositories fail unless dom0 is rebooted. So stop all
threads properly and return -EBUSY if any queue has inflight I/O.

Signed-off-by: Annie Li <annie...@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.be...@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.da...@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.mishe...@oracle.com>
---
 drivers/block/xen-blkback/xenbus.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 792da68..2adb859 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
struct pending_req *req, *n;
unsigned int j, r;
+   bool busy = false;
 
for (r = 0; r < blkif->nr_rings; r++) {
struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 * don't have any discard_io or other_io requests. So, checking
 * for inflight IO is enough.
 */
-   if (atomic_read(&ring->inflight) > 0)
-   return -EBUSY;
+   if (atomic_read(&ring->inflight) > 0) {
+   busy = true;
+   continue;
+   }
 
if (ring->irq) {
unbind_from_irqhandler(ring->irq, ring);
@@ -300,6 +303,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
ring->active = false;
}
+   if (busy)
+   return -EBUSY;
+
blkif->nr_ring_pages = 0;
/*
 * blkif->rings was allocated in connect_ring, so we should free it in
-- 
1.9.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/10] acpi: Power and Sleep ACPI buttons are not emulated

2016-11-07 Thread annie li



On 11/7/2016 10:38 AM, Konrad Rzeszutek Wilk wrote:

On Sun, Nov 06, 2016 at 04:42:37PM -0500, Boris Ostrovsky wrote:

.. for PVH guests. However, since emulating them for HVM guests
also doesn't seem useful we can have FADT disable those buttons
for both types of guests.

Wait, we need S3 suspend for HVM guests (think Windows without PV drivers)!
And I think this is how we "inject" the event to the guest.

CC-ing Annie and Adnan who I believe had to do something like this
in an old version of OVS, maybe they can recall more details.

Our winpv drivers support S4 (hibernation), which will be tested by WHQL, and
S3 (sleep) is optional (no real process for it yet).


Thanks
Annie

Signed-off-by: Boris Ostrovsky 
---
  tools/libacpi/static_tables.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libacpi/static_tables.c b/tools/libacpi/static_tables.c
index 413abcc..ebe8ffe 100644
--- a/tools/libacpi/static_tables.c
+++ b/tools/libacpi/static_tables.c
@@ -61,7 +61,8 @@ struct acpi_20_fadt Fadt = {
  .flags = (ACPI_PROC_C1 |
ACPI_WBINVD |
ACPI_FIX_RTC | ACPI_TMR_VAL_EXT |
-  ACPI_USE_PLATFORM_CLOCK),
+  ACPI_USE_PLATFORM_CLOCK |
+  ACPI_PWR_BUTTON | ACPI_SLP_BUTTON),
  
  .reset_reg = {

  .address_space_id= ACPI_SYSTEM_IO,
--
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel






Re: [Xen-devel] xl migrate regression in staging

2016-06-02 Thread annie li


On 6/2/2016 11:21 AM, Wei Liu wrote:

On Thu, Jun 02, 2016 at 11:18:36AM -0400, annie li wrote:

On 4/14/2016 11:36 AM, Olaf Hering wrote:

On Thu, Apr 14, Wei Liu wrote:


Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
know if it is related to COLO series. No, don't try to bisect that
because it's broken in the middle.

I think I took enough snapshots for my rpm packages to get close to the
above commits. It will take some time to cycle through them.



Also as Andrew suggested if you can reproduce it with a smaller domain
or share the guest image with us, that would be helpful.

I have already sent him the link.

Any update, guys?
I hit a similar problem recently too.


Yes, this has been fixed in master branch.

See de28b189bd329dd20a245a318be8feea2c4cc60a.

Nice! Tested with a newer version that includes this patch, and it works.

Thanks
Annie


Wei.


Thanks
Annie

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel





Re: [Xen-devel] xl migrate regression in staging

2016-06-02 Thread annie li


On 4/14/2016 11:36 AM, Olaf Hering wrote:

On Thu, Apr 14, Wei Liu wrote:


Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
know if it is related to COLO series. No, don't try to bisect that
because it's broken in the middle.

I think I took enough snapshots for my rpm packages to get close to the
above commits. It will take some time to cycle through them.



Also as Andrew suggested if you can reproduce it with a smaller domain
or share the guest image with us, that would be helpful.

I have already sent him the link.

Any update, guys?
I hit a similar problem recently too.

Thanks
Annie

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-21 Thread annie li


On 2016/1/21 9:17, David Vrabel wrote:

On 21/01/16 12:19, Ian Campbell wrote:

On Thu, 2016-01-21 at 10:56 +, David Vrabel wrote:

On 20/01/16 12:23, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted
by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends
cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update
num_queues
to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
  * performance, i.e. the ability for N queues to saturate whatever sort of link
contemporary Linux can saturate these days, plus some headroom, or
whatever other ceiling seems sensible
  * grant table resource consumption, i.e. ((sensible max number of blks * nr
gnts per blk + sensible max number of vifs * nr gnts per vif + other
devs' needs) < per-guest grant limit)
  to pick both the default gnttab size and the default max queues.

Yes.


Would it waste a lot of resources in the case where a guest vif has many
queues but no network load? Here is an example of the grant refs consumed by a
vif:

Dom0 with 20 vcpus, domU with 20 vcpus:
one vif would consume 20*256*2 = 10240 grant refs.
If the maximum grant table size is set to 64 pages (the default value in Xen
is 32 pages now?), then only 3 vifs can be supported in the guest. And that is
before blk is taken into account, let alone the blk multi-page ring feature.
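
The arithmetic above can be reproduced with a small stand-alone calculation.
This is only a back-of-the-envelope sketch: the constants are the usual v1
grant-table values (4096-byte pages, 8-byte entries) and 256-slot tx/rx rings,
and the 64-page table size is the hypothetical maximum from the paragraph
above, not a value read from any particular Xen build.

#include <stdio.h>

int main(void)
{
	const unsigned int ring_entries    = 256;       /* slots per tx or rx ring            */
	const unsigned int rings_per_queue = 2;         /* one tx ring + one rx ring          */
	const unsigned int queues          = 20;        /* one queue per vcpu in this example */
	const unsigned int grants_per_page = 4096 / 8;  /* v1 grant entries per 4K page       */
	const unsigned int gnttab_pages    = 64;        /* hypothetical max grant table size  */

	unsigned int grants_per_vif = queues * ring_entries * rings_per_queue;
	unsigned int total_grants   = gnttab_pages * grants_per_page;

	printf("grant refs per vif : %u\n", grants_per_vif);                 /* 10240 */
	printf("grant table entries: %u\n", total_grants);                   /* 32768 */
	printf("vifs supportable   : %u\n", total_grants / grants_per_vif);  /* 3     */
	return 0;
}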


Thanks
Annie



Although, unless you're using the not-yet-applied per-cpu rwlock patches,
multiqueue is terrible on many (multisocket) systems and the number of
queues should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Malcolm did the analysis but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel





Re: [Xen-devel] netfront/netback multiqueue exhausting grants

2016-01-20 Thread annie li


On 2016/1/20 7:23, Ian Campbell wrote:

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[0.533589] xen_netfront: can't alloc rx grant refs
[0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).

The following are the patches that fix the module param handling; they have been in the tree since v4.3:
xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d
xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3


Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?
We ran into a similar issue recently, and I guess it would be better to suggest
that users set the netback module parameter with a default value of 8? See this
link,
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing


More tests are probably needed to settle on the default number that gives the best experience.
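
For context, the cap being discussed is exposed as an ordinary module
parameter. The snippet below is only an illustrative, minimal module showing
that pattern; the variable name and the default of 8 are assumptions for the
example (the real drivers accept a "max_queues" parameter per the commits
linked above, but the internals here are not copied from xen-netback):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/printk.h>

/* Hypothetical cap on queues per virtual interface, overridable at load time. */
static unsigned int max_queues = 8;
module_param(max_queues, uint, 0644);
MODULE_PARM_DESC(max_queues, "Maximum number of tx/rx queues per virtual interface");

static int __init max_queues_demo_init(void)
{
	pr_info("max_queues capped at %u\n", max_queues);
	return 0;
}

static void __exit max_queues_demo_exit(void)
{
}

module_init(max_queues_demo_init);
module_exit(max_queues_demo_exit);
MODULE_LICENSE("GPL");

Loading such a module with "modprobe <module> max_queues=4" (or the equivalent
kernel command-line override) is the kind of per-driver limit the thread is
weighing against simply raising the grant-table size.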



I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.

Correct.

Thanks
Annie


Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
 http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
 some before the xmas break too IIRC

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


