Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-23 Thread Wei Liu
On Fri, Feb 09, 2018 at 12:22:13AM +0100, Marek Marczykowski-Górecki wrote:
> Backend domain may be independently destroyed - there is no
> synchronization of libxl structures (including /libxl tree) elsewhere.
> Backend might also remove the device info from its backend xenstore
> subtree on its own.
> If such situation is detected, do not fail the removal, but finish the
> cleanup of the frontend side.
> 
> This is just workaround, the real fix should watch when the device
> backend is removed (including backend domain destruction) and remove
> frontend at that time. And report such event to higher layer code, so
> for example libvirt could synchronize its state.
> 
> Signed-off-by: Marek Marczykowski-Górecki 

Acked-by: Wei Liu 

Can you please resend with commit message updated, thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Roger Pau Monné
On Fri, Feb 09, 2018 at 05:32:47PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Feb 09, 2018 at 03:33:40PM +, Roger Pau Monné wrote:
> > I'm sorry, I'm a little foggy today. Does this mean the call to
> > libxl__xs_path_cleanup is simply not needed in
> > libxl__initiate_device_generic_remove?
> 
> It is, it's an alternative to setting be/state=XenbusStateClosing, when
> frontend is unresponsive. To let the backend know that frontend is gone,
> so it can set be/state=XenbusStateClosed.
> 
> We have various cases (not comprehensive list):
> 
>  - both frontend and backend operational: after setting
>be/state=XenbusStateClosing backend wait for frontend confirmation
>and respond with be/state=XenbusStateClosed; then libxl in dom0
>remove frontend entries and libxl in backend domain (which may be the
>same) remove backend entries
>  - unresponsive backend/frontend: after a timeout, force=1 is used to remove
>frontend entries, instead of just setting
>be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
>If that timeout too, remove both frontend and backend entries
>  - backend gone, with this patch: no place for setting/waiting on
>be/state - go directly to removing frontend entries, without waiting
>for be/state=XenbusStateClosed (this is the difference vs force=1)
> 
> Without this patch the end result is similar, both frontend and backend
> entries are removed, but in case of backend gone:
>  - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
>  - return value from the function signal an error, which for example
>confuse libvirt - it thinks the device remove failed, so is still
>there

Thanks. I think I've got it now and I agree on the patch. However it
might be nice to add the above explanation to the commit message.

Reviewed-by: Roger Pau Monné 

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Marek Marczykowski-Górecki
On Fri, Feb 09, 2018 at 03:33:40PM +, Roger Pau Monné wrote:
> I'm sorry, I'm a little foggy today. Does this mean the call to
> libxl__xs_path_cleanup is simply not needed in
> libxl__initiate_device_generic_remove?

It is, it's an alternative to setting be/state=XenbusStateClosing, when
frontend is unresponsive. To let the backend know that frontend is gone,
so it can set be/state=XenbusStateClosed.

We have various cases (not comprehensive list):

 - both frontend and backend operational: after setting
   be/state=XenbusStateClosing backend wait for frontend confirmation
   and respond with be/state=XenbusStateClosed; then libxl in dom0
   remove frontend entries and libxl in backend domain (which may be the
   same) remove backend entries
 - unresponsive backend/frontend: after a timeout, force=1 is used to remove
   frontend entries, instead of just setting
   be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
   If that timeout too, remove both frontend and backend entries
 - backend gone, with this patch: no place for setting/waiting on
   be/state - go directly to removing frontend entries, without waiting
   for be/state=XenbusStateClosed (this is the difference vs force=1)

Without this patch the end result is similar, both frontend and backend
entries are removed, but in case of backend gone:
 - libxl waits for be/state=XenbusStateClosed (and obviously timeout)
 - return value from the function signal an error, which for example
   confuse libvirt - it thinks the device remove failed, so is still
   there

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Roger Pau Monné
On Fri, Feb 09, 2018 at 04:11:06PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Feb 09, 2018 at 02:39:08PM +, Roger Pau Monné wrote:
> > On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote:
> > > > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki 
> > > > wrote:
> > > > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> > > > > > I'm also wondering, if you jump to 'out' here, you avoid the call to
> > > > > > libxl__xs_transaction_commit and instead end up calling
> > > > > > libxl__xs_transaction_abort, which means the above call to
> > > > > > libxl__xs_path_cleanup will not be committed to xenstore, is this
> > > > > > really desired?
> > > > > >
> > > > > > It seems to me libxl might leak xenstore frontend entries in that
> > > > > > case.
> > > > > 
> > > > > That call is only if aodev->force. In other cases cleanup is done in
> > > > > device_hotplug_done()->libxl__device_destroy(), which have its own 
> > > > > transaction.
> > > > 
> > > > Hm, right, but this would still be incorrect in the force case then?
> > > > Or is this simply not needed for the 'force' case?
> > > 
> > > In that case, the first libxl__xs_path_cleanup will indeed be aborted.
> > > But then it will be cleaned up the same way as in !force case.
> > > Anyway, this is about the case when backend is already gone, so 'force'
> > > doesn't really change anything - it was forcefully removed already, by
> > > shutting down the backend domain (or removing backend using something
> > > else)...
> > 
> > Are you sure? The libxl__xs_path_cleanup call is not for removing the
> > backend entries but the frontend ones. Ie: the backend entries not
> > being present doesn't imply the frontend entries also not being
> > present.
> 
> But libxl__device_destroy do remove frontend entries.

I'm sorry, I'm a little foggy today. Does this mean the call to
libxl__xs_path_cleanup is simply not needed in
libxl__initiate_device_generic_remove?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Marek Marczykowski-Górecki
On Fri, Feb 09, 2018 at 02:39:08PM +, Roger Pau Monné wrote:
> On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote:
> > On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote:
> > > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki 
> > > wrote:
> > > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> > > > > I'm also wondering, if you jump to 'out' here, you avoid the call to
> > > > > libxl__xs_transaction_commit and instead end up calling
> > > > > libxl__xs_transaction_abort, which means the above call to
> > > > > libxl__xs_path_cleanup will not be committed to xenstore, is this
> > > > > really desired?
> > > > >
> > > > > It seems to me libxl might leak xenstore frontend entries in that
> > > > > case.
> > > > 
> > > > That call is only if aodev->force. In other cases cleanup is done in
> > > > device_hotplug_done()->libxl__device_destroy(), which have its own 
> > > > transaction.
> > > 
> > > Hm, right, but this would still be incorrect in the force case then?
> > > Or is this simply not needed for the 'force' case?
> > 
> > In that case, the first libxl__xs_path_cleanup will indeed be aborted.
> > But then it will be cleaned up the same way as in !force case.
> > Anyway, this is about the case when backend is already gone, so 'force'
> > doesn't really change anything - it was forcefully removed already, by
> > shutting down the backend domain (or removing backend using something
> > else)...
> 
> Are you sure? The libxl__xs_path_cleanup call is not for removing the
> backend entries but the frontend ones. Ie: the backend entries not
> being present doesn't imply the frontend entries also not being
> present.

But libxl__device_destroy do remove frontend entries.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Roger Pau Monné
On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote:
> > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> > > > I'm also wondering, if you jump to 'out' here, you avoid the call to
> > > > libxl__xs_transaction_commit and instead end up calling
> > > > libxl__xs_transaction_abort, which means the above call to
> > > > libxl__xs_path_cleanup will not be committed to xenstore, is this
> > > > really desired?
> > > >
> > > > It seems to me libxl might leak xenstore frontend entries in that
> > > > case.
> > > 
> > > That call is only if aodev->force. In other cases cleanup is done in
> > > device_hotplug_done()->libxl__device_destroy(), which have its own 
> > > transaction.
> > 
> > Hm, right, but this would still be incorrect in the force case then?
> > Or is this simply not needed for the 'force' case?
> 
> In that case, the first libxl__xs_path_cleanup will indeed be aborted.
> But then it will be cleaned up the same way as in !force case.
> Anyway, this is about the case when backend is already gone, so 'force'
> doesn't really change anything - it was forcefully removed already, by
> shutting down the backend domain (or removing backend using something
> else)...

Are you sure? The libxl__xs_path_cleanup call is not for removing the
backend entries but the frontend ones. Ie: the backend entries not
being present doesn't imply the frontend entries also not being
present.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Marek Marczykowski-Górecki
On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote:
> On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote:
> > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> > > I'm also wondering, if you jump to 'out' here, you avoid the call to
> > > libxl__xs_transaction_commit and instead end up calling
> > > libxl__xs_transaction_abort, which means the above call to
> > > libxl__xs_path_cleanup will not be committed to xenstore, is this
> > > really desired?
> > >
> > > It seems to me libxl might leak xenstore frontend entries in that
> > > case.
> > 
> > That call is only if aodev->force. In other cases cleanup is done in
> > device_hotplug_done()->libxl__device_destroy(), which have its own 
> > transaction.
> 
> Hm, right, but this would still be incorrect in the force case then?
> Or is this simply not needed for the 'force' case?

In that case, the first libxl__xs_path_cleanup will indeed be aborted.
But then it will be cleaned up the same way as in !force case.
Anyway, this is about the case when backend is already gone, so 'force'
doesn't really change anything - it was forcefully removed already, by
shutting down the backend domain (or removing backend using something
else)...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Roger Pau Monné
On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> > I'm also wondering, if you jump to 'out' here, you avoid the call to
> > libxl__xs_transaction_commit and instead end up calling
> > libxl__xs_transaction_abort, which means the above call to
> > libxl__xs_path_cleanup will not be committed to xenstore, is this
> > really desired?
> >
> > It seems to me libxl might leak xenstore frontend entries in that
> > case.
> 
> That call is only if aodev->force. In other cases cleanup is done in
> device_hotplug_done()->libxl__device_destroy(), which have its own 
> transaction.

Hm, right, but this would still be incorrect in the force case then?
Or is this simply not needed for the 'force' case?

Thanks, Roger

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-09 Thread Marek Marczykowski-Górecki
On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote:
> On Fri, Feb 09, 2018 at 12:22:13AM +0100, Marek Marczykowski-Górecki wrote:
> > Backend domain may be independently destroyed - there is no
> > synchronization of libxl structures (including /libxl tree) elsewhere.
> > Backend might also remove the device info from its backend xenstore
> > subtree on its own.
> > If such situation is detected, do not fail the removal, but finish the
> > cleanup of the frontend side.
> > 
> > This is just workaround, the real fix should watch when the device
> > backend is removed (including backend domain destruction) and remove
> > frontend at that time. And report such event to higher layer code, so
> > for example libvirt could synchronize its state.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> >  tools/libxl/libxl_device.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> > index 1b796bd392..1f58a99a23 100644
> > --- a/tools/libxl/libxl_device.c
> > +++ b/tools/libxl/libxl_device.c
> > @@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc 
> > *egc,
> >  goto out;
> >  }
> >  
> > +/* if state_path is empty, assume backend is gone (backend domain
> > + * shutdown?), cleanup frontend only; rc=0 */
> > +if (!state) {
> > +LOG(WARN, "backend %s already removed, cleanup frontend only", 
> > be_path);
> > +goto out;
> > +}
> > +
> 
> I think INFO should be used instead of WARN, since this doesn't look
> to be a cause for concern from an admin PoV.

Ok, will change.

> I'm also wondering, if you jump to 'out' here, you avoid the call to
> libxl__xs_transaction_commit and instead end up calling
> libxl__xs_transaction_abort, which means the above call to
> libxl__xs_path_cleanup will not be committed to xenstore, is this
> really desired?
>
> It seems to me libxl might leak xenstore frontend entries in that
> case.

That call is only if aodev->force. In other cases cleanup is done in
device_hotplug_done()->libxl__device_destroy(), which have its own transaction.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone

2018-02-08 Thread Marek Marczykowski-Górecki
Backend domain may be independently destroyed - there is no
synchronization of libxl structures (including /libxl tree) elsewhere.
Backend might also remove the device info from its backend xenstore
subtree on its own.
If such situation is detected, do not fail the removal, but finish the
cleanup of the frontend side.

This is just workaround, the real fix should watch when the device
backend is removed (including backend domain destruction) and remove
frontend at that time. And report such event to higher layer code, so
for example libvirt could synchronize its state.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl_device.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 1b796bd392..1f58a99a23 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc *egc,
 goto out;
 }
 
+/* if state_path is empty, assume backend is gone (backend domain
+ * shutdown?), cleanup frontend only; rc=0 */
+if (!state) {
+LOG(WARN, "backend %s already removed, cleanup frontend only", 
be_path);
+goto out;
+}
+
 rc = libxl__xs_write_checked(gc, t, online_path, "0");
 if (rc)
 goto out;
-- 
2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel