Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 12:22:13AM +0100, Marek Marczykowski-Górecki wrote: > Backend domain may be independently destroyed - there is no > synchronization of libxl structures (including /libxl tree) elsewhere. > Backend might also remove the device info from its backend xenstore > subtree on its own. > If such situation is detected, do not fail the removal, but finish the > cleanup of the frontend side. > > This is just workaround, the real fix should watch when the device > backend is removed (including backend domain destruction) and remove > frontend at that time. And report such event to higher layer code, so > for example libvirt could synchronize its state. > > Signed-off-by: Marek Marczykowski-GóreckiAcked-by: Wei Liu Can you please resend with commit message updated, thanks. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 05:32:47PM +0100, Marek Marczykowski-Górecki wrote: > On Fri, Feb 09, 2018 at 03:33:40PM +, Roger Pau Monné wrote: > > I'm sorry, I'm a little foggy today. Does this mean the call to > > libxl__xs_path_cleanup is simply not needed in > > libxl__initiate_device_generic_remove? > > It is, it's an alternative to setting be/state=XenbusStateClosing, when > frontend is unresponsive. To let the backend know that frontend is gone, > so it can set be/state=XenbusStateClosed. > > We have various cases (not comprehensive list): > > - both frontend and backend operational: after setting >be/state=XenbusStateClosing backend wait for frontend confirmation >and respond with be/state=XenbusStateClosed; then libxl in dom0 >remove frontend entries and libxl in backend domain (which may be the >same) remove backend entries > - unresponsive backend/frontend: after a timeout, force=1 is used to remove >frontend entries, instead of just setting >be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed. >If that timeout too, remove both frontend and backend entries > - backend gone, with this patch: no place for setting/waiting on >be/state - go directly to removing frontend entries, without waiting >for be/state=XenbusStateClosed (this is the difference vs force=1) > > Without this patch the end result is similar, both frontend and backend > entries are removed, but in case of backend gone: > - libxl waits for be/state=XenbusStateClosed (and obviously timeout) > - return value from the function signal an error, which for example >confuse libvirt - it thinks the device remove failed, so is still >there Thanks. I think I've got it now and I agree on the patch. However it might be nice to add the above explanation to the commit message. Reviewed-by: Roger Pau MonnéRoger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 03:33:40PM +, Roger Pau Monné wrote: > I'm sorry, I'm a little foggy today. Does this mean the call to > libxl__xs_path_cleanup is simply not needed in > libxl__initiate_device_generic_remove? It is, it's an alternative to setting be/state=XenbusStateClosing, when frontend is unresponsive. To let the backend know that frontend is gone, so it can set be/state=XenbusStateClosed. We have various cases (not comprehensive list): - both frontend and backend operational: after setting be/state=XenbusStateClosing backend wait for frontend confirmation and respond with be/state=XenbusStateClosed; then libxl in dom0 remove frontend entries and libxl in backend domain (which may be the same) remove backend entries - unresponsive backend/frontend: after a timeout, force=1 is used to remove frontend entries, instead of just setting be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed. If that timeout too, remove both frontend and backend entries - backend gone, with this patch: no place for setting/waiting on be/state - go directly to removing frontend entries, without waiting for be/state=XenbusStateClosed (this is the difference vs force=1) Without this patch the end result is similar, both frontend and backend entries are removed, but in case of backend gone: - libxl waits for be/state=XenbusStateClosed (and obviously timeout) - return value from the function signal an error, which for example confuse libvirt - it thinks the device remove failed, so is still there -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 04:11:06PM +0100, Marek Marczykowski-Górecki wrote: > On Fri, Feb 09, 2018 at 02:39:08PM +, Roger Pau Monné wrote: > > On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote: > > > On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote: > > > > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki > > > > wrote: > > > > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > > > > > > I'm also wondering, if you jump to 'out' here, you avoid the call to > > > > > > libxl__xs_transaction_commit and instead end up calling > > > > > > libxl__xs_transaction_abort, which means the above call to > > > > > > libxl__xs_path_cleanup will not be committed to xenstore, is this > > > > > > really desired? > > > > > > > > > > > > It seems to me libxl might leak xenstore frontend entries in that > > > > > > case. > > > > > > > > > > That call is only if aodev->force. In other cases cleanup is done in > > > > > device_hotplug_done()->libxl__device_destroy(), which have its own > > > > > transaction. > > > > > > > > Hm, right, but this would still be incorrect in the force case then? > > > > Or is this simply not needed for the 'force' case? > > > > > > In that case, the first libxl__xs_path_cleanup will indeed be aborted. > > > But then it will be cleaned up the same way as in !force case. > > > Anyway, this is about the case when backend is already gone, so 'force' > > > doesn't really change anything - it was forcefully removed already, by > > > shutting down the backend domain (or removing backend using something > > > else)... > > > > Are you sure? The libxl__xs_path_cleanup call is not for removing the > > backend entries but the frontend ones. Ie: the backend entries not > > being present doesn't imply the frontend entries also not being > > present. > > But libxl__device_destroy do remove frontend entries. I'm sorry, I'm a little foggy today. Does this mean the call to libxl__xs_path_cleanup is simply not needed in libxl__initiate_device_generic_remove? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 02:39:08PM +, Roger Pau Monné wrote: > On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote: > > On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote: > > > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki > > > wrote: > > > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > > > > > I'm also wondering, if you jump to 'out' here, you avoid the call to > > > > > libxl__xs_transaction_commit and instead end up calling > > > > > libxl__xs_transaction_abort, which means the above call to > > > > > libxl__xs_path_cleanup will not be committed to xenstore, is this > > > > > really desired? > > > > > > > > > > It seems to me libxl might leak xenstore frontend entries in that > > > > > case. > > > > > > > > That call is only if aodev->force. In other cases cleanup is done in > > > > device_hotplug_done()->libxl__device_destroy(), which have its own > > > > transaction. > > > > > > Hm, right, but this would still be incorrect in the force case then? > > > Or is this simply not needed for the 'force' case? > > > > In that case, the first libxl__xs_path_cleanup will indeed be aborted. > > But then it will be cleaned up the same way as in !force case. > > Anyway, this is about the case when backend is already gone, so 'force' > > doesn't really change anything - it was forcefully removed already, by > > shutting down the backend domain (or removing backend using something > > else)... > > Are you sure? The libxl__xs_path_cleanup call is not for removing the > backend entries but the frontend ones. Ie: the backend entries not > being present doesn't imply the frontend entries also not being > present. But libxl__device_destroy do remove frontend entries. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 02:08:33PM +0100, Marek Marczykowski-Górecki wrote: > On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote: > > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote: > > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > > > > I'm also wondering, if you jump to 'out' here, you avoid the call to > > > > libxl__xs_transaction_commit and instead end up calling > > > > libxl__xs_transaction_abort, which means the above call to > > > > libxl__xs_path_cleanup will not be committed to xenstore, is this > > > > really desired? > > > > > > > > It seems to me libxl might leak xenstore frontend entries in that > > > > case. > > > > > > That call is only if aodev->force. In other cases cleanup is done in > > > device_hotplug_done()->libxl__device_destroy(), which have its own > > > transaction. > > > > Hm, right, but this would still be incorrect in the force case then? > > Or is this simply not needed for the 'force' case? > > In that case, the first libxl__xs_path_cleanup will indeed be aborted. > But then it will be cleaned up the same way as in !force case. > Anyway, this is about the case when backend is already gone, so 'force' > doesn't really change anything - it was forcefully removed already, by > shutting down the backend domain (or removing backend using something > else)... Are you sure? The libxl__xs_path_cleanup call is not for removing the backend entries but the frontend ones. Ie: the backend entries not being present doesn't imply the frontend entries also not being present. Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 12:10:39PM +, Roger Pau Monné wrote: > On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote: > > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > > > I'm also wondering, if you jump to 'out' here, you avoid the call to > > > libxl__xs_transaction_commit and instead end up calling > > > libxl__xs_transaction_abort, which means the above call to > > > libxl__xs_path_cleanup will not be committed to xenstore, is this > > > really desired? > > > > > > It seems to me libxl might leak xenstore frontend entries in that > > > case. > > > > That call is only if aodev->force. In other cases cleanup is done in > > device_hotplug_done()->libxl__device_destroy(), which have its own > > transaction. > > Hm, right, but this would still be incorrect in the force case then? > Or is this simply not needed for the 'force' case? In that case, the first libxl__xs_path_cleanup will indeed be aborted. But then it will be cleaned up the same way as in !force case. Anyway, this is about the case when backend is already gone, so 'force' doesn't really change anything - it was forcefully removed already, by shutting down the backend domain (or removing backend using something else)... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 12:41:58PM +0100, Marek Marczykowski-Górecki wrote: > On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > > I'm also wondering, if you jump to 'out' here, you avoid the call to > > libxl__xs_transaction_commit and instead end up calling > > libxl__xs_transaction_abort, which means the above call to > > libxl__xs_path_cleanup will not be committed to xenstore, is this > > really desired? > > > > It seems to me libxl might leak xenstore frontend entries in that > > case. > > That call is only if aodev->force. In other cases cleanup is done in > device_hotplug_done()->libxl__device_destroy(), which have its own > transaction. Hm, right, but this would still be incorrect in the force case then? Or is this simply not needed for the 'force' case? Thanks, Roger ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
On Fri, Feb 09, 2018 at 11:27:04AM +, Roger Pau Monné wrote: > On Fri, Feb 09, 2018 at 12:22:13AM +0100, Marek Marczykowski-Górecki wrote: > > Backend domain may be independently destroyed - there is no > > synchronization of libxl structures (including /libxl tree) elsewhere. > > Backend might also remove the device info from its backend xenstore > > subtree on its own. > > If such situation is detected, do not fail the removal, but finish the > > cleanup of the frontend side. > > > > This is just workaround, the real fix should watch when the device > > backend is removed (including backend domain destruction) and remove > > frontend at that time. And report such event to higher layer code, so > > for example libvirt could synchronize its state. > > > > Signed-off-by: Marek Marczykowski-Górecki> > --- > > tools/libxl/libxl_device.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > > index 1b796bd392..1f58a99a23 100644 > > --- a/tools/libxl/libxl_device.c > > +++ b/tools/libxl/libxl_device.c > > @@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc > > *egc, > > goto out; > > } > > > > +/* if state_path is empty, assume backend is gone (backend domain > > + * shutdown?), cleanup frontend only; rc=0 */ > > +if (!state) { > > +LOG(WARN, "backend %s already removed, cleanup frontend only", > > be_path); > > +goto out; > > +} > > + > > I think INFO should be used instead of WARN, since this doesn't look > to be a cause for concern from an admin PoV. Ok, will change. > I'm also wondering, if you jump to 'out' here, you avoid the call to > libxl__xs_transaction_commit and instead end up calling > libxl__xs_transaction_abort, which means the above call to > libxl__xs_path_cleanup will not be committed to xenstore, is this > really desired? > > It seems to me libxl might leak xenstore frontend entries in that > case. That call is only if aodev->force. In other cases cleanup is done in device_hotplug_done()->libxl__device_destroy(), which have its own transaction. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] libxl: do not fail device removal if backend domain is gone
Backend domain may be independently destroyed - there is no synchronization of libxl structures (including /libxl tree) elsewhere. Backend might also remove the device info from its backend xenstore subtree on its own. If such situation is detected, do not fail the removal, but finish the cleanup of the frontend side. This is just workaround, the real fix should watch when the device backend is removed (including backend domain destruction) and remove frontend at that time. And report such event to higher layer code, so for example libvirt could synchronize its state. Signed-off-by: Marek Marczykowski-Górecki--- tools/libxl/libxl_device.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index 1b796bd392..1f58a99a23 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -997,6 +997,13 @@ void libxl__initiate_device_generic_remove(libxl__egc *egc, goto out; } +/* if state_path is empty, assume backend is gone (backend domain + * shutdown?), cleanup frontend only; rc=0 */ +if (!state) { +LOG(WARN, "backend %s already removed, cleanup frontend only", be_path); +goto out; +} + rc = libxl__xs_write_checked(gc, t, online_path, "0"); if (rc) goto out; -- 2.13.6 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel