Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> Yes, that would work, but an open loop approach like that can lead to frustratingly unreliable tests. I think it would be best to make the test aware of the state of the helper - or even in control of it. That would allow us to wait for the helper to reach a particular state before killing it.

This is less bad than you might think because the helper's progress messages to libxl are at fairly predictable progress points.

In any case, the helper (in general) runs concurrently with libxl, so when libxl decides to stop the progress there will often be a race. (Sometimes the helper has to stop and wait for libxl to confirm.)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote:
> On the contrary, I think many long-running operations, such as suspend and migrations, involve multiple iterations of the libxl event loop. Actual suspend/migrate is done in a helper process; the main process is responsible for progress report handling, coordination, etc.

Yes, that would work, but an open loop approach like that can lead to frustratingly unreliable tests. I think it would be best to make the test aware of the state of the helper - or even in control of it. That would allow us to wait for the helper to reach a particular state before killing it.

Thanks,
Euan
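To illustrate the closed-loop idea of waiting for the helper to reach a particular state before killing it, here is a minimal POSIX sketch. It is not libxl code and makes several assumptions: `fake_helper()` is a hypothetical stand-in for the save/restore helper, progress points are reported as single bytes on a pipe (the real helper's message format is not shown in this thread), and `kill_helper_at()` is an invented name for the test-side logic.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reports progress points 1..points as single bytes
 * on fd, then waits to be killed. */
static void fake_helper(int fd, int points)
{
    signal(SIGTERM, SIG_DFL);
    for (int i = 1; i <= points; i++) {
        unsigned char p = (unsigned char)i;
        if (write(fd, &p, 1) != 1)
            _exit(2);
    }
    close(fd);   /* lets the test see EOF if the target is never reached */
    for (;;)
        pause(); /* wait to be killed */
}

/* Starts the helper, blocks until it reports progress point `target`
 * (or the pipe reaches EOF), then sends SIGTERM. Returns true only if the
 * target was reached and the helper died from that SIGTERM. */
bool kill_helper_at(int target, int points)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        close(fds[0]);
        fake_helper(fds[1], points);
        _exit(0); /* not reached */
    }
    close(fds[1]);

    unsigned char p = 0;
    bool reached = false;
    while (read(fds[0], &p, 1) == 1) {
        if (p == target) {
            reached = true;
            break;
        }
    }

    /* Kill before closing the read end, so the helper cannot die from
     * SIGPIPE while it is still writing later progress points. */
    kill(pid, SIGTERM);
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    close(fds[0]);

    return waited == pid && reached &&
           WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM;
}
```

Because the test blocks on the helper's own progress reports rather than sleeping for a fixed time, the kill always lands at or after the chosen point. The helper still runs concurrently, so it may already have moved past that point when the signal arrives.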
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Hi,

On Wed, Feb 18, 2015 at 04:10:35PM +, Euan Harris wrote:
> We had a chat about testing these changes, and integrating them into xenopsd. We agreed that we each had slightly different expectations of what we were going to do, and when. I think we came to the following major conclusions:
>
> - I will start work on a simple test framework for cancellation, hopefully to have first results in a fortnight or so.
> - Once the test framework is available you will fix whatever bugs it unearths, then we will rinse and repeat.
> - You will think some more about the possibility of adding cancellation to the xl command line tool, but since this is tricky there is no expectation of when it might happen.

I think the most straightforward way to test the cancellation mechanism in LibXL will be to adapt the way we test similar functionality in xenopsd:

* define numbered 'cancellation points' at which cancellable operations can be cancelled
* before testing a cancellable operation, pre-set the cancellation point at which cancellation should be attempted
* when execution reaches the pre-set cancellation point, run the cancellation procedure

This approach alone will not allow us to test asynchronous cancellation in the middle of long-running operations, such as writing a suspend image to disk - that will require a way to synchronize the test program with the long-running operation.

My first guess about how this might be done was:

* add current cancellation point and a trigger point variables to the context struct
* increment the counter and fire the cancellation logic in libxl__ao_cancellable_register()

In this way we could write a loop which iterated through all possible cancellation points. However you pointed out that we cannot call libxl_ao_cancel() while holding the context lock, so this idea needs some refinement. One possibility would be to tell another thread to try to do the cancellation immediately after we release the lock; another option, if we didn't want to write a multi-thread test driver, would be to do the cancellation at the top of libxl's event loop.

I think this captures roughly what we talked about. Please let me know if I misunderstood or missed out any details.

Thanks,
Euan
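The counter-and-trigger idea, with the actual cancellation deferred until the lock is no longer held, might be sketched roughly as follows. This is only an illustration under stated assumptions, not libxl code: `struct test_ctx`, `cancellation_point()`, `run_operation()` and `count_points()` are hypothetical names, the cancellable operation is modelled as a plain loop of stages, and no real locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test-only state; in the proposal these two counters would
 * live in the libxl context struct. */
struct test_ctx {
    int current_point;   /* cancellation points passed so far */
    int trigger_point;   /* point at which to cancel; 0 means never */
    bool cancel_pending; /* set at the trigger point, acted on later */
};

/* Stands in for the hook in libxl__ao_cancellable_register(). It only
 * records the request, because the real cancel call must not be made
 * while the context lock is held. */
void cancellation_point(struct test_ctx *t)
{
    assert(t != NULL);
    t->current_point++;
    if (t->trigger_point != 0 && t->current_point == t->trigger_point)
        t->cancel_pending = true;
}

/* Models a cancellable operation made of `steps` stages, each preceded by
 * one cancellation point. Returns the number of stages completed. */
int run_operation(struct test_ctx *t, int steps)
{
    int done = 0;
    for (int i = 0; i < steps; i++) {
        cancellation_point(t);
        /* "Top of the event loop": the lock is not held here, so this is
         * where the deferred cancellation is carried out. */
        if (t->cancel_pending)
            return done;
        done++;
    }
    return done;
}

/* Dry run with no trigger, to learn how many points the operation has. */
int count_points(int steps)
{
    struct test_ctx t = {0, 0, false};
    run_operation(&t, steps);
    return t.current_point;
}
```

A test driver along these lines would call `count_points()` once and then rerun the operation with `trigger_point` set to each value from 1 up to that count, checking after each run that the operation was cancelled cleanly.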
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Ian Campbell writes (Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> On Tue, 2015-02-10 at 20:09 +, Ian Jackson wrote:
> > This is v2 of my work-in-progress series to support cancellation of long-running libxl operations.
> [...]
> > I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.
>
> Would review of the series be useful and/or appreciated at this stage?

Review of the APIs, and general approach, would be very much appreciated. That's probably best done by looking at the tip and diffing libxl[_internal].h.

> Perhaps the first half dozen or so look like preparatory cleanups which I could sensibly look at?

That would also be useful.

> > Here's a list of the patches:
> >
> > 01/29 libxl: Further fix exit paths from libxl_device_events_handler
> > 02/29 libxl: Comment cleanups
> > 03/29 libxl: suspend: switch_logdirty_done takes rc
> > 04/29 libxl: suspend: common suspend callbacks take rc
> > 05/29 libxl: suspend: Return correct error from callbacks
> > 06/29 libxl: Use libxl__xswait* in libxl__ao_device
> > 07/29 libxl: xswait/devstate: Move xswait to before devstate
> > 08/29 libxl: devstate: Use libxl__xswait*

These first eight are cleanups and could in principle go in right away.

Thanks,
Ian.
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> On Wed, Feb 18, 2015 at 04:10:35PM +, Euan Harris wrote:
> I think the most straightforward way to test the cancellation mechanism in LibXL will be to adapt the way we test similar functionality in xenopsd:
>
> * define numbered 'cancellation points' at which cancellable operations can be cancelled
> * before testing a cancellable operation, pre-set the cancellation point at which cancellation should be attempted
> * when execution reaches the pre-set cancellation point, run the cancellation procedure

This seems likely to work.

> This approach alone will not allow us to test asynchronous cancellation in the middle of long-running operations, such as writing a suspend image to disk - that will require a way to synchronize the test program with the long-running operation.

On the contrary, I think many long-running operations, such as suspend and migrations, involve multiple iterations of the libxl event loop. Actual suspend/migrate is done in a helper process; the main process is responsible for progress report handling, coordination, etc.

> My first guess about how this might be done was:
>
> * add current cancellation point and a trigger point variables to the context struct
> * increment the counter and fire the cancellation logic in libxl__ao_cancellable_register()
>
> In this way we could write a loop which iterated through all possible cancellation points. However you pointed out that we cannot call libxl_ao_cancel() while holding the context lock, so this idea needs some refinement. One possibility would be to tell another thread to try to do the cancellation immediately after we release the lock; another option, if we didn't want to write a multi-thread test driver, would be to do the cancellation at the top of libxl's event loop.

The relevant function for this latter approach is eventloop_iteration in libxl_event.c. This is used by libxl whenever the caller specifies that a long-running operation is to be done synchronously (ao_how==0), which is what xl does.

You might also consider whether to add a debug option for afterpoll_internal to make it return after every callback (ie, after the call to efd->func() and the call to time_occurs). That would allow you to inject cancellation in a slightly more fine-grained manner.

> I think this captures roughly what we talked about. Please let me know if I misunderstood or missed out any details.

I also mentioned that counting invocations of libxl__ao_cancellable_register is less than ideal because it is very coarse-grained.

Regards,
Ian.
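The effect of the suggested debug option can be shown with a toy model. The sketch below does not reproduce libxl's eventloop_iteration or afterpoll_internal; `struct toy_loop`, `toy_afterpoll()`, `toy_iteration()` and `toy_run()` are hypothetical names, and each ready callback is modelled as a simple counter decrement. It only aims to show that returning after every callback gives the top of the loop, where cancellation would be injected, more chances to run.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in libxl. */
struct toy_loop {
    int pending;            /* callbacks that are ready to run */
    bool return_after_each; /* the suggested debug option */
    int iterations;         /* times the top of the loop was reached */
    int cancel_at;          /* iteration at which to cancel; 0 means never */
    bool cancelled;
};

/* Runs the ready callbacks. With the debug option set, it returns after
 * the first one instead of draining them all. */
void toy_afterpoll(struct toy_loop *l)
{
    while (l->pending > 0) {
        l->pending--; /* stands in for running one callback */
        if (l->return_after_each)
            return;
    }
}

/* One loop iteration, with the cancellation check at the top. Returns
 * false once the operation is cancelled or there is no work left. */
bool toy_iteration(struct toy_loop *l)
{
    l->iterations++;
    if (l->cancel_at != 0 && l->iterations == l->cancel_at) {
        l->cancelled = true;
        return false;
    }
    if (l->pending == 0)
        return false;
    toy_afterpoll(l);
    return l->pending > 0;
}

/* Drives the loop until it stops and reports how many times the top of
 * the loop, i.e. an injection opportunity, was reached. */
int toy_run(struct toy_loop *l)
{
    while (toy_iteration(l))
        ;
    return l->iterations;
}
```

In this model, three ready callbacks are drained in a single iteration without the option, so a cancellation scheduled for the second iteration never fires. With the option set there is one iteration per callback, and the same cancellation lands after the first callback has run.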
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
On Tue, Mar 03, 2015 at 12:08:04PM +, Ian Campbell wrote:
> > I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.
>
> Would review of the series be useful and/or appreciated at this stage? Perhaps the first half dozen or so look like preparatory cleanups which I could sensibly look at?

Yes, that would be great. I've read through the whole series fairly carefully, and it looks sensible, but you will be better placed to see whether it fits well with the rest of libxl.

Thanks,
Euan
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
On Tue, 2015-02-10 at 20:09 +, Ian Jackson wrote:
> This is v2 of my work-in-progress series to support cancellation of long-running libxl operations.
[...]
> I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.

Would review of the series be useful and/or appreciated at this stage? Perhaps the first half dozen or so look like preparatory cleanups which I could sensibly look at?

> Here's a list of the patches:
>
> 01/29 libxl: Further fix exit paths from libxl_device_events_handler
> 02/29 libxl: Comment cleanups
> 03/29 libxl: suspend: switch_logdirty_done takes rc
> 04/29 libxl: suspend: common suspend callbacks take rc
> 05/29 libxl: suspend: Return correct error from callbacks
> 06/29 libxl: Use libxl__xswait* in libxl__ao_device
> 07/29 libxl: xswait/devstate: Move xswait to before devstate
> 08/29 libxl: devstate: Use libxl__xswait*
> 09/29 libxl: New error codes CANCELLED etc.
> 10/29 libxl: events: Make timeout and async exec setup take an ao, not a gc
> 11/29 libxl: events: Make libxl__async_exec_* pass caller an rc
> 12/29 libxl: events: Permit timeouts to signal cancellation
> 13/29 libxl: domain create: Do not destroy on cancellation
> 14/29 libxl: ao: Record ultimate parent of a nested ao
> 15/29 libxl: ao: Count the nested progeny of an ao
> 16/29 libxl: ao: Provide manip_refcnt
> 17/29 libxl: cancellation: Provide public ao cancellation API
> 18/29 libxl: cancellation: Provide explicit internal cancel check API
> 19/29 libxl: cancellation: Make timeouts cancellable
> 20/29 libxl: cancellation: Note that driver domain task cannot be cancelled
> 21/29 libxl: cancellation: Make spawns cancellable
> 22/29 libxl: Introduce DOMAIN_DESTROYED error code
> 23/29 libxl: cancellation: Support cancellation where we spot domain death
> 24/29 libxl: Introduce FILLZERO
> 25/29 libxl: cancellation: Preparations for save/restore cancellation
> 26/29 libxl: cancellation: Handle SIGTERM in save/restore helper
> 27/29 libxl: cancellation: Cancel libxc save/restore
> 28/29 libxl: ao: datacopier callback gets an rc
> 29/29 libxl: cancellation: Make datacopiers cancellable
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> On Tue, Feb 10, 2015 at 08:09:47PM +, Ian Jackson wrote:
> > I have rebased this onto current staging. I have compiled it but NOT EXECUTED IT AT ALL.
> >
> > Euan, I thought it would be useful to give you something you could start to work on building against. I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.
>
> We had a chat about testing these changes, and integrating them into xenopsd. We agreed that we each had slightly different expectations of what we were going to do, and when. I think we came to the following major conclusions:
>
> - I will start work on a simple test framework for cancellation, hopefully to have first results in a fortnight or so.
> - Once the test framework is available you will fix whatever bugs it unearths, then we will rinse and repeat.
> - You will think some more about the possibility of adding cancellation to the xl command line tool, but since this is tricky there is no expectation of when it might happen.
>
> In the slightly longer term, we expect:
>
> - More testing and integration effort from Xapi project members in March or April.
> - Investigation of the idea of a xenopsd-based push gate, similar to the current libvirt push gate.
>
> Have I got the main points right, or forgotten anything important?

That seems about right, thanks.

Ian.
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Hi,

On Tue, Feb 10, 2015 at 08:09:47PM +, Ian Jackson wrote:
> I have rebased this onto current staging. I have compiled it but NOT EXECUTED IT AT ALL.
>
> Euan, I thought it would be useful to give you something you could start to work on building against. I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.

We had a chat about testing these changes, and integrating them into xenopsd. We agreed that we each had slightly different expectations of what we were going to do, and when. I think we came to the following major conclusions:

- I will start work on a simple test framework for cancellation, hopefully to have first results in a fortnight or so.
- Once the test framework is available you will fix whatever bugs it unearths, then we will rinse and repeat.
- You will think some more about the possibility of adding cancellation to the xl command line tool, but since this is tricky there is no expectation of when it might happen.

In the slightly longer term, we expect:

- More testing and integration effort from Xapi project members in March or April.
- Investigation of the idea of a xenopsd-based push gate, similar to the current libvirt push gate.

Have I got the main points right, or forgotten anything important?

Thanks,
Euan
[Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
This is v2 of my work-in-progress series to support cancellation of long-running libxl operations. There are many improvements since v1, but the basic structure remains the same and the external API remains unchanged.

I have rebased this onto current staging. I have compiled it but NOT EXECUTED IT AT ALL.

Euan, I thought it would be useful to give you something you could start to work on building against. I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.

Here's a list of the patches:

01/29 libxl: Further fix exit paths from libxl_device_events_handler
02/29 libxl: Comment cleanups
03/29 libxl: suspend: switch_logdirty_done takes rc
04/29 libxl: suspend: common suspend callbacks take rc
05/29 libxl: suspend: Return correct error from callbacks
06/29 libxl: Use libxl__xswait* in libxl__ao_device
07/29 libxl: xswait/devstate: Move xswait to before devstate
08/29 libxl: devstate: Use libxl__xswait*
09/29 libxl: New error codes CANCELLED etc.
10/29 libxl: events: Make timeout and async exec setup take an ao, not a gc
11/29 libxl: events: Make libxl__async_exec_* pass caller an rc
12/29 libxl: events: Permit timeouts to signal cancellation
13/29 libxl: domain create: Do not destroy on cancellation
14/29 libxl: ao: Record ultimate parent of a nested ao
15/29 libxl: ao: Count the nested progeny of an ao
16/29 libxl: ao: Provide manip_refcnt
17/29 libxl: cancellation: Provide public ao cancellation API
18/29 libxl: cancellation: Provide explicit internal cancel check API
19/29 libxl: cancellation: Make timeouts cancellable
20/29 libxl: cancellation: Note that driver domain task cannot be cancelled
21/29 libxl: cancellation: Make spawns cancellable
22/29 libxl: Introduce DOMAIN_DESTROYED error code
23/29 libxl: cancellation: Support cancellation where we spot domain death
24/29 libxl: Introduce FILLZERO
25/29 libxl: cancellation: Preparations for save/restore cancellation
26/29 libxl: cancellation: Handle SIGTERM in save/restore helper
27/29 libxl: cancellation: Cancel libxc save/restore
28/29 libxl: ao: datacopier callback gets an rc
29/29 libxl: cancellation: Make datacopiers cancellable
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Ian Jackson writes ([RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> This is v2 of my work-in-progress series to support cancellation of long-running libxl operations. There are many improvements since v1, but the basic structure remains the same and the external API remains unchanged.
>
> I have rebased this onto current staging. I have compiled it but NOT EXECUTED IT AT ALL.
>
> Euan, I thought it would be useful to give you something you could start to work on building against. I wouldn't recommend testing it yet until I've at least smoke tested it to see that things still work if you don't cancel them.
>
> Here's a list of the patches:

These are also here
  http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=summary
  git://xenbits.xen.org/people/iwj/xen.git
  base.ao-cancel.v2..wip.ao-cancel.v2..

Ian.