Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-09 Thread Ian Jackson
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous 
operations):
 Yes, that would work, but an open loop approach like that can lead to
 frustratingly unreliable tests.   I think it would be best to make
 the test aware of the state of the helper - or even in control of it.
 That would allow us to wait for the helper to reach a particular state
 before killing it.

This is less bad than you might think because the helper's progress
messages to libxl are at fairly predictable progress points.

In any case, the helper (in general) runs concurrently with libxl, so
when libxl decides to stop the progress there will often be a race.
(Sometimes the helper has to stop and wait for libxl to confirm.)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-09 Thread Euan Harris
On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote:
 On the contrary, I think many long-running operations, such as suspend
 and migrations, involve multiple iterations of the libxl event loop.
 Actual suspend/migrate is done in a helper process; the main process
 is responsible for progress report handling, coordination, etc.

Yes, that would work, but an open loop approach like that can lead to
frustratingly unreliable tests.   I think it would be best to make
the test aware of the state of the helper - or even in control of it.
That would allow us to wait for the helper to reach a particular state
before killing it.
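
A minimal sketch of the closed-loop idea above, under stated assumptions: the helper is a toy child process, and the one-byte state letters ('A', 'B') sent over a pipe are invented for illustration. The real save/restore helper talks to libxl through its own protocol, which is not shown here. The driver blocks until the awaited state is reported and only then sends SIGTERM.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test driver.  Returns 0 if the helper reported the awaited
 * state and was then terminated by SIGTERM, -1 otherwise. */
int kill_helper_at_state(char awaited_state)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Toy helper: announce each state, then wait to be killed. */
        static const char states[] = { 'A', 'B' };
        signal(SIGTERM, SIG_DFL);
        close(fds[0]);
        for (size_t i = 0; i < sizeof(states); i++)
            if (write(fds[1], &states[i], 1) != 1)
                _exit(1);
        close(fds[1]);          /* lets the driver see EOF */
        for (;;)
            pause();
    }

    assert(pid > 0);            /* only the driver reaches this point */
    close(fds[1]);

    /* Block until the helper reports the awaited state (or EOF). */
    int seen = 0;
    char c;
    while (!seen && read(fds[0], &c, 1) == 1)
        seen = (c == awaited_state);

    kill(pid, SIGTERM);
    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    /* Closed only after reaping, so the helper never gets SIGPIPE. */
    close(fds[0]);

    if (reaped != pid || !seen)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM) ? 0 : -1;
}
```

Calling kill_helper_at_state('B') from a test's main() waits for the second state before killing. An awaited state that the helper never reports yields -1 rather than a hang, because the helper closes its end of the pipe once it has nothing more to announce.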

Thanks,
Euan



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-07 Thread Euan Harris
Hi,

On Wed, Feb 18, 2015 at 04:10:35PM +0000, Euan Harris wrote:
 We had a chat about testing these changes, and integrating them into xenopsd.
 We agreed that we each had slightly different expectations of what we were 
 going to do, and when.   I think we came to the following major conclusions:
 
   - I will start work on a simple test framework for cancellation,
 hopefully to have first results in a fortnight or so.
   - Once the test framework is available you will fix whatever bugs it
 unearths, then we will rinse and repeat.
   - You will think some more about the possibility of adding cancellation
 to the xl command line tool, but since this is tricky there is no 
 expectation of when it might happen.

I think the most straightforward way to test the cancellation mechanism in
LibXL will be to adapt the way we test similar functionality in xenopsd:

   * define numbered 'cancellation points' at which cancellable operations
     can be cancelled
   * before testing a cancellable operation, pre-set the cancellation point
     at which cancellation should be attempted
   * when execution reaches the pre-set cancellation point, run the
     cancellation procedure
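
The three steps above can be sketched roughly as follows. This is an illustration only: struct cancel_test, cancellation_point() and the three-step toy operation are hypothetical names, not part of libxl or xenopsd, and the "cancellation procedure" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test-only state; none of these names exist in libxl. */
struct cancel_test {
    int current_point;   /* cancellation points reached so far */
    int trigger_point;   /* point at which to cancel; 0 means never */
    bool cancelled;      /* set once the cancellation procedure has run */
};

/* Stand-in for the real cancellation procedure. */
static void run_cancellation(struct cancel_test *t)
{
    t->cancelled = true;
}

/* Called at every numbered cancellation point.
 * Returns true if the operation should stop. */
static bool cancellation_point(struct cancel_test *t)
{
    assert(t->trigger_point >= 0);
    t->current_point++;
    if (t->trigger_point != 0 && t->current_point == t->trigger_point)
        run_cancellation(t);
    return t->cancelled;
}

/* A toy cancellable operation: three steps, each preceded by a
 * cancellation point.  Returns the number of steps completed. */
static int toy_operation(struct cancel_test *t)
{
    int steps_done = 0;
    for (int step = 0; step < 3; step++) {
        if (cancellation_point(t))
            break;
        steps_done++;
    }
    return steps_done;
}

/* Pre-set the trigger, run the operation, report the steps completed. */
int run_with_trigger(int trigger_point)
{
    struct cancel_test t = { .trigger_point = trigger_point };
    return toy_operation(&t);
}
```

A test driver could call run_with_trigger() in a loop over 1, 2, 3, ... until the operation completes all of its steps, which covers every cancellation point once.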

This approach alone will not allow us to test asynchronous cancellation in
the middle of long-running operations, such as writing a suspend image
to disk - that will require a way to synchronize the test program with
the long-running operation.

My first guess about how this might be done was:

   * add a 'current cancellation point' counter and a 'trigger point'
     variable to the context struct
   * increment the counter and fire the cancellation logic in
     libxl__ao_cancellable_register()

In this way we could write a loop which iterated through all possible
cancellation points.   However you pointed out that we cannot call
libxl_ao_cancel() while holding the context lock, so this idea needs
some refinement.   One possibility would be to tell another thread to try
to do the cancellation immediately after we release the lock;  another
option, if we didn't want to write a multi-thread test driver,
would be to do the cancellation at the top of libxl's event loop.
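
A rough sketch of the second option (issuing the cancellation at the top of the event loop), assuming a toy context rather than the real one. struct toy_ctx and every toy_* function are hypothetical stand-ins. The context lock is modelled as a plain flag so that the "never cancel while holding the lock" rule can be checked with an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the libxl context; field names are invented. */
struct toy_ctx {
    bool lock_held;       /* models the context lock */
    int current_point;    /* cancellation points registered so far */
    int trigger_point;    /* point at which the test wants to cancel */
    bool cancel_pending;  /* trigger reached, cancel not yet issued */
    int cancels_issued;   /* how many times the cancel call was made */
};

static void toy_lock(struct toy_ctx *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

static void toy_unlock(struct toy_ctx *c)
{
    assert(c->lock_held);
    c->lock_held = false;
}

/* Stand-in for the public cancel call: must not run under the lock. */
static void toy_cancel(struct toy_ctx *c)
{
    assert(!c->lock_held);
    c->cancels_issued++;
}

/* Stand-in for the registration hook, which runs with the lock held.
 * It only records that the trigger point was reached. */
static void toy_cancellable_register(struct toy_ctx *c)
{
    assert(c->lock_held);
    if (++c->current_point == c->trigger_point)
        c->cancel_pending = true;
}

/* Top of the toy event loop: the lock is not held here, so a pending
 * cancellation can be issued safely. */
static void toy_eventloop_top(struct toy_ctx *c)
{
    if (c->cancel_pending) {
        c->cancel_pending = false;
        toy_cancel(c);
    }
}

/* Run some loop iterations, each registering one cancellable item under
 * the lock.  Returns the number of cancels issued. */
int run_deferred_cancel(int trigger_point, int iterations)
{
    struct toy_ctx c = { .trigger_point = trigger_point };
    for (int i = 0; i < iterations; i++) {
        toy_eventloop_top(&c);
        toy_lock(&c);
        toy_cancellable_register(&c);
        toy_unlock(&c);
    }
    toy_eventloop_top(&c);  /* catch a trigger hit on the last iteration */
    return c.cancels_issued;
}
```

The registration hook only sets a flag; the cancel itself happens one step later, once the lock has been released. The other option mentioned above, a second thread, would replace toy_eventloop_top() with a wake-up of that thread.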

I think this captures roughly what we talked about.   Please let me know
if I misunderstood or missed out any details.

Thanks,
Euan



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-07 Thread Ian Jackson
Ian Campbell writes (Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling 
asynchronous operations):
 On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
  This is v2 of my work-in-progress series to support cancellation of
  long-running libxl operations.
 [...]
  I wouldn't recommend testing it yet until I've at least smoke tested
  it to see that things still work if you don't cancel them.
 
 Would review of the series be useful and/or appreciated at this stage?

Review of the APIs, and general approach, would be very much
appreciated.  That's probably best done by looking at the tip and
diffing libxl[_internal].h.

 Perhaps the first half dozen or so look like preparatory cleanups which
 I could sensibly look at?

That would also be useful.

  Here's a list of the patches:
  
    01/29  libxl: Further fix exit paths from libxl_device_events_handler
    02/29  libxl: Comment cleanups
    03/29  libxl: suspend: switch_logdirty_done takes rc
    04/29  libxl: suspend: common suspend callbacks take rc
    05/29  libxl: suspend: Return correct error from callbacks
    06/29  libxl: Use libxl__xswait* in libxl__ao_device
    07/29  libxl: xswait/devstate: Move xswait to before devstate
    08/29  libxl: devstate: Use libxl__xswait*

These first eight are cleanups and could in principle go in right
away.

Thanks,
Ian.



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-04-07 Thread Ian Jackson
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous 
operations):
 On Wed, Feb 18, 2015 at 04:10:35PM +0000, Euan Harris wrote:
 I think the most straightforward way to test the cancellation mechanism in
 LibXL will be to adapt the way we test similar functionality in xenopsd:
 
    * define numbered 'cancellation points' at which cancellable operations
      can be cancelled
    * before testing a cancellable operation, pre-set the cancellation point
      at which cancellation should be attempted
    * when execution reaches the pre-set cancellation point, run the
      cancellation procedure

This seems likely to work.

 This approach alone will not allow us to test asynchronous cancellation in
 the middle of long-running operations, such as writing a suspend image
 to disk - that will require a way to synchronize the test program with
 the long-running operation.

On the contrary, I think many long-running operations, such as suspend
and migrations, involve multiple iterations of the libxl event loop.
Actual suspend/migrate is done in a helper process; the main process
is responsible for progress report handling, coordination, etc.

 My first guess about how this might be done was:
 
    * add current cancellation point and a trigger point variables to the
      context struct
    * increment the counter and fire the cancellation logic in
      libxl__ao_cancellable_register()
 
 In this way we could write a loop which iterated through all possible
 cancellation points.   However you pointed out that we cannot call
 libxl_ao_cancel() while holding the context lock, so this idea needs
 some refinement.   One possibility would be to tell another thread to try
 to do the cancellation immediately after we release the lock;  another
 option, if we didn't want to write a multi-thread test driver,
 would be to do the cancellation at the top of libxl's event loop.

The relevant function for this latter approach is eventloop_iteration
in libxl_event.c.  This is used by libxl whenever the caller specifies
that a long-running operation is to be done synchronously (ao_how==0),
which is what xl does.

You might also consider whether to add a debug option for
afterpoll_internal to make it return after every callback (ie, after
the call to efd->func() and the call to time_occurs).  That would
allow you to inject cancellation in a slightly more fine-grained
manner.
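
To make the effect of such a debug option concrete, here is a toy dispatch loop, not the real afterpoll_internal. struct toy_loop, toy_afterpoll() and the return_after_each flag are hypothetical. The point is only that returning after every callback gives the caller one injection opportunity per callback instead of one per poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

typedef void toy_callback(void *user);

/* Hypothetical miniature of an event loop's "after poll" dispatch step. */
struct toy_loop {
    toy_callback *pending[MAX_EVENTS];  /* callbacks that are ready to run */
    void *user[MAX_EVENTS];
    size_t n_pending;
    size_t next;                        /* index of next callback to run */
    bool return_after_each;             /* the debug option under discussion */
};

/* Dispatch ready callbacks.  Normally runs them all; with the debug
 * option set, returns after a single callback.  Returns how many ran. */
size_t toy_afterpoll(struct toy_loop *l)
{
    size_t ran = 0;
    while (l->next < l->n_pending) {
        size_t i = l->next++;
        l->pending[i](l->user[i]);
        ran++;
        if (l->return_after_each)
            break;
    }
    return ran;
}

static void count_call(void *user)
{
    (*(int *)user)++;
}

/* Queue n_events callbacks and drain them.  Returns how many times
 * control came back to the caller after doing work, i.e. how many
 * places a test could inject a cancellation. */
int count_injection_points(size_t n_events, bool return_after_each)
{
    int calls = 0;
    struct toy_loop l = { .return_after_each = return_after_each };

    assert(n_events <= MAX_EVENTS);
    for (size_t i = 0; i < n_events; i++) {
        l.pending[i] = count_call;
        l.user[i] = &calls;
    }
    l.n_pending = n_events;

    int returns = 0;
    while (toy_afterpoll(&l) > 0)
        returns++;
    assert(calls == (int)n_events);     /* every callback ran exactly once */
    return returns;
}
```

With three ready callbacks the normal mode hands control back once, while the debug mode hands it back three times; all callbacks still run in both modes.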

 I think this captures roughly what we talked about.   Please let me know
 if I misunderstood or missed out any details.

I also mentioned that counting invocations of
libxl__ao_cancellable_register is less than ideal because it is very
coarse-grained.

Regards,
Ian.



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-03-20 Thread Euan Harris
On Tue, Mar 03, 2015 at 12:08:04PM +0000, Ian Campbell wrote:
  I wouldn't recommend testing it yet until I've at least smoke tested
  it to see that things still work if you don't cancel them.
 
 Would review of the series be useful and/or appreciated at this stage?
 
 Perhaps the first half dozen or so look like preparatory cleanups which
 I could sensibly look at?

Yes, that would be great.   I've read through the whole series fairly carefully,
and it looks sensible, but you will be better placed to see whether it fits well
with the rest of libxl.

Thanks,
Euan



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-03-03 Thread Ian Campbell
On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
 This is v2 of my work-in-progress series to support cancellation of
 long-running libxl operations.
[...]
 I wouldn't recommend testing it yet until I've at least smoke tested
 it to see that things still work if you don't cancel them.

Would review of the series be useful and/or appreciated at this stage?

Perhaps the first half dozen or so look like preparatory cleanups which
I could sensibly look at?
 
 Here's a list of the patches:
 
   01/29  libxl: Further fix exit paths from libxl_device_events_handler
   02/29  libxl: Comment cleanups
   03/29  libxl: suspend: switch_logdirty_done takes rc
   04/29  libxl: suspend: common suspend callbacks take rc
   05/29  libxl: suspend: Return correct error from callbacks
   06/29  libxl: Use libxl__xswait* in libxl__ao_device
   07/29  libxl: xswait/devstate: Move xswait to before devstate
   08/29  libxl: devstate: Use libxl__xswait*
   09/29  libxl: New error codes CANCELLED etc.
   10/29  libxl: events: Make timeout and async exec setup take an ao, not a gc
   11/29  libxl: events: Make libxl__async_exec_* pass caller an rc
   12/29  libxl: events: Permit timeouts to signal cancellation
   13/29  libxl: domain create: Do not destroy on cancellation
   14/29  libxl: ao: Record ultimate parent of a nested ao
   15/29  libxl: ao: Count the nested progeny of an ao
   16/29  libxl: ao: Provide manip_refcnt
   17/29  libxl: cancellation: Provide public ao cancellation API
   18/29  libxl: cancellation: Provide explicit internal cancel check API
   19/29  libxl: cancellation: Make timeouts cancellable
   20/29  libxl: cancellation: Note that driver domain task cannot be cancelled
   21/29  libxl: cancellation: Make spawns cancellable
   22/29  libxl: Introduce DOMAIN_DESTROYED error code
   23/29  libxl: cancellation: Support cancellation where we spot domain death
   24/29  libxl: Introduce FILLZERO
   25/29  libxl: cancellation: Preparations for save/restore cancellation
   26/29  libxl: cancellation: Handle SIGTERM in save/restore helper
   27/29  libxl: cancellation: Cancel libxc save/restore
   28/29  libxl: ao: datacopier callback gets an rc
   29/29  libxl: cancellation: Make datacopiers cancellable
 
 





Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-02-18 Thread Ian Jackson
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous 
operations):
 On Tue, Feb 10, 2015 at 08:09:47PM +0000, Ian Jackson wrote:
  I have rebased this onto current staging.  I have compiled it but
  NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
  you something you could start to work on building against.
  
  I wouldn't recommend testing it yet until I've at least smoke tested
  it to see that things still work if you don't cancel them.
 
 We had a chat about testing these changes, and integrating them into xenopsd.
 We agreed that we each had slightly different expectations of what we were 
 going to do, and when.   I think we came to the following major conclusions:
 
   - I will start work on a simple test framework for cancellation,
 hopefully to have first results in a fortnight or so.
   - Once the test framework is available you will fix whatever bugs it
 unearths, then we will rinse and repeat.
   - You will think some more about the possibility of adding cancellation
 to the xl command line tool, but since this is tricky there is no 
 expectation of when it might happen.
 
 In the slightly longer term, we expect:
 
   - More testing and integration effort from Xapi project members in March
 or April.
   - Investigation of the idea of a xenopsd-based push gate, similar to the 
 current libvirt push gate.
 
 Have I got the main points right, or forgotten anything important?   

That seems about right, thanks.

Ian.



Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-02-18 Thread Euan Harris
Hi,

On Tue, Feb 10, 2015 at 08:09:47PM +0000, Ian Jackson wrote:
 I have rebased this onto current staging.  I have compiled it but
 NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
 you something you could start to work on building against.
 
 I wouldn't recommend testing it yet until I've at least smoke tested
 it to see that things still work if you don't cancel them.

We had a chat about testing these changes, and integrating them into xenopsd.
We agreed that we each had slightly different expectations of what we were 
going to do, and when.   I think we came to the following major conclusions:

  - I will start work on a simple test framework for cancellation,
hopefully to have first results in a fortnight or so.
  - Once the test framework is available you will fix whatever bugs it
unearths, then we will rinse and repeat.
  - You will think some more about the possibility of adding cancellation
to the xl command line tool, but since this is tricky there is no 
expectation of when it might happen.

In the slightly longer term, we expect:

  - More testing and integration effort from Xapi project members in March
or April.
  - Investigation of the idea of a xenopsd-based push gate, similar to the 
current libvirt push gate.

Have I got the main points right, or forgotten anything important?   

Thanks,
Euan



[Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-02-10 Thread Ian Jackson
This is v2 of my work-in-progress series to support cancellation of
long-running libxl operations.

There are many improvements since v1, but the basic structure remains
the same and the external API remains unchanged.

I have rebased this onto current staging.  I have compiled it but
NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
you something you could start to work on building against.

I wouldn't recommend testing it yet until I've at least smoke tested
it to see that things still work if you don't cancel them.

Here's a list of the patches:

  01/29  libxl: Further fix exit paths from libxl_device_events_handler
  02/29  libxl: Comment cleanups
  03/29  libxl: suspend: switch_logdirty_done takes rc
  04/29  libxl: suspend: common suspend callbacks take rc
  05/29  libxl: suspend: Return correct error from callbacks
  06/29  libxl: Use libxl__xswait* in libxl__ao_device
  07/29  libxl: xswait/devstate: Move xswait to before devstate
  08/29  libxl: devstate: Use libxl__xswait*
  09/29  libxl: New error codes CANCELLED etc.
  10/29  libxl: events: Make timeout and async exec setup take an ao, not a gc
  11/29  libxl: events: Make libxl__async_exec_* pass caller an rc
  12/29  libxl: events: Permit timeouts to signal cancellation
  13/29  libxl: domain create: Do not destroy on cancellation
  14/29  libxl: ao: Record ultimate parent of a nested ao
  15/29  libxl: ao: Count the nested progeny of an ao
  16/29  libxl: ao: Provide manip_refcnt
  17/29  libxl: cancellation: Provide public ao cancellation API
  18/29  libxl: cancellation: Provide explicit internal cancel check API
  19/29  libxl: cancellation: Make timeouts cancellable
  20/29  libxl: cancellation: Note that driver domain task cannot be cancelled
  21/29  libxl: cancellation: Make spawns cancellable
  22/29  libxl: Introduce DOMAIN_DESTROYED error code
  23/29  libxl: cancellation: Support cancellation where we spot domain death
  24/29  libxl: Introduce FILLZERO
  25/29  libxl: cancellation: Preparations for save/restore cancellation
  26/29  libxl: cancellation: Handle SIGTERM in save/restore helper
  27/29  libxl: cancellation: Cancel libxc save/restore
  28/29  libxl: ao: datacopier callback gets an rc
  29/29  libxl: cancellation: Make datacopiers cancellable




Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations

2015-02-10 Thread Ian Jackson
Ian Jackson writes ([RFC PATCH v2 00/29] libxl: Cancelling asynchronous 
operations):
 This is v2 of my work-in-progress series to support cancellation of
 long-running libxl operations.
 
 There are many improvements since v1, but the basic structure remains
 the same and the external API remains unchanged.
 
 I have rebased this onto current staging.  I have compiled it but
 NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
 you something you could start to work on building against.
 
 I wouldn't recommend testing it yet until I've at least smoke tested
 it to see that things still work if you don't cancel them.
 
 Here's a list of the patches:

These are also here

 http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=summary
 git://xenbits.xen.org/people/iwj/xen.git

 base.ao-cancel.v2..wip.ao-cancel.v2..

Ian.
