Re: [vdsm] Managing async tasks

2012-12-17 Thread Saggi Mizrahi


- Original Message -
 From: Adam Litke a...@us.ibm.com
 To: vdsm-devel@lists.fedorahosted.org
 Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, 
 Saggi Mizrahi smizr...@redhat.com,
 Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org
 Sent: Monday, December 17, 2012 12:00:49 PM
 Subject: Managing async tasks
 
 On today's vdsm call we had a lively discussion around how
 asynchronous
 operations should be handled in the future.  In an effort to include
 more people
 in the discussion and to better capture the resulting conversation I
 would like
 to continue that discussion here on the mailing list.
 
 A lot of ideas were thrown around about how 'tasks' should be handled
 in the
 future.  There are a lot of ways that it can be done.  To determine
 how we
 should implement it, it's probably best if we start with a set of
 requirements.
 If we can first agree on these, it should be easy to find a solution
 that meets
 them.  I'll take a stab at identifying a first set of POSSIBLE
 requirements:
 
 - Standardized method for determining the result of an operation
 
   This is a big one for me because it directly affects the
   consumability of the
   API.  If each verb has different semantics for discovering whether
   it has
   completed successfully, then the API will be nearly impossible to
   use easily.
Since there is no way to be sure whether some tasks completed successfully or 
failed, especially around the murky waters of storage, I say this requirement 
should be removed.
At least not in the context of a task.
 
 
 Sorry.  That's my list :)  Hopefully others will be willing to add
 other
 requirements for consideration.
 
 From my understanding, task recovery (stop, abort, rollback, etc)
 will not be
 generally supported and should not be a requirement.
 
 
 
 --
 Adam Litke a...@us.ibm.com
 IBM Linux Technology Center
 
 
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] Managing async tasks

2012-12-17 Thread Adam Litke
On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:
 
 
 - Original Message -
  From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org
  Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com,
  Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli
  fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday, December 17,
  2012 12:00:49 PM Subject: Managing async tasks
  
  On today's vdsm call we had a lively discussion around how asynchronous
  operations should be handled in the future.  In an effort to include more
  people in the discussion and to better capture the resulting conversation I
  would like to continue that discussion here on the mailing list.
  
  A lot of ideas were thrown around about how 'tasks' should be handled in the
  future.  There are a lot of ways that it can be done.  To determine how we
  should implement it, it's probably best if we start with a set of
  requirements.  If we can first agree on these, it should be easy to find a
  solution that meets them.  I'll take a stab at identifying a first set of
  POSSIBLE requirements:
  
  - Standardized method for determining the result of an operation
  
This is a big one for me because it directly affects the consumability of
the API.  If each verb has different semantics for discovering whether it
has completed successfully, then the API will be nearly impossible to use
easily.
 Since there is no way to assure if of some tasks completed successfully or
 failed, especially around the murky waters of storage, I say this requirement
 should be removed.  At least not in the context of a task.

I don't agree.  Please feel free to convince me with some examples.  If we
cannot provide feedback to a user as to whether their request has been satisfied
or not, then we have some bigger problems to solve.

  
  
  Sorry.  That's my list :)  Hopefully others will be willing to add other
  requirements for consideration.
  
  From my understanding, task recovery (stop, abort, rollback, etc) will not
  be generally supported and should not be a requirement.
  

-- 
Adam Litke a...@us.ibm.com
IBM Linux Technology Center



Re: [vdsm] Managing async tasks

2012-12-17 Thread Saggi Mizrahi


- Original Message -
 From: Adam Litke a...@us.ibm.com
 To: Saggi Mizrahi smizr...@redhat.com
 Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, 
 Federico Simoncelli
 fsimo...@redhat.com, engine-de...@ovirt.org, 
 vdsm-devel@lists.fedorahosted.org
 Sent: Monday, December 17, 2012 2:16:25 PM
 Subject: Re: Managing async tasks
 
 On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:
  
  
  - Original Message -
   From: Adam Litke a...@us.ibm.com To:
   vdsm-devel@lists.fedorahosted.org
   Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
   aba...@redhat.com,
   Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli
   fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday,
   December 17,
   2012 12:00:49 PM Subject: Managing async tasks
   
   On today's vdsm call we had a lively discussion around how
   asynchronous
   operations should be handled in the future.  In an effort to
   include more
   people in the discussion and to better capture the resulting
   conversation I
   would like to continue that discussion here on the mailing list.
   
   A lot of ideas were thrown around about how 'tasks' should be
   handled in the
   future.  There are a lot of ways that it can be done.  To
   determine how we
   should implement it, it's probably best if we start with a set of
   requirements.  If we can first agree on these, it should be easy
   to find a
   solution that meets them.  I'll take a stab at identifying a
   first set of
   POSSIBLE requirements:
   
   - Standardized method for determining the result of an operation
   
 This is a big one for me because it directly affects the
 consumability of
 the API.  If each verb has different semantics for discovering
 whether it
 has completed successfully, then the API will be nearly
 impossible to use
 easily.
  Since there is no way to assure if of some tasks completed
  successfully or
  failed, especially around the murky waters of storage, I say this
  requirement
  should be removed.  At least not in the context of a task.
 
 I don't agree.  Please feel free to convince me with some exampled.
  If we
 cannot provide feedback to a user as to whether their request has
 been satisfied
 or not, then we have some bigger problems to solve.
Say VDSM sends a write command to a storage server and the connection hangs up 
before the ACK has returned.
The operation has been committed, but VDSM has no way of knowing that it 
happened. As far as VDSM is concerned it got an ETIMEDOUT or EIO.
This is the same problem that the engine has with VDSM.
If VDSM creates an image/VM/network/repo but the connection hangs up before the 
response can be sent back, then as far as the engine is concerned the operation 
timed out.
This is an inherent issue with clustering.
This is why I want to move away from tasks being *the* trackable objects.
Tasks should be short. As short as possible.
Run VM should just persist the VM information on the VDSM host and return. The 
rest of the tracking should be done using the VM ID.
Create image should return once VDSM has persisted the information about the 
request on the repository and created the metadata files.
Tracking should be done on the repo or the imageId.
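
A minimal sketch of the lost-reply case, in Python. Everything here is
hypothetical and for illustration only: FakeHost, ResponseLost and
create_and_track are not VDSM names, and having the caller choose the image ID
up front is an assumption made so that there is something to look up afterwards.

```python
import uuid


class ResponseLost(Exception):
    """The reply never reached the caller (think ETIMEDOUT or EIO)."""


class FakeHost:
    """Hypothetical stand-in for a VDSM host; not the real VDSM API."""

    def __init__(self, drop_next_reply=False):
        self.images = {}
        self.drop_next_reply = drop_next_reply

    def create_image(self, image_id, size):
        # Persist the request first, then reply.
        self.images[image_id] = {"size": size, "state": "creating"}
        if self.drop_next_reply:
            # The request was committed, but the caller never hears about it.
            self.drop_next_reply = False
            raise ResponseLost("connection dropped before the reply arrived")
        return image_id

    def get_image(self, image_id):
        return self.images.get(image_id)


def create_and_track(host, size):
    # Assumption: the caller picks the ID, so it can ask about the object later.
    image_id = str(uuid.uuid4())
    try:
        host.create_image(image_id, size)
    except ResponseLost:
        # The outcome of the call is unknown here, so query the object instead.
        pass
    return image_id, host.get_image(image_id)
```

The caller cannot tell from the exception alone whether the image was created.
Looking the image up by its ID settles the question without any task object.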
 
   
   
   Sorry.  That's my list :)  Hopefully others will be willing to
   add other
   requirements for consideration.
   
   From my understanding, task recovery (stop, abort, rollback, etc)
   will not
   be generally supported and should not be a requirement.
   
 
 --
 Adam Litke a...@us.ibm.com
 IBM Linux Technology Center
 
 


Re: [vdsm] Managing async tasks

2012-12-17 Thread Saggi Mizrahi
This is an addendum to my previous email.

- Original Message -
 From: Saggi Mizrahi smizr...@redhat.com
 To: Adam Litke a...@us.ibm.com
 Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, 
 Federico Simoncelli
 fsimo...@redhat.com, engine-de...@ovirt.org, 
 vdsm-devel@lists.fedorahosted.org
 Sent: Monday, December 17, 2012 2:52:06 PM
 Subject: Re: Managing async tasks
 
 
 
 - Original Message -
  From: Adam Litke a...@us.ibm.com
  To: Saggi Mizrahi smizr...@redhat.com
  Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
  aba...@redhat.com, Federico Simoncelli
  fsimo...@redhat.com, engine-de...@ovirt.org,
  vdsm-devel@lists.fedorahosted.org
  Sent: Monday, December 17, 2012 2:16:25 PM
  Subject: Re: Managing async tasks
  
  On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:
   
   
   - Original Message -
From: Adam Litke a...@us.ibm.com To:
vdsm-devel@lists.fedorahosted.org
Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
aba...@redhat.com,
Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli
fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday,
December 17,
2012 12:00:49 PM Subject: Managing async tasks

On today's vdsm call we had a lively discussion around how
asynchronous
operations should be handled in the future.  In an effort to
include more
people in the discussion and to better capture the resulting
conversation I
would like to continue that discussion here on the mailing
list.

A lot of ideas were thrown around about how 'tasks' should be
handled in the
future.  There are a lot of ways that it can be done.  To
determine how we
should implement it, it's probably best if we start with a set
of
requirements.  If we can first agree on these, it should be
easy
to find a
solution that meets them.  I'll take a stab at identifying a
first set of
POSSIBLE requirements:

- Standardized method for determining the result of an
operation

  This is a big one for me because it directly affects the
  consumability of
  the API.  If each verb has different semantics for
  discovering
  whether it
  has completed successfully, then the API will be nearly
  impossible to use
  easily.
   Since there is no way to assure if of some tasks completed
   successfully or
   failed, especially around the murky waters of storage, I say this
   requirement
   should be removed.  At least not in the context of a task.
  
  I don't agree.  Please feel free to convince me with some exampled.
   If we
  cannot provide feedback to a user as to whether their request has
  been satisfied
  or not, then we have some bigger problems to solve.
 If VDSM sends a write command to a storage server, and the connection
 hangs up before the ACK has returned.
 The operation has been committed but VDSM has no way of knowing if
 that happened as far as VDSM is concerned it got an ETIMEO or EIO.
 This is the same problem that the engine has with VDSM.
 If VDSM creates an image\VM\network\repo but the connection hangs up
 before the response can be sent back as far as the engine is
 concerned the operation times out.
 This is an inherent issue with clustering.
 This is why I want to move away from tasks being *the* trackable
 objects.
 Tasks should be short. As short as possible.
 Run VM should just persist the VM information on the VDSM host and
 return. The rest of the tracking should be done using the VM ID.
 Create image should return once VDSM persisted the information about
 the request on the repository and created the metadata files.
 Tracking should be done on the repo or the imageId.

The thing is that I know how long a VM object should live (or an Image object).
So tracking it is straightforward. How long a task should live is very 
problematic and quite context-specific.
It depends on what the task is.
I think it's quite confusing from an API standpoint to have every task have a 
different scope, ID requirement and life-cycle.

VDSM has two types of APIs:

CRUD objects - VM, Image, Repository, Bridge, Storage Connections
General transient methods - getBiosInfo(), getDeviceList()

The latter are quite simple to manage. They don't need any special handling. If 
you lose a getBiosInfo() call you just send another one, no harm done.
The same is even true of things that change the host, like getDeviceList().
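
As a rough illustration of "just send another one", here is a small Python
retry helper. It is a sketch under assumptions: call_with_retry and
make_flaky_get_bios_info are made-up names, and the flaky function only
imitates a getBiosInfo() call whose reply is sometimes lost.

```python
def call_with_retry(func, attempts=3, retry_on=(TimeoutError,)):
    """Call func() again if the reply was lost; give up after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except retry_on as err:
            # A transient call has no state worth tracking, so simply resend it.
            last_error = err
    raise last_error


def make_flaky_get_bios_info(failures):
    """Build a hypothetical getBiosInfo() that loses its first `failures` replies."""
    state = {"left": failures}

    def get_bios_info():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("reply lost")
        return {"vendor": "example"}

    return get_bios_info
```

This only works because repeating the call is harmless, which is the property
the transient methods are assumed to have.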

What we are really arguing about is fitting the CRUD objects to some generic 
task-oriented scheme.
I'm saying it's a waste of time, as you can quite easily have flows to recover 
from each operation.

Create - Check if the object exists
Read - Read again
Update - Either update again, or read and update only if the update didn't 
commit the first time
Delete - Check that the object no longer exists
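
The four recovery flows above could be sketched roughly as follows in Python.
FakeStore and the helper functions are hypothetical names used only for
illustration. They stand in for any object store keyed by ID and are not part
of VDSM.

```python
class FakeStore:
    """Hypothetical object store keyed by ID; stands in for a repo or a host."""

    def __init__(self):
        self._objects = {}

    def create(self, object_id, value):
        self._objects[object_id] = value

    def exists(self, object_id):
        return object_id in self._objects

    def read(self, object_id):
        return self._objects[object_id]

    def update(self, object_id, value):
        self._objects[object_id] = value

    def delete(self, object_id):
        self._objects.pop(object_id, None)


def create_committed(store, object_id):
    # Create: check if the object exists.
    return store.exists(object_id)


def reread(store, object_id):
    # Read: read again.
    return store.read(object_id)


def ensure_updated(store, object_id, value):
    # Update: read, and update again only if the first update did not commit.
    if store.read(object_id) != value:
        store.update(object_id, value)
    return store.read(object_id)


def delete_committed(store, object_id):
    # Delete: check that the object no longer exists.
    return not store.exists(object_id)
```

Each helper needs only the object ID, so none of them depends on a task
surviving the lost reply.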

Each of the objects we CRUD has a different life-cycle and ownership semantics.

Danken raised the point that creation has 

Re: [vdsm] Managing async tasks

2012-12-17 Thread Ayal Baron


- Original Message -
 This is an addendum to my previous email.
 
 - Original Message -
  From: Saggi Mizrahi smizr...@redhat.com
  To: Adam Litke a...@us.ibm.com
  Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
  aba...@redhat.com, Federico Simoncelli
  fsimo...@redhat.com, engine-de...@ovirt.org,
  vdsm-devel@lists.fedorahosted.org
  Sent: Monday, December 17, 2012 2:52:06 PM
  Subject: Re: Managing async tasks
  
  
  
  - Original Message -
   From: Adam Litke a...@us.ibm.com
   To: Saggi Mizrahi smizr...@redhat.com
   Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
   aba...@redhat.com, Federico Simoncelli
   fsimo...@redhat.com, engine-de...@ovirt.org,
   vdsm-devel@lists.fedorahosted.org
   Sent: Monday, December 17, 2012 2:16:25 PM
   Subject: Re: Managing async tasks
   
   On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:


- Original Message -
 From: Adam Litke a...@us.ibm.com To:
 vdsm-devel@lists.fedorahosted.org
 Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
 aba...@redhat.com,
 Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli
 fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday,
 December 17,
 2012 12:00:49 PM Subject: Managing async tasks
 
 On today's vdsm call we had a lively discussion around how
 asynchronous
 operations should be handled in the future.  In an effort to
 include more
 people in the discussion and to better capture the resulting
 conversation I
 would like to continue that discussion here on the mailing
 list.
 
 A lot of ideas were thrown around about how 'tasks' should be
 handled in the
 future.  There are a lot of ways that it can be done.  To
 determine how we
 should implement it, it's probably best if we start with a
 set
 of
 requirements.  If we can first agree on these, it should be
 easy
 to find a
 solution that meets them.  I'll take a stab at identifying a
 first set of
 POSSIBLE requirements:
 
 - Standardized method for determining the result of an
 operation
 
   This is a big one for me because it directly affects the
   consumability of
   the API.  If each verb has different semantics for
   discovering
   whether it
   has completed successfully, then the API will be nearly
   impossible to use
   easily.
Since there is no way to assure if of some tasks completed
successfully or
failed, especially around the murky waters of storage, I say
this
requirement
should be removed.  At least not in the context of a task.
   
   I don't agree.  Please feel free to convince me with some
   exampled.
If we
   cannot provide feedback to a user as to whether their request has
   been satisfied
   or not, then we have some bigger problems to solve.
  If VDSM sends a write command to a storage server, and the
  connection
  hangs up before the ACK has returned.
  The operation has been committed but VDSM has no way of knowing if
  that happened as far as VDSM is concerned it got an ETIMEO or EIO.
  This is the same problem that the engine has with VDSM.
  If VDSM creates an image\VM\network\repo but the connection hangs
  up
  before the response can be sent back as far as the engine is
  concerned the operation times out.
  This is an inherent issue with clustering.
  This is why I want to move away from tasks being *the* trackable
  objects.
  Tasks should be short. As short as possible.
  Run VM should just persist the VM information on the VDSM host and
  return. The rest of the tracking should be done using the VM ID.
  Create image should return once VDSM persisted the information
  about
  the request on the repository and created the metadata files.
  Tracking should be done on the repo or the imageId.
 
 The thing is that I know how long a VM object should live (or an
 Image object).
 So tracking it is straight forward. How long a task should live is
 very problematic and quite context specific.
 It depends on what the task is.
 I think it's quite confusing from an API standpoint to have every
 task have a different scope, id requirement and life-cycle.
 
 In VDSM has two types of APIs
 
 CRUD objects - VM, Image, Repository, Bridge, Storage Connections
 General transient methods - getBiosInfo(), getDeviceList()
 
 The latter are quite simple to manage. They don't need any special
 handling. If you lost a getBiosInfo() call you just send another
 one, no harm done.
 The same is even true with things that change the host like
 getDeviceList()
 
 What we are really arguing about is fitting the CRUD objects to some
 generic task oriented scheme.
 I'm saying it's a waste of time as you can quite easily have flows to
 recover from each operation.
 
 Create - Check if the object exists
 Read - Read again
 Update - either update again or read and update if update didn't
 commit the first 

Re: [vdsm] Managing async tasks

2012-12-17 Thread Adam Litke
On Mon, Dec 17, 2012 at 03:12:34PM -0500, Saggi Mizrahi wrote:
 This is an addendum to my previous email.
 
 - Original Message -
  From: Saggi Mizrahi smizr...@redhat.com
  To: Adam Litke a...@us.ibm.com
  Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, 
  Federico Simoncelli
  fsimo...@redhat.com, engine-de...@ovirt.org, 
  vdsm-devel@lists.fedorahosted.org
  Sent: Monday, December 17, 2012 2:52:06 PM
  Subject: Re: Managing async tasks
  
  
  
  - Original Message -
   From: Adam Litke a...@us.ibm.com
   To: Saggi Mizrahi smizr...@redhat.com
   Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
   aba...@redhat.com, Federico Simoncelli
   fsimo...@redhat.com, engine-de...@ovirt.org,
   vdsm-devel@lists.fedorahosted.org
   Sent: Monday, December 17, 2012 2:16:25 PM
   Subject: Re: Managing async tasks
   
   On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:


- Original Message -
 From: Adam Litke a...@us.ibm.com To:
 vdsm-devel@lists.fedorahosted.org
 Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron
 aba...@redhat.com,
 Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli
 fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday,
 December 17,
 2012 12:00:49 PM Subject: Managing async tasks
 
 On today's vdsm call we had a lively discussion around how
 asynchronous
 operations should be handled in the future.  In an effort to
 include more
 people in the discussion and to better capture the resulting
 conversation I
 would like to continue that discussion here on the mailing
 list.
 
 A lot of ideas were thrown around about how 'tasks' should be
 handled in the
 future.  There are a lot of ways that it can be done.  To
 determine how we
 should implement it, it's probably best if we start with a set
 of
 requirements.  If we can first agree on these, it should be
 easy
 to find a
 solution that meets them.  I'll take a stab at identifying a
 first set of
 POSSIBLE requirements:
 
 - Standardized method for determining the result of an
 operation
 
   This is a big one for me because it directly affects the
   consumability of
   the API.  If each verb has different semantics for
   discovering
   whether it
   has completed successfully, then the API will be nearly
   impossible to use
   easily.
Since there is no way to assure if of some tasks completed
successfully or
failed, especially around the murky waters of storage, I say this
requirement
should be removed.  At least not in the context of a task.
   
   I don't agree.  Please feel free to convince me with some exampled.
If we
   cannot provide feedback to a user as to whether their request has
   been satisfied
   or not, then we have some bigger problems to solve.
  If VDSM sends a write command to a storage server, and the connection
  hangs up before the ACK has returned.
  The operation has been committed but VDSM has no way of knowing if
  that happened as far as VDSM is concerned it got an ETIMEO or EIO.
  This is the same problem that the engine has with VDSM.
  If VDSM creates an image\VM\network\repo but the connection hangs up
  before the response can be sent back as far as the engine is
  concerned the operation times out.
  This is an inherent issue with clustering.
  This is why I want to move away from tasks being *the* trackable
  objects.
  Tasks should be short. As short as possible.
  Run VM should just persist the VM information on the VDSM host and
  return. The rest of the tracking should be done using the VM ID.
  Create image should return once VDSM persisted the information about
  the request on the repository and created the metadata files.
  Tracking should be done on the repo or the imageId.
 
 The thing is that I know how long a VM object should live (or an Image 
 object).
 So tracking it is straight forward. How long a task should live is very 
 problematic and quite context specific.
 It depends on what the task is.
 I think it's quite confusing from an API standpoint to have every task have a 
 different scope, id requirement and life-cycle.
 
 In VDSM has two types of APIs
 
 CRUD objects - VM, Image, Repository, Bridge, Storage Connections
 General transient methods - getBiosInfo(), getDeviceList()
 
 The latter are quite simple to manage. They don't need any special handling. 
 If you lost a getBiosInfo() call you just send another one, no harm done.
 The same is even true with things that change the host like getDeviceList()
 
 What we are really arguing about is fitting the CRUD objects to some generic 
 task oriented scheme.
 I'm saying it's a waste of time as you can quite easily have flows to recover 
 from each operation.
 
 Create - Check if the object exists
 Read - Read again
 Update - either update again or read and update if update