Re: [vdsm] Managing async tasks
- Original Message -
From: Adam Litke a...@us.ibm.com
To: vdsm-devel@lists.fedorahosted.org
Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org
Sent: Monday, December 17, 2012 12:00:49 PM
Subject: Managing async tasks

On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation, I would like to continue that discussion here on the mailing list.

A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements:

- Standardized method for determining the result of an operation

This is a big one for me because it directly affects the consumability of the API. If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily.

Since there is no way to be sure whether some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task.

Sorry. That's my list :) Hopefully others will be willing to add other requirements for consideration.

From my understanding, task recovery (stop, abort, rollback, etc) will not be generally supported and should not be a requirement.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Managing async tasks
On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:

Since there is no way to be sure whether some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task.

I don't agree. Please feel free to convince me with some examples. If we cannot provide feedback to a user as to whether their request has been satisfied or not, then we have some bigger problems to solve.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center
Re: [vdsm] Managing async tasks
- Original Message -
From: Adam Litke a...@us.ibm.com
To: Saggi Mizrahi smizr...@redhat.com
Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org
Sent: Monday, December 17, 2012 2:16:25 PM
Subject: Re: Managing async tasks

I don't agree. Please feel free to convince me with some examples. If we cannot provide feedback to a user as to whether their request has been satisfied or not, then we have some bigger problems to solve.

Suppose VDSM sends a write command to a storage server and the connection hangs up before the ACK has returned. The operation has been committed, but VDSM has no way of knowing that; as far as VDSM is concerned, it got an ETIMEDOUT or EIO. This is the same problem that the engine has with VDSM: if VDSM creates an image, VM, network, or repo but the connection hangs up before the response can be sent back, then as far as the engine is concerned the operation timed out. This is an inherent issue with clustering.

This is why I want to move away from tasks being *the* trackable objects. Tasks should be short. As short as possible. Run VM should just persist the VM information on the VDSM host and return. The rest of the tracking should be done using the VM ID. Create image should return once VDSM persisted the information about the request on the repository and created the metadata files. Tracking should be done on the repo or the imageId.
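To make the lost-reply scenario concrete, here is a minimal Python sketch. It is not VDSM code: `FakeHost`, `create_image`, `get_image`, and `create_and_track` are hypothetical names, and letting the caller choose the image ID up front is an assumption made purely for illustration. It shows a create call whose reply is lost after the request was persisted, and a caller that recovers by asking about the object by its ID instead of tracking a task.

```python
import uuid


class FakeHost:
    """Hypothetical stand-in for a VDSM host; not the real VDSM API."""

    def __init__(self):
        self.images = {}
        self.drop_next_reply = False

    def create_image(self, image_id, size):
        # Persist the request first, then reply; the reply may be lost.
        self.images[image_id] = {"size": size, "state": "creating"}
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise TimeoutError("connection dropped before the reply arrived")
        return image_id

    def get_image(self, image_id):
        # Returns None when the host knows nothing about this ID.
        return self.images.get(image_id)


def create_and_track(host, size):
    # Assumption: the caller picks the ID, so it can query the object
    # even if the create call itself never returns.
    image_id = str(uuid.uuid4())
    try:
        host.create_image(image_id, size)
    except TimeoutError:
        pass  # outcome unknown; fall through and check by ID
    return image_id, host.get_image(image_id)
```

In this sketch the timeout is not treated as a failure; the caller simply does not know the outcome until it looks the object up, which is the point being argued above.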
Re: [vdsm] Managing async tasks
This is an addendum to my previous email.

- Original Message -
From: Saggi Mizrahi smizr...@redhat.com
To: Adam Litke a...@us.ibm.com
Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org
Sent: Monday, December 17, 2012 2:52:06 PM
Subject: Re: Managing async tasks

Create image should return once VDSM persisted the information about the request on the repository and created the metadata files. Tracking should be done on the repo or the imageId.

The thing is that I know how long a VM object (or an Image object) should live, so tracking it is straightforward. How long a task should live is very problematic and quite context specific. It depends on what the task is. I think it's quite confusing from an API standpoint to have every task have a different scope, ID requirement, and life-cycle.

VDSM has two types of APIs:

CRUD objects - VM, Image, Repository, Bridge, Storage Connections
General transient methods - getBiosInfo(), getDeviceList()

The latter are quite simple to manage. They don't need any special handling. If you lost a getBiosInfo() call you just send another one, no harm done. The same is even true with things that change the host like getDeviceList().

What we are really arguing about is fitting the CRUD objects to some generic task-oriented scheme. I'm saying it's a waste of time, as you can quite easily have flows to recover from each operation:

Create - Check if the object exists
Read - Read again
Update - Either update again, or read and update if the update didn't commit the first time
Delete - Check if the object doesn't exist

Each of the objects we CRUD has a different life-cycle and ownership semantics. Danken raised the point that creation has
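
The per-operation recovery flows listed earlier in this message (create, read, update, delete) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `Store` is a hypothetical in-memory stand-in for a host, `fail_replies` simulates replies lost after the operation was committed, and none of these names come from VDSM.

```python
class Store:
    """Hypothetical in-memory object store; not the VDSM API."""

    def __init__(self):
        self.objects = {}
        self.fail_replies = 0  # number of upcoming replies to "lose"

    def _reply(self, value=None):
        # The operation has already been applied when the reply is lost.
        if self.fail_replies > 0:
            self.fail_replies -= 1
            raise TimeoutError("reply lost")
        return value

    def create(self, oid, data):
        self.objects[oid] = dict(data)
        return self._reply(oid)

    def read(self, oid):
        return self._reply(self.objects.get(oid))

    def update(self, oid, data):
        self.objects[oid] = dict(data)
        return self._reply()

    def delete(self, oid):
        self.objects.pop(oid, None)
        return self._reply()


def safe_read(store, oid, attempts=3):
    # Read: just read again.
    for _ in range(attempts):
        try:
            return store.read(oid)
        except TimeoutError:
            continue
    raise TimeoutError("read still failing")


def safe_create(store, oid, data):
    try:
        store.create(oid, data)
    except TimeoutError:
        # Create: check whether the object exists.
        return safe_read(store, oid) is not None
    return True


def safe_update(store, oid, data):
    try:
        store.update(oid, data)
    except TimeoutError:
        # Update: read back, and update again only if it did not commit.
        if safe_read(store, oid) != data:
            store.update(oid, data)
    return safe_read(store, oid) == data


def safe_delete(store, oid):
    try:
        store.delete(oid)
    except TimeoutError:
        # Delete: check that the object no longer exists.
        return safe_read(store, oid) is None
    return True
```

Each helper resolves the unknown outcome by inspecting the object itself, so no generic task object is needed in this sketch.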
Re: [vdsm] Managing async tasks
On Mon, Dec 17, 2012 at 03:12:34PM -0500, Saggi Mizrahi wrote:

This is an addendum to my previous email.