Re: [vdsm] RFC: New Storage API
On 12/04/2012 11:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object making a distinction between stop because the host is overloaded to I don't want that image This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what is the status of the image. The status of the image can be either optimized, degraded, or broken. Optimized means that the image is available and you can run VMs of it. Degraded means that the image is available and will run VMs but it might be a better way VDSM can represent the underlying data. Broken means that the image can't be used at the moment, probably because not all the data has been set up on the volume. Apart from that VDSM will also return the last persisted status information which will conatin hostID - the last host to try and optimize of fix the image stage - X/Y (eg. 1/10) the last persisted stage
Re: [vdsm] RFC: New Storage API
On 12/07/2012 09:53 PM, Saggi Mizrahi wrote: ... connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? Asi I said, connect fails if the repoId is in use ATM. so how do add connections to the repo without first disconnecting it (extend storage domain flow)? ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
- Original Message - From: Itamar Heim ih...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Monday, January 14, 2013 6:18:13 AM Subject: Re: [vdsm] RFC: New Storage API On 12/04/2012 11:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object making a distinction between stop because the host is overloaded to I don't want that image This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what is the status of the image. The status of the image can be either optimized, degraded, or broken. Optimized means
Re: [vdsm] RFC: New Storage API
- Original Message - - Original Message - From: Itamar Heim ih...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Monday, January 14, 2013 6:18:13 AM Subject: Re: [vdsm] RFC: New Storage API On 12/04/2012 11:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object making a distinction between stop because the host is overloaded to I don't want that image This means that after calling any operation
Re: [vdsm] RFC: New Storage API
- Original Message - - Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Deepak C Shetty deepa...@linux.vnet.ibm.com Sent: Sunday, December 16, 2012 11:40:01 PM Subject: Re: [vdsm] RFC: New Storage API On 12/08/2012 01:23 AM, Saggi Mizrahi wrote: - Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Deepak C Shetty deepa...@linux.vnet.ibm.com Sent: Friday, December 7, 2012 12:23:15 AM Subject: Re: [vdsm] RFC: New Storage API On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? Asi I said, connect fails if the repoId is in use ATM. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior IIUC, i can use options to do storage offloads ? For eg. I can create a LUN that represents this VD on my storage array based on the 'options' parameter ? Is this the intended way to use 'options' ? No, this has nothing to do with offloads. If by offloads you mean having other VDSM hosts to the heavy lifting then this is what the option autoFix=False and the fix mechanism is for. If you are talking about advanced scsi features (ie. write same) they will be used automatically whenever possible. In any case, how we manage LUNs (if they are even used) is an implementation detail. I am a bit more interested in how storage array offloads ( by that I mean, offload VD creation, snapshot, clone etc to the storage array when available/possible) can be done from VDSM ? In the past there were talks of using libSM to do that. How does that strategy play
Re: [vdsm] RFC: New Storage API
On 12/08/2012 01:23 AM, Saggi Mizrahi wrote: - Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Deepak C Shetty deepa...@linux.vnet.ibm.com Sent: Friday, December 7, 2012 12:23:15 AM Subject: Re: [vdsm] RFC: New Storage API On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? Asi I said, connect fails if the repoId is in use ATM. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior IIUC, i can use options to do storage offloads ? For eg. I can create a LUN that represents this VD on my storage array based on the 'options' parameter ? Is this the intended way to use 'options' ? No, this has nothing to do with offloads. If by offloads you mean having other VDSM hosts to the heavy lifting then this is what the option autoFix=False and the fix mechanism is for. If you are talking about advanced scsi features (ie. write same) they will be used automatically whenever possible. In any case, how we manage LUNs (if they are even used) is an implementation detail. I am a bit more interested in how storage array offloads ( by that I mean, offload VD creation, snapshot, clone etc to the storage array when available/possible) can be done from VDSM ? In the past there were talks of using libSM to do that. How does that strategy play in this new Storage API scenario ? I agree its implmn detail, but how where does that implm sit and how it would be triggered is not very clear to me. Looking at createVD args, it sounded like 'options' seems to be a trigger point for deciding whether to use storage offloads or not, but you spoke otherwise :) Can you provide your vision on how VDSM can understand the storage array capabilities exploit storgae array offloads in this New Storage API context ? -- Thanks deepak ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
On Thu, Dec 06, 2012 at 11:52:01AM -0500, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. Statements like this make me start to worry about your userData concept. It's a sign of a bad API if the user needs to invent a custom metadata scheme for itself. This reminds me of the abomination that is the 'custom' property in the vm definition today. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, storage
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Deepak C Shetty deepa...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 1:49:31 PM Subject: Re: [vdsm] RFC: New Storage API On Fri, Dec 07, 2012 at 02:53:41PM -0500, Saggi Mizrahi wrote: snip 1) Can you provide more info on why there is a exception for 'lvm based block domain'. Its not coming out clearly. File based domains are responsible for syncing up object manipulation (creation\deletion) The backend is responsible for making sure it all works either by having a single writer (NFS) or having it's own locking mechanism (gluster). In our LVM based domains VDSM is responsible for basic object manipulation. The current design uses an approach where there is a single host responsible for object creation\deleteion it is the SRM\SDM\SPM\S?M. If we ever find a way to make it fully clustered without a big hit in performance the S?M requirement will be removed form that type of domain. I would like to see us maintain a LOCALFS domain as well. For this, we would also need SRM, correct? No, why? -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
On Mon, Dec 10, 2012 at 02:03:09PM -0500, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Deepak C Shetty deepa...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 1:49:31 PM Subject: Re: [vdsm] RFC: New Storage API On Fri, Dec 07, 2012 at 02:53:41PM -0500, Saggi Mizrahi wrote: snip 1) Can you provide more info on why there is a exception for 'lvm based block domain'. Its not coming out clearly. File based domains are responsible for syncing up object manipulation (creation\deletion) The backend is responsible for making sure it all works either by having a single writer (NFS) or having it's own locking mechanism (gluster). In our LVM based domains VDSM is responsible for basic object manipulation. The current design uses an approach where there is a single host responsible for object creation\deleteion it is the SRM\SDM\SPM\S?M. If we ever find a way to make it fully clustered without a big hit in performance the S?M requirement will be removed form that type of domain. I would like to see us maintain a LOCALFS domain as well. For this, we would also need SRM, correct? No, why? Sorry, nevermind. I was thinking of a scenario with multiple clients talking to a single vdsm and making sure they don't stomp on one another. This is probably not something we are going to care about though. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
On Mon, Dec 10, 2012 at 03:36:23PM -0500, Saggi Mizrahi wrote: Statements like this make me start to worry about your userData concept. It's a sign of a bad API if the user needs to invent a custom metadata scheme for itself. This reminds me of the abomination that is the 'custom' property in the vm definition today. In one sentence: If VDSM doesn't care about it, VDSM doesn't manage it. userData being a void* is quite common and I don't understand why you would thing it's a sign of a bad API. Further more, giving the user choice about how to represent it's own metadata and what fields it want to keep seems reasonable to me. Especially given the fact that VDSM never reads it. The reason we are pulling away from the current system of VDSM understanding the extra data is that it makes that data tied to VDSMs on disk format. VDSM on disk format has to be very stable because of clusters with multiple VDSM versions. Further more, since this is actually manager data it has to be tied to the manager backward compatibility lifetime as well. Having it be opaque to VDSM ties it to only one, simpler, support lifetime instead of two. I guess you are implying that it will make it problematic for multiple users to read userData left by another user because the formats might not be compatible. The solution is that all parties interested in using VDSM storage agree on format, and common fields, and supportability, and all the other things that choosing a supporting *something* entails. This is, however, out of the scope of VDSM. When the time comes I think how the userData blob is actually parsed and what fields it keeps should be discussed on ovirt-devel or engine-devel. The crux of the issue is that VDSM manages only what it cares about and the user can't modify directly. This is done because everything we expose we commit to. If you want any information persisted like: - Human readable name (in whatever encoding) - Is this a template or a snapshot - What user owns this image You can just put it in the userData. VDSM is not going to impose what encoding you use. It's not going to decide if you represent your users as IDs or names or ldap queries or Public Keys. It's not going to decide if you have explicit templates or not. It's not going to decide if you care what is the logical image chain. It's not going to decide anything that is out of it's scope. No format is future proof, no selection of fields will be good for any situation. I'd much rather it be someone else's problem when any of them need to be changed. They have currently been VDSMs problem and it has been hell to maintain. In general, I actually agree with most of this. What I want to avoid is pushing things that should actually be a part of the API into this userData blob. We do want to keep the API as simple as possible to give vdsm flexibility. If, over time, we find that users are always using userData to work around something missing in the API, this could be a really good sign that the API needs extension. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 4:47:46 PM Subject: Re: [vdsm] RFC: New Storage API On Mon, Dec 10, 2012 at 03:36:23PM -0500, Saggi Mizrahi wrote: Statements like this make me start to worry about your userData concept. It's a sign of a bad API if the user needs to invent a custom metadata scheme for itself. This reminds me of the abomination that is the 'custom' property in the vm definition today. In one sentence: If VDSM doesn't care about it, VDSM doesn't manage it. userData being a void* is quite common and I don't understand why you would thing it's a sign of a bad API. Further more, giving the user choice about how to represent it's own metadata and what fields it want to keep seems reasonable to me. Especially given the fact that VDSM never reads it. The reason we are pulling away from the current system of VDSM understanding the extra data is that it makes that data tied to VDSMs on disk format. VDSM on disk format has to be very stable because of clusters with multiple VDSM versions. Further more, since this is actually manager data it has to be tied to the manager backward compatibility lifetime as well. Having it be opaque to VDSM ties it to only one, simpler, support lifetime instead of two. I guess you are implying that it will make it problematic for multiple users to read userData left by another user because the formats might not be compatible. The solution is that all parties interested in using VDSM storage agree on format, and common fields, and supportability, and all the other things that choosing a supporting *something* entails. This is, however, out of the scope of VDSM. When the time comes I think how the userData blob is actually parsed and what fields it keeps should be discussed on ovirt-devel or engine-devel. The crux of the issue is that VDSM manages only what it cares about and the user can't modify directly. This is done because everything we expose we commit to. If you want any information persisted like: - Human readable name (in whatever encoding) - Is this a template or a snapshot - What user owns this image You can just put it in the userData. VDSM is not going to impose what encoding you use. It's not going to decide if you represent your users as IDs or names or ldap queries or Public Keys. It's not going to decide if you have explicit templates or not. It's not going to decide if you care what is the logical image chain. It's not going to decide anything that is out of it's scope. No format is future proof, no selection of fields will be good for any situation. I'd much rather it be someone else's problem when any of them need to be changed. They have currently been VDSMs problem and it has been hell to maintain. In general, I actually agree with most of this. What I want to avoid is pushing things that should actually be a part of the API into this userData blob. We do want to keep the API as simple as possible to give vdsm flexibility. If, over time, we find that users are always using userData to work around something missing in the API, this could be a really good sign that the API needs extension. I was actually contemplating about this for quite a while. If while you create an image the reply is lost or, VDSM is unable to know if the operation was committed or not, the user will have no way of knowing what thew new image ID is. To solve this it is recommended that the manager puts some sort of task related information in the user data. If the operation ever finishes in an an ambiguous state the user just reads the userData from any images it doesn't know or is unsure about their state. This is a flow that every client will have to have. So why not just add that to the API? Because I don't want to impose how this information gets generated, what is the content of that data or how unique it has to be. Since VDSM doesn't use it for anything, I don't feel like I need to figure this out. I am all for simplicity, but simplicity is kind of an abstract concept. Having it be a blob is in some aspects the simplest thing you can do. Just saying that I have a field, put whatever in it is simple to convey but does requires more work on the user's side to figure out what to do with it. All that being said, I do think that the format, fields and how to use them should be defined so different users can communicate and synchronize. It's also important that you don't reinvent the wheel for every flow in every client. I'm just saying that it's not in the scope of VDSM. It should be done as a standard that all users of VDSM agree too conform to. It's the same way that a file
Re: [vdsm] RFC: New Storage API
2012-12-11 4:36, Saggi Mizrahi: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 1:39:51 PM Subject: Re: [vdsm] RFC: New Storage API On Thu, Dec 06, 2012 at 11:52:01AM -0500, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. Statements like this make me start to worry about your userData concept. It's a sign of a bad API if the user needs to invent a custom metadata scheme for itself. This reminds me of the abomination that is the 'custom' property in the vm definition today. In one sentence: If VDSM doesn't care about it, VDSM doesn't manage it. userData being a void* is quite common and I don't understand why you would thing it's a sign of a bad API. Further more, giving the user choice about how to represent it's own metadata and what fields it want to keep seems reasonable to me. Especially given the fact that VDSM never reads it. The reason we are pulling away from the current system of VDSM understanding the extra data is that it makes that data tied to VDSMs on disk format. VDSM on disk format has to be very stable because of clusters with multiple VDSM versions. Further more, since this is actually manager data it has to be tied to the manager backward compatibility lifetime as well. Having it be opaque to VDSM ties it to only one, simpler, support lifetime instead of two. Making userData being opaque gives flexibilities to the management applications. To me, opaque userDaa can have two types at least. The first is the userData for runtime only. The second is the userData expected to be persisted into the metadata disk. For the first type, the management applications can store their own data structures like temporary task states, VDSM query caches etc. After the VDSM
Re: [vdsm] RFC: New Storage API
- Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Deepak C Shetty deepa...@linux.vnet.ibm.com Sent: Friday, December 7, 2012 12:23:15 AM Subject: Re: [vdsm] RFC: New Storage API On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? Asi I said, connect fails if the repoId is in use ATM. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior IIUC, i can use options to do storage offloads ? For eg. I can create a LUN that represents this VD on my storage array based on the 'options' parameter ? Is this the intended way to use 'options' ? No, this has nothing to do with offloads. If by offloads you mean having other VDSM hosts to the heavy lifting then this is what the option autoFix=False and the fix mechanism is for. If you are talking about advanced scsi features (ie. write same) they will be used automatically whenever possible. In any case, how we manage LUNs (if they are even used) is an implementation detail. returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything
Re: [vdsm] RFC: New Storage API
Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, storage incremental backup? return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object making a distinction between stop because the host is overloaded to I don't want that image This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what is the status of the image. The status of the image can be either
Re: [vdsm] RFC: New Storage API
- Original Message - From: Tony Asleson tasle...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Wednesday, December 5, 2012 4:48:34 PM Subject: Re: [vdsm] RFC: New Storage API On 12/04/2012 03:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I'm guessing there will be a way to find out how much space is available for a specified repo before you try to create a virtual disk on it? This is in the repo API which is not really detailed here. In any case, due to the nature of storage, you can never tell how much space an image is going to actually take. You have over-committing, thin provisioning, sparse volumes, native snapshots, compression, de-dupe, soft raid (btfs\zfs), check-summing, metadata backups, metadata per-operation (btrfs), and more. VDSM might also leave the image in degraded mode if there is no room to complete the action. If you want to create an image you should just give it a whirl, also you should always leave certain % percentage free. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. Can the target repo id be itself? The case where a user wants to make a copy of a virtual disk in the same repo. A caller could snapshot the virtual disk and then create a virtual disk from the snapshot, but if the target repo could be the same as source repo then they could use this call as long as the returned ID was different. Does imageId IO need to be quiesced before calling this or will that be handled in the implementation (eg. snapshot first)? Copy of an image is possible to the same repo. Copy of a sanpshot to the same repo will not work, there is also no reason
Re: [vdsm] RFC: New Storage API
- Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, storage incremental backup? Yes, the main purpose is copying to another repo. and you can even do incremental backups. Also the following flow 1. Run a VM using imageA 2. write to disk 3. Stop VM 4. copy imageA to repoB 5. Run a VM using imageA again 6. Write to disk 7. Stop VM 8. Copy imageA again basing it of imageA_copy1 on repoB creating a diff on repo diff without snapshotting the original image. return the Id of the new image
Re: [vdsm] RFC: New Storage API
On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior IIUC, i can use options to do storage offloads ? For eg. I can create a LUN that represents this VD on my storage array based on the 'options' parameter ? Is this the intended way to use 'options' ? returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, storage incremental backup? Yes, the main purpose is copying to another repo. and you can even do incremental backups. Also
Re: [vdsm] RFC: New Storage API
On 12/04/2012 03:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I'm guessing there will be a way to find out how much space is available for a specified repo before you try to create a virtual disk on it? createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. Can the target repo id be itself? The case where a user wants to make a copy of a virtual disk in the same repo. A caller could snapshot the virtual disk and then create a virtual disk from the snapshot, but if the target repo could be the same as source repo then they could use this call as long as the returned ID was different. Does imageId IO need to be quiesced before calling this or will that be handled in the implementation (eg. snapshot first)? removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. What is the behavior if you delete snapshots or virtual disks that have dependencies on one another? For example, delete the snapshot a virtual disk is based on or delete the virtual disk a snapshot is based on? getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. -
[vdsm] RFC: New Storage API
I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operations has been committed to disk NOT when the operation actually completes. This is done so that: - operation come to a stable state as quickly as possible. - In case where there is an SDM, only small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object making a distinction between stop because the host is overloaded to I don't want that image This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what is the status of the image. The status of the image can be either optimized, degraded, or broken. Optimized means that the image is available and you can run VMs of it. Degraded means that the image is available and will run VMs but it might be a better way VDSM can represent the underlying data. Broken means that the image can't be used at the moment, probably because not all the data has been set up on the volume. Apart from that VDSM will also return the last persisted status information which will conatin hostID - the last host to try and optimize of fix the image stage - X/Y (eg. 1/10) the last persisted stage of the fix. percent_complete - -1 or 0-100, the
Re: [vdsm] RFC: New Storage API
Thanks for sharing this. It's nice to have something a little more concrete to think about. Just a few comments and questions inline to get some discussion flowing. On Tue, Dec 04, 2012 at 04:52:40PM -0500, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): We should probably add an options/flags parameter for extension of all new APIs. repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): I assume 'self' is a mistake here. Just want to clarify given all of the recent talk about instances vs. namespaces. In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots By mutable you mean writable right? Or does the word mutable imply more than that? There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): Is userdata a 'StringMap'? I will reopen the argument about an options dict vs a flags parameter. I oppose the dict for expansion because I think it causes APIs to devolve into a mess where lots of arbitrary and not well thought out overrides are packed into the dict over time. A flags argument (in json and python it can be an enum array) limits us to really switching flags on and off instead of passing arbitrary data. targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create Why is this needed? Doesn't the size of a snapshot have to be equal to its base image? baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot Can you snapshot a snapshot? In that case, this parameter should be called baseImage. userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy Do we locate the sourceRepoId automatically based on the imageId? baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. What is in this return value? Is it a single enum indicating whether the image is locked (being copied, etc.) or a list of detailed
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Tuesday, December 4, 2012 6:08:25 PM Subject: Re: [vdsm] RFC: New Storage API Thanks for sharing this. It's nice to have something a little more concrete to think about. Just a few comments and questions inline to get some discussion flowing. On Tue, Dec 04, 2012 at 04:52:40PM -0500, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): We should probably add an options/flags parameter for extension of all new APIs. Usually I agree but connectionParameters is already generic enough :) repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): I assume 'self' is a mistake here. Just want to clarify given all of the recent talk about instances vs. namespaces. Yea, it's just pasted from my code In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots By mutable you mean writable right? Or does the word mutable imply more than that? It's a semantic distinction due to implementation details, in general terms, yes. There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): Is userdata a 'StringMap'? currently it's a json object. We could limit it to a string map and trust the client to parse types. We can have it be a string\blob and trust the user to serialize the data. It's pass-through object either way. I will reopen the argument about an options dict vs a flags parameter. I oppose the dict for expansion because I think it causes APIs to devolve into a mess where lots of arbitrary and not well thought out overrides are packed into the dict over time. A flags argument (in json and python it can be an enum array) limits us to really switching flags on and off instead of passing arbitrary data. We already have strategy that we know we want to have several options. Other stuff that have been suggested is to be able to override the img format (qcow2\qed) The way I envision it is having an class opts = CommandOptions() that you add opts.addStringOption(key, value) opts.addIntOption(key, 3) opt.addBoolOption(key, True) I know you could just as well have strategy_space_flag and strategy_performance_flag and fail the operation if they both exist. Since it is a matter of personal taste I think it should be decided by a vote. targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create Why is this needed? Doesn't the size of a snapshot have to be equal to its base image? oops, another copy\paste error, you can see this arg doesn't exist in the method signature. My proof reading do need more work. baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot Can you snapshot a snapshot? In that case, this parameter should be called baseImage. You can't snapshot a sanpshot, it makes no sense as it can't change and you will get the same object. userData - optional data