Re: [vdsm] RFC: New Storage API

Shu Ming Thu, 06 Dec 2012 08:02:47 -0800

Saggi,

Thanks for sharing your thoughts. I have some comments below.



Saggi Mizrahi:

I've been throwing a lot of bits out about the new storage API and I think it's 
time to talk a bit.
I will purposefully try and keep implementation details away and concentrate 
on how the API looks and how you use it.

The first major change is in terminology: there is no longer a storage domain 
but a storage repository.
This change is done because so many things are already called domain in the 
system and this will make things less confusing for newcomers with a libvirt 
background.

One other change is that repositories no longer have a UUID.
The UUID was only used in the pool members manifest and is no longer needed.


connectStorageRepository(repoId, repoFormat, connectionParameters={}):
repoId - is a transient name that will be used to refer to the connected 
domain, it is not persisted and doesn't have to be the same across the cluster.
repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, 
clvm-1.2).
connectionParameters - This is format specific and will be used to tell VDSM 
how to connect to the repo.

Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() returns. Isn't repoID a return value of connectStorageRepository()?


disconnectStorageRepository(self, repoId)
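
One possible reading of the description above is that the caller picks repoId
itself and passes it in, so nothing needs to be returned. A minimal Python
sketch of that reading follows. RepoRegistry and the "path" connection
parameter are made up for illustration and are not part of VDSM.

```python
class RepoRegistry:
    """Hypothetical in-memory stand-in for one host's connected repos."""

    def __init__(self):
        # repoId -> (repoFormat, connectionParameters); never persisted.
        self._connected = {}

    def connectStorageRepository(self, repoId, repoFormat,
                                 connectionParameters=None):
        if repoId in self._connected:
            raise ValueError("repoId already in use on this host: %r" % repoId)
        self._connected[repoId] = (repoFormat, dict(connectionParameters or {}))

    def disconnectStorageRepository(self, repoId):
        del self._connected[repoId]

    def connected(self):
        return sorted(self._connected)


# The same storage can carry a different transient name on each host.
params = {"path": "/srv/images"}  # made-up, format-specific parameters
host_a = RepoRegistry()
host_b = RepoRegistry()
host_a.connectStorageRepository("fast-nfs", "nfs-3.4", params)
host_b.connectStorageRepository("repo7", "nfs-3.4", params)
```

If repoId were instead generated by VDSM, the connect call would have to
return it, which is the alternative raised in the question above.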


In the new API there are only images, some images are mutable and some are not.
mutable images are also called VirtualDisks
immutable images are also called Snapshots

There are no explicit templates, you can create as many images as you want from 
any snapshot.

There are 4 major image operations:


createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
                   userData={}, options={}):

targetRepoId - ID of a connected repo where the disk will be created
size - The size of the image you wish to create
baseSnapshotId - the ID of the snapshot you want to base the new virtual disk 
on
userData - optional data that will be attached to the new VD, could be anything 
that the user desires.
options - options to modify VDSM's default behavior

returns the id of the new VD

I think we will also need a function to check if a VirtualDisk is based on a specific snapshot.

Like: isSnapshotOf(virtualDiskId, baseSnapshotID):
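
A minimal sketch of what such a check could look like, assuming the image
lineage is available as a simple child-to-base mapping. The parents argument
and the example IDs are hypothetical, and whether the check should follow the
whole chain or only the direct base is an open question. This version follows
the whole chain.

```python
def isSnapshotOf(virtualDiskId, baseSnapshotId, parents):
    """Return True if baseSnapshotId appears in virtualDiskId's base chain.

    parents maps each image ID to the ID of the image it is based on,
    or to None when the image has no base.
    """
    seen = set()  # guards against a corrupted, cyclic lineage
    current = parents.get(virtualDiskId)
    while current is not None and current not in seen:
        if current == baseSnapshotId:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


# Made-up lineage: vd-2 is based on snap-1, which descends from snap-0.
parents = {"vd-2": "snap-1", "snap-1": "snap-0", "snap-0": None, "vd-9": None}
direct = isSnapshotOf("vd-2", "snap-1", parents)
indirect = isSnapshotOf("vd-2", "snap-0", parents)
unrelated = isSnapshotOf("vd-9", "snap-0", parents)
```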


createSnapshot(targetRepoId, baseVirtualDiskId,
                userData={}, options={}):
targetRepoId - The ID of a connected repo where the new snapshot will be 
created and the original image exists as well.
size - The size of the image you wish to create
baseVirtualDiskId - the ID of a mutable image (Virtual Disk) you want to snapshot
userData - optional data that will be attached to the new Snapshot, could be 
anything that the user desires.
options - options to modify VDSM's default behavior

returns the id of the new Snapshot

copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={})
targetRepoId - The ID of a connected repo where the new image will be created
imageId - The image you wish to copy
baseImageId - if specified, the new image will contain only the diff between 
imageId and baseImageId.
               If None the new image will contain all the bits of imageId. 
This can be used to copy parts of images for export.
userData - optional data that will be attached to the new image, could be 
anything that the user desires.
options - options to modify VDSM's default behavior

Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, and storage incremental backup?


Returns the ID of the new image. In case of copying an immutable image the ID 
will be identical to the original image as they contain the same data. However 
the user should not assume that and always use the value returned from the 
method.
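
The advice to always use the returned ID can be illustrated with a small
in-memory sketch. Everything here is an assumption made for illustration: the
repos dictionary, deriving immutable IDs from content with SHA-1, and giving
mutable copies a fresh random ID. The RFC only says the IDs may match for
immutable images. baseImageId is accepted but ignored in this sketch.

```python
import hashlib
import uuid

# repoId -> {imageId: (mutable, data)}; a stand-in for connected repos.
repos = {"src": {}, "dst": {}}


def _find(imageId):
    # Search every connected repo for the image.
    for images in repos.values():
        if imageId in images:
            return images[imageId]
    raise KeyError(imageId)


def copyImage(targetRepoId, imageId, baseImageId=None,
              userData=None, options=None):
    mutable, data = _find(imageId)
    if mutable:
        newId = uuid.uuid4().hex               # assumption: fresh ID
    else:
        newId = hashlib.sha1(data).hexdigest()  # assumption: content-derived
    repos[targetRepoId][newId] = (mutable, data)
    return newId


snapId = hashlib.sha1(b"golden image").hexdigest()
repos["src"][snapId] = (False, b"golden image")
repos["src"]["vd-1"] = (True, b"scratch disk")

copiedSnapId = copyImage("dst", snapId)  # happens to equal snapId here
copiedDiskId = copyImage("dst", "vd-1")  # differs from "vd-1"
```

Code that kept using "vd-1" after the copy would be looking at the source
image, not the copy, which is why the returned value matters.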

removeImage(repositoryId, imageId, options={}):
repositoryId - The ID of a connected repo where the image to delete resides
imageId - The id of the image you wish to delete.


----
getImageStatus(repositoryId, imageId)
repositoryId - The ID of a connected repo where the image to check resides
imageId - The id of the image you wish to check.

All operations return once the operation has been committed to disk, NOT when 
the operation actually completes.
This is done so that:
- Operations come to a stable state as quickly as possible.
- In case there is an SDM, only a small portion of the operation actually 
needs to be performed on the SDM host.
- No matter how many times the operation fails and on how many hosts, you can 
always resume the operation and choose when to do it.
- You can stop an operation at any time and remove the resulting object, making 
a distinction between "stop because the host is overloaded" and "I don't want 
that image".

This means that after calling any operation that creates a new image the user 
must then call getImageStatus() to check what is the status of the image.
The status of the image can be either optimized, degraded, or broken.
"Optimized" means that the image is available and you can run VMs off it.
"Degraded" means that the image is available and will run VMs but there might 
be a better way VDSM can represent the underlying data.


What does "represent" mean here?

"Broken" means that the image can't be used at the moment, probably because not 
all the data has been set up on the volume.
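
A minimal sketch of the create-then-poll pattern this implies for callers.
It assumes getImageStatus() returns one of the lowercase strings "optimized",
"degraded", or "broken", which the RFC does not specify. waitForUsableImage
and fakeGetImageStatus are hypothetical helpers, not VDSM functions.

```python
import time


def waitForUsableImage(getImageStatus, repositoryId, imageId,
                       timeout=60.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the image is optimized or degraded, or time out."""
    deadline = clock() + timeout
    while True:
        status = getImageStatus(repositoryId, imageId)
        if status in ("optimized", "degraded"):
            return status
        if clock() >= deadline:
            raise TimeoutError("image %s is still %s" % (imageId, status))
        sleep(interval)


# Fake status source standing in for a real host.
_answers = iter(["broken", "broken", "degraded"])


def fakeGetImageStatus(repositoryId, imageId):
    return next(_answers)


result = waitForUsableImage(fakeGetImageStatus, "repo7", "img-1",
                            sleep=lambda seconds: None)
```

A real caller would probably also look at the persisted status information
rather than only waiting, since a broken image may need a fix to be started.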

Apart from that VDSM will also return the last persisted status information 
which will contain
hostID - the last host to try and optimize or fix the image

Any host can optimize the image? No need to be SDM?

stage - X/Y (eg. 1/10) the last persisted stage of the fix.
percent_complete - -1 or 0-100, the last persisted completion percentage of the 
aforementioned stage. -1 means that no progress is available for that operation.
last_error - This will only be filled if the operation failed because of 
something other than IO or a VDSM crash for obvious reasons.
              It will usually be set if the task was manually stopped
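
As a sketch, the persisted status fields listed above could be modelled like
this. ImageStatusInfo and overall_percent() are hypothetical. The overall
estimate additionally assumes that X in "X/Y" is the 1-based stage currently
in progress and that all stages weigh the same, neither of which the RFC
states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageStatusInfo:
    hostID: str
    stage: str                  # "X/Y", e.g. "1/10"
    percent_complete: int       # -1, or 0-100 within the current stage
    last_error: Optional[str] = None

    def overall_percent(self):
        """Rough overall progress, or None when no progress is available."""
        if self.percent_complete == -1:
            return None
        current, total = (int(part) for part in self.stage.split("/"))
        # Stages before the current one are counted as fully done.
        return ((current - 1) * 100 + self.percent_complete) / total


halfway_through_third = ImageStatusInfo("host-3", "3/10", 50)
no_progress = ImageStatusInfo("host-3", "1/10", -1, last_error="stopped")
```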

The user can either be satisfied with that information or ask the host 
specified in hostID whether it is still working on that image by checking its 
running tasks.


So we need a function to know what tasks are running on the image


checkStorageRepository(self, repositoryId, options={}):
A method to go over a storage repository and scan for any existing problems. 
This includes degraded\broken images and deleted images that have not yet been 
physically deleted\merged.
It returns a list of Fix objects.
Fix objects come in 4 types:
clean - cleans data, run them to get more space.
optimize - run them to optimize a degraded image
merge - Merges two images together. Doing this sometimes
         makes more images ready for optimizing or cleaning.
         The reason it is different from optimize is that
         unmerged images are considered optimized.
mend - mends a broken image

The user can read these types and prioritize fixes. Fixes also contain opaque 
FIX data and they should be sent as received to
fixStorageRepository(self, repositoryId, fix, options={}):

That will start a fix operation.
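
A minimal sketch of how a client might prioritize the returned fixes before
sending them back. The Fix tuple and the chosen ordering are assumptions for
illustration. The RFC only says that fixes have a type and opaque data, and
leaves the policy to the user.

```python
from collections import namedtuple

# data is opaque to the client and must be passed back unchanged.
Fix = namedtuple("Fix", ["type", "data"])

# One possible policy: repair broken images first, reclaim space last.
PRIORITY = {"mend": 0, "optimize": 1, "merge": 2, "clean": 3}


def prioritize(fixes, priority=PRIORITY):
    """Return fixes ordered by type; ties keep their original order."""
    return sorted(fixes, key=lambda fix: priority[fix.type])


# Made-up result of a checkStorageRepository() call.
found = [Fix("clean", b"\x01"), Fix("mend", b"\x02"),
         Fix("merge", b"\x03"), Fix("optimize", b"\x04")]
ordered = prioritize(found)
```

Each element of ordered would then be handed to fixStorageRepository() as
received, without inspecting or altering its data field.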


All major operations automatically start the appropriate "Fix" to bring the 
created object to an optimized\degraded state (the one that is quicker) unless 
one of the options is
AutoFix=False. This is only useful for repos that might not be able to create 
volumes on all hosts (SDM) but would like to have the actual IO distributed in 
the cluster.

Another common option is the strategy option:
It currently has 2 possible values,
space and performance - In case VDSM has 2 ways of completing the same 
operation it will tell it to value one over the other. For example, whether to 
copy all the data or just create a qcow based off a snapshot.
The default is space.
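
A minimal sketch of how the strategy option could steer that choice.
planCreate and the plan names are hypothetical, and mapping "space" to a thin
qcow layer and "performance" to a full copy is an assumption. The RFC does
not say which strategy selects which method.

```python
def planCreate(baseSnapshotId, options=None):
    """Pick how to materialize a new disk, honoring options["strategy"]."""
    strategy = (options or {}).get("strategy", "space")  # default is space
    if strategy not in ("space", "performance"):
        raise ValueError("unknown strategy: %r" % strategy)
    if baseSnapshotId is None:
        return "allocate-empty"      # nothing to copy or layer on
    if strategy == "space":
        return "qcow-layer"          # assumption: thin layer saves space
    return "full-copy"               # assumption: flat copy reads faster


default_plan = planCreate("snap-1")
fast_plan = planCreate("snap-1", {"strategy": "performance"})
empty_plan = planCreate(None, {"strategy": "performance"})
```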

You might have also noticed that it is never explicitly specified where to look 
for existing images. This is done purposefully, VDSM will always look in all 
connected repositories for existing objects.
For very large setups this might be problematic. To mitigate the problem you 
have these options:
participatingRepositories=[repoId, ...] which tells VDSM to narrow the search 
to just these repositories
and
imageHints={imgId: repoId} which will force VDSM to look for those image IDs 
just in those repositories and fail if it doesn't find them there.
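
A minimal sketch of the lookup rules described above. locateImage and the
connected mapping (repoId to the set of image IDs it holds) are hypothetical,
and letting a hint take precedence over participatingRepositories is an
assumption.

```python
def locateImage(connected, imageId, options=None):
    """Return the repoId holding imageId, honoring the search options."""
    options = options or {}
    hints = options.get("imageHints", {})
    if imageId in hints:
        # A hint is binding: look only there and fail otherwise.
        repoId = hints[imageId]
        if imageId not in connected.get(repoId, ()):
            raise LookupError("%s not in hinted repo %s" % (imageId, repoId))
        return repoId
    # Without a hint, search the narrowed list or every connected repo.
    candidates = options.get("participatingRepositories", list(connected))
    for repoId in candidates:
        if imageId in connected.get(repoId, ()):
            return repoId
    raise LookupError("%s not found" % imageId)


connected = {"a": {"img1"}, "b": {"img2"}}
anywhere = locateImage(connected, "img2")
hinted = locateImage(connected, "img2", {"imageHints": {"img2": "b"}})
```

Narrowing the search to ["a"] or hinting that img2 lives in "a" would raise
LookupError in this sketch even though repo "b" is connected and holds it.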
_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel



--
---
舒明 Shu Ming
Open Virtualization Engineering; CSTL, IBM Corp.
Tel: 86-10-82451626  Tieline: 9051626 E-mail: shum...@cn.ibm.com or 
shum...@linux.vnet.ibm.com
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, 
Beijing 100193, PRC

