Re: [vdsm] RFC: New Storage API
On 12/04/2012 03:52 PM, Saggi Mizrahi wrote:

I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try to keep implementation details away and concentrate on how the API looks and how you use it.

The first major change is in terminology: there is no longer a storage domain but a storage repository. This change is made because so many things are already called "domain" in the system, and this will make things less confusing for newcomers with a libvirt background. Another change is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed.

connectStorageRepository(repoId, repoFormat, connectionParameters={}):
  repoId - a transient name that will be used to refer to the connected domain; it is not persisted and doesn't have to be the same across the cluster.
  repoFormat - similar to what used to be type (e.g. localfs-1.0, nfs-3.4, clvm-1.2).
  connectionParameters - format specific; used to tell VDSM how to connect to the repo.

disconnectStorageRepository(self, repoId)

In the new API there are only images; some images are mutable and some are not. Mutable images are also called VirtualDisks; immutable images are also called Snapshots. There are no explicit templates: you can create as many images as you want from any snapshot.

There are 4 major image operations:

createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}):
  targetRepoId - ID of a connected repo where the disk will be created
  size - the size of the image you wish to create
  baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
  userData - optional data that will be attached to the new VD; could be anything that the user desires.
  options - options to modify VDSM's default behavior
  Returns the ID of the new VD.

I'm guessing there will be a way to find out how much space is available for a specified repo before you try to create a virtual disk on it?

createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}):
  targetRepoId - the ID of a connected repo where the new snapshot will be created and where the original image exists as well
  baseVirtualDiskId - the ID of a mutable image (virtual disk) you want to snapshot
  userData - optional data that will be attached to the new snapshot; could be anything that the user desires
  options - options to modify VDSM's default behavior
  Returns the ID of the new snapshot.

copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}):
  targetRepoId - the ID of a connected repo where the new image will be created
  imageId - the image you wish to copy
  baseImageId - if specified, the new image will contain only the diff between imageId and baseImageId. If None, the new image will contain all the bits of imageId. This can be used to copy partial parts of images for export.
  userData - optional data that will be attached to the new image; could be anything that the user desires
  options - options to modify VDSM's default behavior
  Returns the ID of the new image. In the case of copying an immutable image, the ID will be identical to the original image as they contain the same data. However, the user should not assume that and should always use the value returned from the method.

Can the target repo ID be itself? The case where a user wants to make a copy of a virtual disk in the same repo: a caller could snapshot the virtual disk and then create a virtual disk from the snapshot, but if the target repo could be the same as the source repo then they could use this call, as long as the returned ID was different.

Does imageId I/O need to be quiesced before calling this, or will that be handled in the implementation (e.g. snapshot first)?
removeImage(repositoryId, imageId, options={}):
  repositoryId - the ID of a connected repo where the image to delete resides
  imageId - the ID of the image you wish to delete

What is the behavior if you delete snapshots or virtual disks that have dependencies on one another? For example, delete the snapshot a virtual disk is based on, or delete the virtual disk a snapshot is based on?

getImageStatus(repositoryId, imageId):
  repositoryId - the ID of a connected repo where the image to check resides
  imageId - the ID of the image you wish to check

All operations return once the operation has been committed to disk, NOT when the operation actually completes. This is done so that:
- operations come to a stable state as quickly as possible;
- in the case where there is an SDM, only a small portion of the operation actually needs to be performed on the SDM host;
- no matter how many times the operation fails, and on how many hosts, you can always resume the operation and choose when to do it.
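As a reading aid, here is a toy in-memory mock of the proposed image calls. The method names and parameters follow the thread; everything else (the dict-based bookkeeping, the status strings, using UUIDs for image IDs) is my assumption, not VDSM's actual implementation:

```python
import uuid

class MockRepo:
    """Toy in-memory model of the proposed image API (a sketch, not VDSM)."""

    def __init__(self):
        self.images = {}  # imageId -> metadata dict

    def createVirtualDisk(self, size, baseSnapshotId=None,
                          userData=None, options=None):
        # A virtual disk is a mutable image, optionally based on a snapshot.
        imgId = str(uuid.uuid4())
        self.images[imgId] = {"size": size, "mutable": True,
                              "base": baseSnapshotId,
                              "userData": userData or {}}
        return imgId

    def createSnapshot(self, baseVirtualDiskId, userData=None, options=None):
        # A snapshot is an immutable image taken from a mutable one.
        base = self.images[baseVirtualDiskId]
        snapId = str(uuid.uuid4())
        self.images[snapId] = {"size": base["size"], "mutable": False,
                               "base": baseVirtualDiskId,
                               "userData": userData or {}}
        return snapId

    def getImageStatus(self, imageId):
        # Real operations return once committed, not once complete,
        # so callers would poll something like this.
        return "ready" if imageId in self.images else "missing"

repo = MockRepo()
vd = repo.createVirtualDisk(10 * 2**30)          # 10 GiB virtual disk
snap = repo.createSnapshot(vd)                    # immutable snapshot of it
clone = repo.createVirtualDisk(10 * 2**30,        # new disk based on the
                               baseSnapshotId=snap)  # snapshot ("no templates")
```

The "no explicit templates" point falls out of the last call: any snapshot can serve as the base for any number of new virtual disks.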
Re: [vdsm] Review needed: 3.2 release feature -- libvdsm
On 11/06/2012 09:49 AM, Dan Kenigsberg wrote:

On Mon, Oct 29, 2012 at 10:20:04AM -0500, Adam Litke wrote:

Hi everyone, libvdsm is listed as a release feature for 3.2 (preview only) [1][2]. There is a set of patches up in gerrit that could use a wide review from the community. The plan is to merge the new json-rpc server [3] first, so if you could concentrate your reviews there it would yield the greatest benefit. Thanks!

[1] http://wiki.ovirt.org/wiki/OVirt_3.2_release-management
[2] http://wiki.ovirt.org/wiki/Features/libvdsm
[3] http://gerrit.ovirt.org/#/c/8614/

[3] defines the format of each message as size+json-data, where size is a binary value used to split a (tcp) stream into messages. I would like to consider another splitting scheme, which I find better suited to the textual nature of jsonrpc: terminate each message with the newline character. It makes the protocol easier to sniff and debug (in case you've missed part of a message). The downside is that we would need to require clients to escape literal newlines, and unescape them in responses (both are done by python's json module, and the latter is part of the json standard).

I use json-rpc for IPC in libStorageMgmt (out-of-process plug-ins) with unix domain sockets. I adopted the size+json-data model as well*. I chose this because it allows the use of non-stream-capable json parsers. I wanted to ensure that the transport and protocol would be language and parser agnostic. You could achieve the message separation with newlines as you suggest, but then you may have to parse the message stream twice: once to find the message delimiter and once again to parse the json itself, depending on the json parser. Having the size at the beginning of the message is incredibly convenient from a coding efficiency standpoint. As for debug, I just log the message payload if needed.
I haven't had the need to use a packet trace, but I'm not sure a single newline separating messages would be obvious in a single frame capture. Would it be possible to compromise and keep the length but add the newline at the end? So size+payload+newline? You could then pass the message payload to the parser without having to escape the newlines.

Regards, Tony

*Except size is represented as fixed-length, zero-padded text instead of binary.

___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
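The compromise framing above (length prefix, payload, trailing newline) can be sketched in a few lines of Python. The 10-character field width and the helper names are my choices for illustration; only the size+payload+newline layout and the zero-padded-text size come from the thread:

```python
import io
import json

SIZE_FIELD = 10  # fixed-length, zero-padded decimal size, per Tony's footnote

def frame(message):
    """Encode a json-rpc message as <size><payload><newline>."""
    payload = json.dumps(message).encode("utf-8")
    size = ("%010d" % len(payload)).encode("ascii")
    return size + payload + b"\n"

def unframe(stream):
    """Read one framed message from a binary file-like object.

    The size prefix lets us hand the parser exactly one complete
    message, so a non-stream-capable json parser works fine; the
    trailing newline keeps message boundaries visible in a capture.
    """
    size = int(stream.read(SIZE_FIELD).decode("ascii"))
    payload = stream.read(size)
    stream.read(1)  # consume the trailing newline delimiter
    return json.loads(payload.decode("utf-8"))

# round-trip demonstration over an in-memory stream
msg = {"jsonrpc": "2.0", "method": "ping", "id": 1}
wire = frame(msg)
roundtrip = unframe(io.BytesIO(wire))
```

Because the payload length is known before parsing, no newline escaping is needed: a literal newline inside a json string would simply be counted in the size field.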
Re: [vdsm] [Libstoragemgmt-devel] [Engine-devel] RFC: Writeup on VDSM-libstoragemgmt integration
On 06/25/2012 09:14 AM, Deepak C Shetty wrote:

On 06/25/2012 07:47 AM, Shu Ming wrote:

I think VDSM-libstoragemgmt will let the storage array itself make the snapshot and handle the coordination of the various atomic functions. VDSM should be blocked on access to the specific LUNs which are under snapshotting.

I kind of agree. If the snapshot is being done at the array level, then the array takes care of quiescing the I/O, taking the snapshot and allowing the I/O, so why does VDSM have to worry about anything here? It should all happen transparently for VDSM, shouldn't it?

The array can take a snapshot in flight, but the data may be in an inconsistent state. Only the end application/user of the storage knows when a point in time is consistent. Typically the application(s) are quiesced, the OS buffers are flushed (outstanding tagged I/O is allowed to complete), and then the storage is told to make a point-in-time copy. This is the only way to be sure that what you have on disk is coherent. Transactional databases (two-phase commit) and logging file systems (metadata) are specifically written to handle these inconsistencies, but many applications are not.

Regards, Tony
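The ordering Tony describes (quiesce, flush, then snapshot) can be made concrete with a minimal sketch. The `array_snapshot` callback is a hypothetical stand-in for the array-side point-in-time copy; the only real system call involved here is the buffer flush:

```python
import os
import tempfile

def take_consistent_snapshot(fd, array_snapshot):
    """Sketch of the consistency ordering from the thread.

    The host flushes its dirty buffers for the file descriptor so the
    on-disk state is coherent, and only then asks the array to make its
    point-in-time copy. `array_snapshot` is a hypothetical callback
    standing in for the array-side operation (e.g. via libstoragemgmt).
    """
    os.fsync(fd)             # flush OS buffers; outstanding I/O completes
    return array_snapshot()  # array now copies a consistent image

# demonstration against a temp file and a stub "array"
events = []

def stub_array_snapshot():
    events.append("snap")
    return "snap-1"

with tempfile.NamedTemporaryFile() as f:
    f.write(b"application data")
    f.flush()  # push Python's userspace buffer to the OS first
    result = take_consistent_snapshot(f.fileno(), stub_array_snapshot)
```

A snapshot taken without the `fsync` step could capture data still sitting in host memory, which is exactly the "in flight but inconsistent" case described above. Real deployments also quiesce the application itself first, which no host-side call can do on its own.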
Re: [vdsm] Libstorage and repository engines
On 01/31/2012 04:57 PM, Saggi Mizrahi wrote:

All you have to do to facilitate a new storage type is to create a domain engine. A domain engine is a python class that implements a minimal interface:
1. It has to be able to create, resize and delete a slab (a slab being a block of writable storage like a lun\lv\file).
2. It has to be able to create and delete tags (tags are pointers to slabs).
The above functions are very easy to implement and require very little complexity. All the heavy lifting (image manipulation, cleaning, transactions, atomic operations, etc.) is managed by the Image Manager, which just uses this unified interface to interact with the different storage types.

Your interface is tailored to your specific needs. LibStorageMgmt is trying to provide a generalized interface for storage management. For example, a database user will have different needs (e.g. LUN mirror/break mirror/map mirror for backup, etc.). By providing as many features as an array provides (in a consistent API) we hope that libStorageMgmt will be useful for many different use cases, including those required for virtualization, and will allow users to use the hardware they purchased with their platform of choice.

Building a repo engine on top of libstorage is completely possible, but as you can see this creates a redundant layer of abstractions on the libstorage side.

I agree; it is in everyone's best interest to have fewer places for redundant code/functionality. IMHO, asking hardware vendors or open source developers to write specific plug-ins for storage arrays for every application that utilizes them will yield sub-optimal results.

Also, libstorage will have to keep its abstraction at a much lower level. This means exposing target-specific flags and abilities. While this is good in concept, it will mean that the repo engine wrapping libstorage will have to juggle all those flags and calls instead of having a distinct class for each storage type with its own specific hacks in place.
In both approaches you are dealing with the same complexity; only the chosen way of encapsulating that complexity differs. Please remember that libStorageMgmt is currently a working prototype (with one vendor) that has a small subset of the functionality it will ultimately provide. It is subject to change quite a bit before we release the first version. If there is something technical you don't like about libStorageMgmt, please raise the issue(s) on the mailing list (libstoragemgmt-de...@lists.sourceforge.net) so we can have a discussion. Everyone has an opportunity to help guide its future design and feature set.

Regards, Tony
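Saggi's two-point domain engine contract can be sketched as a python abstract base class. The method names and the toy dict-backed implementation are illustrative only; the real VDSM interface may look quite different:

```python
from abc import ABC, abstractmethod

class DomainEngine(ABC):
    """Minimal interface from the thread: slab lifecycle plus tags.

    A slab is a block of writable storage (lun/lv/file); a tag is a
    pointer to a slab. Everything else (image manipulation, cleaning,
    transactions, atomic operations) lives in the Image Manager, which
    talks to storage only through this interface.
    """

    @abstractmethod
    def create_slab(self, name, size): ...

    @abstractmethod
    def resize_slab(self, name, new_size): ...

    @abstractmethod
    def delete_slab(self, name): ...

    @abstractmethod
    def create_tag(self, tag, slab_name): ...

    @abstractmethod
    def delete_tag(self, tag): ...

class ToyEngine(DomainEngine):
    """Dict-backed engine showing how little a backend must provide."""

    def __init__(self):
        self.slabs, self.tags = {}, {}

    def create_slab(self, name, size):
        self.slabs[name] = size

    def resize_slab(self, name, new_size):
        self.slabs[name] = new_size

    def delete_slab(self, name):
        del self.slabs[name]

    def create_tag(self, tag, slab_name):
        self.tags[tag] = slab_name

    def delete_tag(self, tag):
        del self.tags[tag]

eng = ToyEngine()
eng.create_slab("lv0", 2**30)        # 1 GiB slab
eng.create_tag("base-image", "lv0")  # tag pointing at the slab
```

A repo engine wrapping libstoragemgmt would implement the same five methods by delegating to array calls, which is where the flag-juggling concern in the thread comes in.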