Re: [vdsm] RFC: New Storage API

2012-12-05 Thread Tony Asleson
On 12/04/2012 03:52 PM, Saggi Mizrahi wrote:
 I've been throwing a lot of bits out about the new storage API and I think 
 it's time to talk a bit.
 I will purposefully try to keep implementation details away and concentrate 
 on how the API looks and how you use it.
 
 First major change is in terminology: there is no longer a storage domain but a 
 storage repository.
 This change was made because so many things in the system are already called 
 domain, and it will make things less confusing for newcomers with a 
 libvirt background.
 
 One other change is that repositories no longer have a UUID.
 The UUID was only used in the pool members manifest and is no longer needed.
 
 
 connectStorageRepository(repoId, repoFormat, connectionParameters={}):
 repoId - a transient name that will be used to refer to the connected 
 repository; it is not persisted and doesn't have to be the same across the 
 cluster.
 repoFormat - similar to what used to be type (eg. localfs-1.0, nfs-3.4, 
 clvm-1.2).
 connectionParameters - format specific; will be used to tell VDSM how 
 to connect to the repo.
 
 disconnectStorageRepository(repoId):
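
A minimal sketch of how a caller might pair these two verbs, assuming only the call shapes described above. The `connect`/`disconnect` callables here are hypothetical stand-ins for the proposed connectStorageRepository/disconnectStorageRepository, not real VDSM bindings:

```python
from contextlib import contextmanager

@contextmanager
def connected_repo(connect, disconnect, repo_id, repo_format, params=None):
    """Keep a repository connected for the duration of a block.

    `connect` and `disconnect` stand in for the proposed
    connectStorageRepository/disconnectStorageRepository calls.
    """
    connect(repo_id, repo_format, params or {})
    try:
        yield repo_id
    finally:
        # Always disconnect, even if the block raised.
        disconnect(repo_id)
```

Because repoId is transient and not persisted, a scoped connect/disconnect like this is all the lifecycle management a caller needs.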
 
 
 In the new API there are only images; some images are mutable and some are 
 not.
 Mutable images are also called VirtualDisks.
 Immutable images are also called Snapshots.
 
 There are no explicit templates, you can create as many images as you want 
 from any snapshot.
 
 There are 4 major image operations:
 
 
 createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
   userData={}, options={}):
 
 targetRepoId - ID of a connected repo where the disk will be created
 size - The size of the image you wish to create
 baseSnapshotId - the ID of the snapshot you want to base the new virtual 
 disk on
 userData - optional data that will be attached to the new VD, could be 
 anything that the user desires.
 options - options to modify VDSMs default behavior
 
 returns the id of the new VD

I'm guessing there will be a way to find out how much space is available
for a specified repo before you try to create a virtual disk on it?

 
 createSnapshot(targetRepoId, baseVirtualDiskId,
userData={}, options={}):
 targetRepoId - The ID of a connected repo where the new snapshot will be 
 created and where the original image exists.
 baseVirtualDiskId - the ID of a mutable image (Virtual Disk) you want to 
 snapshot
 userData - optional data that will be attached to the new Snapshot, could be 
 anything that the user desires.
 options - options to modify VDSMs default behavior
 
 returns the id of the new Snapshot
 
 copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={})
 targetRepoId - The ID of a connected repo where the new image will be created
 imageId - The image you wish to copy
 baseImageId - if specified, the new image will contain only the diff between 
 baseImageId and imageId.
   If None, the new image will contain all the data of imageId. 
 This can be used to copy partial images for export.
 userData - optional data that will be attached to the new image, could be 
 anything that the user desires.
 options - options to modify VDSMs default behavior
 
 returns the ID of the new image. In the case of copying an immutable image, the 
 ID will be identical to the original image's, as they contain the same data. 
 However, the user should not assume that and should always use the value 
 returned from the method.

Can the target repo id be itself?  The case where a user wants to make a
copy of a virtual disk in the same repo.  A caller could snapshot the
virtual disk and then create a virtual disk from the snapshot, but if
the target repo could be the same as source repo then they could use
this call as long as the returned ID was different.
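
The snapshot-then-create workaround described above can be sketched as a small helper. The two callables are hypothetical stand-ins for the proposed createSnapshot/createVirtualDisk verbs; the ID values are made up:

```python
def copy_disk_in_repo(create_snapshot, create_virtual_disk,
                      repo_id, disk_id, size):
    """Copy a virtual disk within a single repo by snapshotting it
    first, then creating a new virtual disk based on that snapshot.

    `create_snapshot` and `create_virtual_disk` stand in for the
    proposed createSnapshot/createVirtualDisk calls.
    """
    snap_id = create_snapshot(repo_id, disk_id)
    new_disk_id = create_virtual_disk(repo_id, size,
                                      base_snapshot_id=snap_id)
    # The copy must come back under a different ID than the source.
    assert new_disk_id != disk_id
    return new_disk_id
```

If copyImage were allowed to target the source repo, this two-step dance would collapse into a single call, provided the returned ID is guaranteed to differ.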

Does imageId IO need to be quiesced before calling this or will that be
handled in the implementation (eg. snapshot first)?

 removeImage(repositoryId, imageId, options={}):
 repositoryId - The ID of a connected repo where the image to delete resides
 imageId - The id of the image you wish to delete.


What is the behavior if you delete snapshots or virtual disks that have
dependencies on one another?  For example, delete the snapshot a virtual
disk is based on or delete the virtual disk a snapshot is based on?

 
 
 getImageStatus(repositoryId, imageId)
 repositoryId - The ID of a connected repo where the image to check resides
 imageId - The id of the image you wish to check.
 
 All operations return once the operation has been committed to disk, NOT when 
 the operation actually completes.
 This is done so that:
 - operations come to a stable state as quickly as possible;
 - in cases where there is an SDM, only a small portion of the operation 
 actually needs to be performed on the SDM host;
 - no matter how many times the operation fails, or on how many hosts, you can 
 always resume the operation and choose when to do it.
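
Since operations return on commit rather than on completion, a caller would presumably poll getImageStatus until the image settles. A sketch of that loop, with the terminal status strings invented here since the thread does not define them:

```python
import time

def wait_for_image(get_status, repo_id, image_id, poll=1.0, timeout=300.0):
    """Poll until an image operation reaches a terminal state.

    `get_status` stands in for the proposed getImageStatus call; the
    "ready"/"broken" status values are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(repo_id, image_id)
        if status in ("ready", "broken"):  # hypothetical terminal states
            return status
        time.sleep(poll)
    raise TimeoutError("image %s did not settle within %ss"
                       % (image_id, timeout))
```

Because a failed operation can be resumed on any host, the same polling loop works regardless of where (or how many times) the operation is retried.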

Re: [vdsm] Review needed: 3.2 release feature -- libvdsm

2012-11-06 Thread Tony Asleson
On 11/06/2012 09:49 AM, Dan Kenigsberg wrote:
 On Mon, Oct 29, 2012 at 10:20:04AM -0500, Adam Litke wrote:
 Hi everyone,

 libvdsm is listed as a release feature for 3.2 (preview only)[1][2].  There 
 is a
 set of patches up in gerrit that could use a wide review from the community.
 The plan is to merge the new json-rpc server[3] first so if you could
 concentrate your reviews there it would yield the greatest benefit.  Thanks!

 [1] http://wiki.ovirt.org/wiki/OVirt_3.2_release-management
 [2] http://wiki.ovirt.org/wiki/Features/libvdsm
 [3] http://gerrit.ovirt.org/#/c/8614/
 
 [3] defines the format of each message as
 
 <size><json-data>
 
 where size is a binary value, used to split a (tcp) stream into
 messages. I would like to consider another splitting scheme, which I
 find better suited to the textual nature of jsonrpc: terminate each
 message with the newline character. It makes the protocol easier to
 sniff and debug (in case you've missed part of a message).
 
 The downside is that we would need to require clients to
 escape literal newlines, and unescape them in responses (both are done
 by python's json module, and the latter is part of the json standard).

I use json-rpc for IPC in libStorageMgmt (out-of-process plug-ins) with
unix domain sockets.  I adopted the <size><json-data> model as well*.

I chose this because it allows the use of non-stream capable json
parsers.  I wanted to ensure that the transport and protocol would be
language and parser agnostic.

You could achieve the message separation with newlines as you suggest,
but then you may have to parse the message stream twice: once to find
the message delimiter and once again to parse the json itself, depending
on the json parser.  Having the size at the beginning of the message is
incredibly convenient from a coding efficiency standpoint.

As for debug, I just log the message payload if needed.  I haven't had
the need to use a packet trace, but I'm not sure having a single newline
separating messages would be obvious in a single frame capture?

Would it be possible to compromise: keep the length and add a
newline at the end, i.e. <size><payload><newline>?  You could then pass
the message payload to the parser without having to escape the
newlines.
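
The compromise framing is easy to sketch. This is an illustrative implementation, not the actual vdsm or libStorageMgmt wire code; the 4-byte big-endian length prefix is an assumption (the thread only says "binary"):

```python
import json
import struct

def pack_message(obj):
    """Frame a JSON-RPC message as <size><payload><newline>.

    The size is a 4-byte big-endian length prefix covering only the
    payload; the trailing newline exists purely to make sniffed
    traffic easier to read.
    """
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload + b"\n"

def unpack_message(read):
    """Read one framed message, given a read(n)-style callable."""
    (size,) = struct.unpack(">I", read(4))
    payload = read(size)
    read(1)  # consume the trailing newline
    return json.loads(payload)
```

With the length up front, the receiver never scans the payload for a delimiter, so any non-streaming JSON parser works; the newline costs one byte and keeps packet captures legible.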

Regards,
Tony

*Except that size is represented as fixed-length, zero-padded text
instead of binary.
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] [Libstoragemgmt-devel] [Engine-devel] RFC: Writeup on VDSM-libstoragemgmt integration

2012-06-25 Thread Tony Asleson
On 06/25/2012 09:14 AM, Deepak C Shetty wrote:
 On 06/25/2012 07:47 AM, Shu Ming wrote:

 I think VDSM-libstoragemgmt will let the storage array itself make 
 the snapshot and handle the coordination of the various atomic 
 functions. VDSM should block access to the specific LUNs while they 
 are being snapshotted.
 
 I kind of agree. If the snapshot is being done at the array level, then the 
 array takes care of quiescing the I/O, taking the snapshot and resuming 
 the I/O, so why does VDSM have to worry about anything here? It should all 
 happen transparently to VDSM, shouldn't it?

The array can take a snapshot in flight, but the data may be in an
inconsistent state.  Only the end application/user of the storage knows
when a point in time is consistent.  Typically the application(s) are
quiesced, the OS buffers flushed (outstanding tagged IO is allowed to
complete) and then the storage is told to make a point in time copy.
This is the only way to be sure of what you have on disk is coherent.

A transactional database (two-phase commit) and logging file systems
(meta data) are specifically written to handle these inconsistencies,
but many applications are not.
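
The ordering described above is the whole trick, so it can be captured in a few lines. The three callables are placeholders supplied by the caller (e.g. quiescing the application and running fsfreeze, asking the array for a point-in-time copy, then unfreezing); none of them are real VDSM or libStorageMgmt calls:

```python
def consistent_snapshot(freeze, snapshot, thaw):
    """Take an array-level snapshot with application consistency.

    The essential parts are the ordering (quiesce before the copy)
    and the guarantee that `thaw` always runs, even on failure.
    """
    freeze()                # quiesce the app, flush OS buffers
    try:
        return snapshot()   # tell the storage to take the copy
    finally:
        thaw()              # resume I/O even if the snapshot failed
```

Skipping the freeze step is exactly the in-flight case: the array still produces a snapshot, but only crash consistency is guaranteed, which non-transactional applications may not survive.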

Regards,
Tony



Re: [vdsm] Libstorage and repository engines

2012-02-01 Thread Tony Asleson
On 01/31/2012 04:57 PM, Saggi Mizrahi wrote:
 All you have to do to facilitate a new storage type is to create a
 domain engine.

 A domain engine is a python class that implements a minimal
 interface. 1. It has to be able to create, resize, and delete a slab
 (a slab being a block of writable storage like a lun\lv\file) 2. It has
 to be able to create and delete tags (tags are pointers to slabs)

 The above functions are very easy to implement and require very little
 complexity. All the heavy lifting (image manipulation, cleaning,
 transactions, atomic operations, etc.) is managed by the Image Manager,
 which just uses this unified interface to interact with the different
 storage types.

Your interface is tailored to your specific needs.  LibStorageMgmt is
trying to provide a generalized interface for storage management.  For
example a database user will have different needs (e.g. LUN mirror/break
mirror/map mirror for backup etc.).  By providing as many of the features an
array offers (in a consistent API), we hope that libStorageMgmt will be
useful for many different use cases, including those required for
virtualization, and will allow users to use the hardware they purchased
with their platform of choice.

 Building a repo engine on top of libstorage is completely possible.
 But as you can see this creates a redundant layer of abstractions in
 the libstorage side.

I agree, it is in the best interest to have fewer places for redundant
code/functionality.  IMHO asking hardware vendors or open source
developers to write specific plug-ins for storage arrays for every
application that utilizes it will yield sub-optimal results.

 Also, libstorage will have to keep its abstraction at a much lower
 level. This means exposing target-specific flags and abilities.
 While this is good in concept, it will mean that the repo engine
 wrapping libstorage will have to juggle all those flags and calls
 instead of having a distinct class for each storage type with
 its own specific hacks in place.

In both approaches you are dealing with the same complexity, it is only
your chosen implementation to encapsulate that complexity that is different.

Please remember that libStorageMgmt is currently a working prototype
(with one vendor) that has a small subset of the functionality it will
ultimately provide.  It is subject to change quite a bit before we
release the first version.  If there is something technical you don't
like about libStorageMgmt please raise the issue(s) on the mailing list
(libstoragemgmt-de...@lists.sourceforge.net) so we can have a
discussion.  Everyone has an opportunity to help guide its future design
and feature set.

Regards,
Tony
