Re: [vdsm] object instancing in the new VDSM API

Adam Litke Tue, 04 Dec 2012 12:01:43 -0800

Thanks for your detailed response...

On Mon, Dec 03, 2012 at 09:26:34PM -0500, Saggi Mizrahi wrote:
> So from what I gather the only thing that is bothering you is that storage
> operations require a lot of IDs.  I get that, I hate that to. It doesn't
> change the point that it was designed that way.  Even if you deem some use
> cases irrelevant it wouldn't change the fact that this is how people use it
> now.  And because we are going to throw it away as soon as we can there is no
> reason to shape out API around that.


In that case, I want to throw away the bad architecture along with the bad API.
In the future I would like to see:

1) Objects can be uniquely identified by a single UUID.   This means you would
not be able to reuse the same UUID on a different host/domain unless you are
talking about the same object (ie. move image).  If we think this is going to be
a problem, let's discuss the specific use cases.

2) Verbs should not have non-obvious preconditions or overloaded semantics
(basically, we need to get rid of the issues with storage pools and images that
you explain below).

> So from what I gather we agree on instancing.

Sure.  I am willing to adopt namespaces instead of instances as long as the
above is adhered to in the new design.

I do have to ask again, when do you think the new storage stuff will be ready
for serious review, testing and consideration for merging?  I would be happy to
spend a significant amount of time helping out with this if the end result has
us closing on this 2+ year endeavor :)

> 
> ---
> 
> From this moment on I'm going to try my best to explain how VDSM storage
> currently works.  It is filled with misdirection and bad design. I hope that
> after that you will understand why you can't pack all the IDs together.
> 
> Let's start with the storage pool. Because it was simpler to have all metadata
> changing operations run on the same host someone needed to find a way to make
> cross domain operations work on the same host.  The solution was to band them
> all to a single entity call the storage pool and have a single lock. The point
> was to have a host be able to connect to multiple pools at a time.  Due to bad
> code (that could have been easily not have been so bad) the multiple pools
> feature was never implemented. Because the single lock to rule them all
> doesn't really work when you want to secure domain we had to add more locks
> making the pool concept obsolete.
> 
> These means that you can trust VDSM to only be connected to a single pool at
> the time, this means that if you want to change anything you can just remove
> the pool arg.

Is there a reason that vdsm doesn't automatically "connect" to the pool noted in
the master storage domain?  It's fine if it doesn't become spm, but it would be
nice to reduce the number of steps required to bring storage back up after a
reboot.

Also, do you see significant changes in the storage domain related verbs?  I
guess we will remove the attach/detach/activate/deactivate verbs since storage
pools are going away.

> Lets go to volumes and images. Contrary to how it's name imgUUID does not
> represent and image. It's actually a tag given to part of a chain. This is
> commonly used to differentiate between parts of the chain responsible for VM
> images and templates. Due to bad code a lot of the possible combinations are
> not supported but that is the intention.  imgUUID being a tag means that it
> serves 3 purposes depending on the verb that uses it.  1) In some verbs it
> used as a useless sanity check to make sure the volume is tagged with this
> sdUUID.  This I imagine was done because someone didn't fully comprehend how
> and why you do sanity checks.  This means that in some verbs you can just
> remove it (if you are actually changing anything) 2) In some verbs it's meant
> to distinct the volume from it's original chain (creating a template). At that
> point it's actually now being invented by the caller.  3) Operations that act
> on the whole chain, if volUUID is there is for the same useless sanity check
> and can be removed.
> 
> What you need to get out of this is that most of the time you can use less IDs
> just by removing useless imgUUID or volUUID args.  Further more, you need to
> understand that they are not hierarchical. imgUUID is a "tag" on the volume.
> similar to "user" for a file.
> 
> As for domain IDs, because the caller can choose to reuse imgUUIDs and
> volUUIDs on different domains and some flows actually depend on that.  To make
> things simpler some verbs should be split up so how you specify that target
> volID doesn't affect the actual command.
> 
> This means that copyImage() and createTemplate() should be split to:
> copyImage(dstDomain, srcDomain, imgUUID) createTemplate(dstDomain, dstImgUUID,
> srcDomain, srcImgUUID)
> 
> That being said, I'm personally still against an indeterminate storage API
> because of engine adoption problems.  But if you want to "fix" the current
> interface. Packing up the IDs to a single ID wouldn't work and is logically
> wrong.  What you need to do is remove redundant arguments and split up verbs
> that do more then one thing.
> 
> ----- Original Message -----
> > From: "Adam Litke" <[email protected]> To: "Saggi Mizrahi"
> > <[email protected]> Cc: "vdsm-devel" <[email protected]>, "Ayal
> > Baron" <[email protected]>, "Barak Azulay" <[email protected]>, "ybronhei"
> > <[email protected]> Sent: Monday, December 3, 2012 5:46:31 PM Subject: Re:
> > object instancing in the new VDSM API
> > 
> > On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote:
> > > Currently the suggested scheme treats everything as instances and object
> > > have methods.  This puts instancing as the responsibility of the API
> > > bindings.  I suggest changing it to the way json was designed with
> > > namespaces and methods.
> > > 
> > > For example instead for the api being:
> > > 
> > > vm = host.getVMsList()[0] vm.getInfo()
> > > 
> > > the API should be:
> > > 
> > > vmID = host.getVMsList()[0] api.VMsManager.getVMInfo(vmID)
> > > 
> > > And it should be up to decide how to wrap everything in objects.
> > 
> > For VMs, your example looks nice, but for today's Volumes it's not so nice.
> > To properly identify a Volume, we must pass the storage pool id, storage
> > domain id, image id, and volume id.  If we are working with two Volumes, we
> > would need 8 parameters unless we optimize for context and assume that the
> > storage pool uuid is the same for both volumes, etc.  The problem with that
> > optimization is that we require clients to understand internal
> > implementation details.
> > 
> > How should the StorageDomain.getVolumes API return a list of Volumes?  A
> > list of Volume ids is not enough information for most commands that involve
> > a Volume.
> > 
> > > The problem with the API bindings controlling the instancing is that: 1)
> > > We have to *have* and *pass* implicit api obj which is problematic to
> > > maintain.  For example, you have to have the api object as a member of
> > > instance for the method calls to work.  This means that you can't recreate
> > > or pool API objects easily. You effectively need to add a "move" method to
> > > move the object to another API object to use it on a different host.
> > 
> > You already make assumptions like this when passing around bare UUIDs.  For
> > example, you know that a Storage Domain cannot be associated with multiple
> > Storage Pools at the same time.  With instantiated objects, all of those
> > associations are baked into the objects.  A client never constructs objects.
> > It only receives pre-instantiated objects by calling other APIs.
> > 
> > > 2) Because the objects are opaque it might be hard to know what fields of
> > > the instance to persist to get the same object.
> > 
> > No.  You just persist the whole object identifier the way it was given to
> > you.  In the case of Volumes, it may be an object containing 4 string uuids.
> > It could also be a string in the form "/spuuid/sduuid/imguuid/voluuid".  In
> > the end it doesn't really matter which form it's in because the client will
> > not manipulate it.  Perhaps some flattened string is best in order to enable
> > easy database storage.
> > 
> > > 3) It breaks the distinction between by-value and by-reference objects.
> > 
> > The distinction is made in the schema.  Reference objects have methods and
> > are called 'class' in the schema.  Value objects have only fields and are
> > called 'type'.
> > 
> > > 4) Any serious user will make it's own instance classes that conform to
> > > it's design and flow so they don't really add any convenience to anything
> > > apart for tests.  You will create you're own VM object, and because it's
> > > in the manager scope it will be the same instance across all hosts.
> > > Instead of being able to pass the same ID to any host (as the vmID remains
> > > the same) you will have to create and instance object to use either before
> > > every call for simplicity or cache for each host for performance benefits.
> > 
> > This is a pretty good argument for using namespacing instead of instances,
> > however...  I still think that all object references need to be an opaque
> > type and it should not be legal to roll your own object reference from a set
> > of other objects (eq. create a volume reference from imgUUID, sdUUID, and
> > spUUID).  The API should be explicit about the relationships between
> > objects.
> > 
> > If you want to write your own instance classes you still can.  Just pass the
> > vdsm-generated identifier into your object's constructor to use for later
> > API calls.
> > 
> > > 5) It makes us pass a weird "__obj__" parameter to each call that
> > > symbolizes self and makes it hard for a user that choose to use it's own
> > > bindings to understand what it does.
> > 
> > Fair.  '__obj__' is a terrible name.  I would be okay with changing the
> > semantics so that all API calls take an 'id' parameter as their first
> > argument.  I guess this could always be a string with an unspecified format.
> > For Volumes, we can decide how we want to encode the 4 uuids.  Vdsm would
> > then need to parse this value on the server side to pull out the relevant
> > IDs.
> > 
> > > 6) It's syntactic sugar at best that adds needless limitation to how a
> > > user can play with the IDs and the API.
> > > 
> > > I personally think there is a reason why json-rpc defines name-spaces and
> > > methods and forgoes instance. It's simpler (for the implementation), more
> > > flexible, and it give the user more choice. Trying to hack that in will
> > > just cause needless complications IMHO. IDs should are just strings no
> > > need to complicate them
> > > 
> > > 
> > > By-Value objects should still be defined and instantiated by the bindings
> > > because unlike IDs we need to make sure all the fields exist and are in
> > > the correct type.
> > > 
> > 
> > -- Adam Litke <[email protected]> IBM Linux Technology Center
> > 
> > 
> 

-- 
Adam Litke <[email protected]>
IBM Linux Technology Center

_______________________________________________
vdsm-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel

Re: [vdsm] object instancing in the new VDSM API

Reply via email to