Re: [vdsm] object instancing in the new VDSM API

Saggi Mizrahi Mon, 03 Dec 2012 18:26:45 -0800

So from what I gather the only thing that is bothering you is that storage 
operations require a lot of IDs.
I get that, I hate that to. It doesn't change the point that it was designed 
that way.
Even if you deem some use cases irrelevant it wouldn't change the fact that 
this is how people use it now.
And because we are going to throw it away as soon as we can there is no reason 
to shape out API around that.

So from what I gather we agree on instancing.

---

From this moment on I'm going to try my best to explain how VDSM storage 
currently works.
It is filled with misdirection and bad design. I hope that after that you will 
understand why you can't pack all the IDs together.

Let's start with the storage pool. Because it was simpler to have all metadata 
changing operations run on the same host someone needed to find a way to make 
cross domain operations work on the same host.
The solution was to band them all to a single entity call the storage pool and 
have a single lock. The point was to have a host be able to connect to multiple 
pools at a time.
Due to bad code (that could have been easily not have been so bad) the multiple 
pools feature was never implemented. Because the single lock to rule them all 
doesn't really work when you want to secure domain we had to add more locks 
making the pool concept obsolete.

These means that you can trust VDSM to only be connected to a single pool at 
the time, this means that if you want to change anything you can just remove 
the pool arg.

Lets go to volumes and images. Contrary to how it's name imgUUID does not 
represent and image. It's actually a tag given to part of a chain. This is 
commonly used to differentiate between parts of the chain responsible for VM 
images and templates. Due to bad code a lot of the possible combinations are 
not supported but that is the intention.
imgUUID being a tag means that it serves 3 purposes depending on the verb that 
uses it.
1) In some verbs it used as a useless sanity check to make sure the volume is 
tagged with this sdUUID.
   This I imagine was done because someone didn't fully comprehend how and why 
you do sanity checks.
   This means that in some verbs you can just remove it (if you are actually 
changing anything)
2) In some verbs it's meant to distinct the volume from it's original chain 
(creating a template). At that point it's actually now being invented by the 
caller.
3) Operations that act on the whole chain, if volUUID is there is for the same 
useless sanity check and can be removed.

What you need to get out of this is that most of the time you can use less IDs 
just by removing useless imgUUID or volUUID args.
Further more, you need to understand that they are not hierarchical. imgUUID is 
a "tag" on the volume. similar to "user" for a file.

As for domain IDs, because the caller can choose to reuse imgUUIDs and volUUIDs 
on different domains and some flows actually depend on that.
To make things simpler some verbs should be split up so how you specify that 
target volID doesn't affect the actual command.

This means that copyImage() and createTemplate() should be split to:
copyImage(dstDomain, srcDomain, imgUUID)
createTemplate(dstDomain, dstImgUUID, srcDomain, srcImgUUID)

That being said, I'm personally still against an indeterminate storage API 
because of engine adoption problems.
But if you want to "fix" the current interface. Packing up the IDs to a single 
ID wouldn't work and is logically wrong.
What you need to do is remove redundant arguments and split up verbs that do 
more then one thing.

----- Original Message -----
> From: "Adam Litke" <[email protected]>
> To: "Saggi Mizrahi" <[email protected]>
> Cc: "vdsm-devel" <[email protected]>, "Ayal Baron" 
> <[email protected]>, "Barak Azulay"
> <[email protected]>, "ybronhei" <[email protected]>
> Sent: Monday, December 3, 2012 5:46:31 PM
> Subject: Re: object instancing in the new VDSM API
> 
> On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote:
> > Currently the suggested scheme treats everything as instances and
> > object have methods.
> > This puts instancing as the responsibility of the API bindings.
> > I suggest changing it to the way json was designed with namespaces
> > and methods.
> > 
> > For example instead for the api being:
> > 
> > vm = host.getVMsList()[0]
> > vm.getInfo()
> > 
> > the API should be:
> > 
> > vmID = host.getVMsList()[0]
> > api.VMsManager.getVMInfo(vmID)
> > 
> > And it should be up to decide how to wrap everything in objects.
> 
> For VMs, your example looks nice, but for today's Volumes it's not so
> nice.  To
> properly identify a Volume, we must pass the storage pool id, storage
> domain id,
> image id, and volume id.  If we are working with two Volumes, we
> would need 8
> parameters unless we optimize for context and assume that the storage
> pool uuid
> is the same for both volumes, etc.  The problem with that
> optimization is that
> we require clients to understand internal implementation details.
> 
> How should the StorageDomain.getVolumes API return a list of Volumes?
>  A list of
> Volume ids is not enough information for most commands that involve a
> Volume.
> 
> > The problem with the API bindings controlling the instancing is
> > that:
> > 1) We have to *have* and *pass* implicit api obj which is
> > problematic to maintain.
> >    For example, you have to have the api object as a member of
> >    instance for the
> >    method calls to work.  This means that you can't recreate or
> >    pool API
> >    objects easily. You effectively need to add a "move" method to
> >    move the
> >    object to another API object to use it on a different host.
> 
> You already make assumptions like this when passing around bare
> UUIDs.  For
> example, you know that a Storage Domain cannot be associated with
> multiple
> Storage Pools at the same time.  With instantiated objects, all of
> those
> associations are baked into the objects.  A client never constructs
> objects.  It
> only receives pre-instantiated objects by calling other APIs.
> 
> > 2) Because the objects are opaque it might be hard to know what
> > fields of the
> >    instance to persist to get the same object.
> 
> No.  You just persist the whole object identifier the way it was
> given to you.
> In the case of Volumes, it may be an object containing 4 string
> uuids.  It could
> also be a string in the form "/spuuid/sduuid/imguuid/voluuid".  In
> the end it
> doesn't really matter which form it's in because the client will not
> manipulate
> it.  Perhaps some flattened string is best in order to enable easy
> database
> storage.
> 
> > 3) It breaks the distinction between by-value and by-reference
> > objects.
> 
> The distinction is made in the schema.  Reference objects have
> methods and are
> called 'class' in the schema.  Value objects have only fields and are
> called
> 'type'.
> 
> > 4) Any serious user will make it's own instance classes that
> > conform to it's
> >    design and flow so they don't really add any convenience to
> >    anything apart
> >    for tests.
> >    You will create you're own VM object, and because it's in the
> >    manager scope
> >    it will be the same instance across all hosts. Instead of being
> >    able to pass
> >    the same ID to any host (as the vmID remains the same) you will
> >    have to
> >    create and instance object to use either before every call for
> >    simplicity
> >    or cache for each host for performance benefits.
> 
> This is a pretty good argument for using namespacing instead of
> instances,
> however...  I still think that all object references need to be an
> opaque type
> and it should not be legal to roll your own object reference from a
> set of other
> objects (eq. create a volume reference from imgUUID, sdUUID, and
> spUUID).  The
> API should be explicit about the relationships between objects.
> 
> If you want to write your own instance classes you still can.  Just
> pass the
> vdsm-generated identifier into your object's constructor to use for
> later API
> calls.
> 
> > 5) It makes us pass a weird "__obj__" parameter to each call that
> > symbolizes
> >    self and makes it hard for a user that choose to use it's own
> >    bindings to
> >    understand what it does.
> 
> Fair.  '__obj__' is a terrible name.  I would be okay with changing
> the
> semantics so that all API calls take an 'id' parameter as their first
> argument.
> I guess this could always be a string with an unspecified format.
>  For Volumes,
> we can decide how we want to encode the 4 uuids.  Vdsm would then
> need to parse
> this value on the server side to pull out the relevant IDs.
> 
> > 6) It's syntactic sugar at best that adds needless limitation to
> > how a user can
> >    play with the IDs and the API.
> > 
> > I personally think there is a reason why json-rpc defines
> > name-spaces and
> > methods and forgoes instance. It's simpler (for the
> > implementation), more
> > flexible, and it give the user more choice. Trying to hack that in
> > will just
> > cause needless complications IMHO. IDs should are just strings no
> > need to complicate them
> > 
> > 
> > By-Value objects should still be defined and instantiated by the
> > bindings because unlike IDs we need to make sure all the fields
> > exist and are in the correct type.
> > 
> 
> --
> Adam Litke <[email protected]>
> IBM Linux Technology Center
> 
> 
_______________________________________________
vdsm-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel

Re: [vdsm] object instancing in the new VDSM API

Reply via email to