Re: [vdsm] object instancing in the new VDSM API

2012-12-04 Thread Adam Litke
 and is logically
 wrong.  What you need to do is remove redundant arguments and split up verbs
 that do more then one thing.
 
 - Original Message -
  From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi
  smizr...@redhat.com Cc: vdsm-devel vdsm-de...@fedorahosted.org, Ayal
  Baron aba...@redhat.com, Barak Azulay bazu...@redhat.com, ybronhei
  ybron...@redhat.com Sent: Monday, December 3, 2012 5:46:31 PM Subject: Re:
  object instancing in the new VDSM API
  
  On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote:
   Currently the suggested scheme treats everything as instances and object
   have methods.  This puts instancing as the responsibility of the API
   bindings.  I suggest changing it to the way json was designed with
   namespaces and methods.
   
   For example instead for the api being:
   
   vm = host.getVMsList()[0] vm.getInfo()
   
   the API should be:
   
   vmID = host.getVMsList()[0] api.VMsManager.getVMInfo(vmID)
   
   And it should be up to decide how to wrap everything in objects.
  
  For VMs, your example looks nice, but for today's Volumes it's not so nice.
  To properly identify a Volume, we must pass the storage pool id, storage
  domain id, image id, and volume id.  If we are working with two Volumes, we
  would need 8 parameters unless we optimize for context and assume that the
  storage pool uuid is the same for both volumes, etc.  The problem with that
  optimization is that we require clients to understand internal
  implementation details.
  
  How should the StorageDomain.getVolumes API return a list of Volumes?  A
  list of Volume ids is not enough information for most commands that involve
  a Volume.
  
   The problem with the API bindings controlling the instancing is that: 1)
   We have to *have* and *pass* implicit api obj which is problematic to
   maintain.  For example, you have to have the api object as a member of
   instance for the method calls to work.  This means that you can't recreate
   or pool API objects easily. You effectively need to add a move method to
   move the object to another API object to use it on a different host.
  
  You already make assumptions like this when passing around bare UUIDs.  For
  example, you know that a Storage Domain cannot be associated with multiple
  Storage Pools at the same time.  With instantiated objects, all of those
  associations are baked into the objects.  A client never constructs objects.
  It only receives pre-instantiated objects by calling other APIs.
  
   2) Because the objects are opaque it might be hard to know what fields of
   the instance to persist to get the same object.
  
  No.  You just persist the whole object identifier the way it was given to
  you.  In the case of Volumes, it may be an object containing 4 string uuids.
  It could also be a string in the form /spuuid/sduuid/imguuid/voluuid.  In
  the end it doesn't really matter which form it's in because the client will
  not manipulate it.  Perhaps some flattened string is best in order to enable
  easy database storage.
  
   3) It breaks the distinction between by-value and by-reference objects.
  
  The distinction is made in the schema.  Reference objects have methods and
  are called 'class' in the schema.  Value objects have only fields and are
  called 'type'.
  
   4) Any serious user will make it's own instance classes that conform to
   it's design and flow so they don't really add any convenience to anything
   apart for tests.  You will create you're own VM object, and because it's
   in the manager scope it will be the same instance across all hosts.
   Instead of being able to pass the same ID to any host (as the vmID remains
   the same) you will have to create and instance object to use either before
   every call for simplicity or cache for each host for performance benefits.
  
  This is a pretty good argument for using namespacing instead of instances,
  however...  I still think that all object references need to be an opaque
  type and it should not be legal to roll your own object reference from a set
  of other objects (eq. create a volume reference from imgUUID, sdUUID, and
  spUUID).  The API should be explicit about the relationships between
  objects.
  
  If you want to write your own instance classes you still can.  Just pass the
  vdsm-generated identifier into your object's constructor to use for later
  API calls.
  
   5) It makes us pass a weird __obj__ parameter to each call that
   symbolizes self and makes it hard for a user that choose to use it's own
   bindings to understand what it does.
  
  Fair.  '__obj__' is a terrible name.  I would be okay with changing the
  semantics so that all API calls take an 'id' parameter as their first
  argument.  I guess this could always be a string with an unspecified format.
  For Volumes, we can decide how we want to encode the 4 uuids.  Vdsm would
  then need to parse this value on the server side to pull out the relevant
  IDs.
  
   6) It's syntactic

Re: [vdsm] object instancing in the new VDSM API

2012-12-03 Thread Adam Litke
On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote:
 Currently the suggested scheme treats everything as instances and object have 
 methods.
 This puts instancing as the responsibility of the API bindings.
 I suggest changing it to the way json was designed with namespaces and 
 methods.
 
 For example instead for the api being:
 
 vm = host.getVMsList()[0]
 vm.getInfo()
 
 the API should be:
 
 vmID = host.getVMsList()[0]
 api.VMsManager.getVMInfo(vmID)
 
 And it should be up to decide how to wrap everything in objects.

For VMs, your example looks nice, but for today's Volumes it's not so nice.  To
properly identify a Volume, we must pass the storage pool id, storage domain id,
image id, and volume id.  If we are working with two Volumes, we would need 8
parameters unless we optimize for context and assume that the storage pool uuid
is the same for both volumes, etc.  The problem with that optimization is that
we require clients to understand internal implementation details.

How should the StorageDomain.getVolumes API return a list of Volumes?  A list of
Volume ids is not enough information for most commands that involve a Volume.

 The problem with the API bindings controlling the instancing is that:
 1) We have to *have* and *pass* implicit api obj which is problematic to 
 maintain.
For example, you have to have the api object as a member of instance for 
 the
method calls to work.  This means that you can't recreate or pool API
objects easily. You effectively need to add a move method to move the
object to another API object to use it on a different host.

You already make assumptions like this when passing around bare UUIDs.  For
example, you know that a Storage Domain cannot be associated with multiple
Storage Pools at the same time.  With instantiated objects, all of those
associations are baked into the objects.  A client never constructs objects.  It
only receives pre-instantiated objects by calling other APIs.

 2) Because the objects are opaque it might be hard to know what fields of the
instance to persist to get the same object.

No.  You just persist the whole object identifier the way it was given to you.
In the case of Volumes, it may be an object containing 4 string uuids.  It could
also be a string in the form /spuuid/sduuid/imguuid/voluuid.  In the end it
doesn't really matter which form it's in because the client will not manipulate
it.  Perhaps some flattened string is best in order to enable easy database
storage.

 3) It breaks the distinction between by-value and by-reference objects.

The distinction is made in the schema.  Reference objects have methods and are
called 'class' in the schema.  Value objects have only fields and are called
'type'.

 4) Any serious user will make it's own instance classes that conform to it's
design and flow so they don't really add any convenience to anything apart
for tests.
You will create you're own VM object, and because it's in the manager scope
it will be the same instance across all hosts. Instead of being able to 
 pass
the same ID to any host (as the vmID remains the same) you will have to
create and instance object to use either before every call for simplicity
or cache for each host for performance benefits.

This is a pretty good argument for using namespacing instead of instances,
however...  I still think that all object references need to be an opaque type
and it should not be legal to roll your own object reference from a set of other
objects (eq. create a volume reference from imgUUID, sdUUID, and spUUID).  The
API should be explicit about the relationships between objects.

If you want to write your own instance classes you still can.  Just pass the
vdsm-generated identifier into your object's constructor to use for later API
calls.

 5) It makes us pass a weird __obj__ parameter to each call that symbolizes
self and makes it hard for a user that choose to use it's own bindings to
understand what it does.

Fair.  '__obj__' is a terrible name.  I would be okay with changing the
semantics so that all API calls take an 'id' parameter as their first argument.
I guess this could always be a string with an unspecified format.  For Volumes,
we can decide how we want to encode the 4 uuids.  Vdsm would then need to parse
this value on the server side to pull out the relevant IDs.

 6) It's syntactic sugar at best that adds needless limitation to how a user 
 can
play with the IDs and the API.
 
 I personally think there is a reason why json-rpc defines name-spaces and
 methods and forgoes instance. It's simpler (for the implementation), more
 flexible, and it give the user more choice. Trying to hack that in will just
 cause needless complications IMHO. IDs should are just strings no need to 
 complicate them
 
 
 By-Value objects should still be defined and instantiated by the bindings 
 because unlike IDs we need to make sure all the fields exist and are in the 
 

Re: [vdsm] object instancing in the new VDSM API

2012-12-03 Thread Saggi Mizrahi
So from what I gather the only thing that is bothering you is that storage 
operations require a lot of IDs.
I get that, I hate that to. It doesn't change the point that it was designed 
that way.
Even if you deem some use cases irrelevant it wouldn't change the fact that 
this is how people use it now.
And because we are going to throw it away as soon as we can there is no reason 
to shape out API around that.

So from what I gather we agree on instancing.

---

From this moment on I'm going to try my best to explain how VDSM storage 
currently works.
It is filled with misdirection and bad design. I hope that after that you will 
understand why you can't pack all the IDs together.

Let's start with the storage pool. Because it was simpler to have all metadata 
changing operations run on the same host someone needed to find a way to make 
cross domain operations work on the same host.
The solution was to band them all to a single entity call the storage pool and 
have a single lock. The point was to have a host be able to connect to multiple 
pools at a time.
Due to bad code (that could have been easily not have been so bad) the multiple 
pools feature was never implemented. Because the single lock to rule them all 
doesn't really work when you want to secure domain we had to add more locks 
making the pool concept obsolete.

These means that you can trust VDSM to only be connected to a single pool at 
the time, this means that if you want to change anything you can just remove 
the pool arg.

Lets go to volumes and images. Contrary to how it's name imgUUID does not 
represent and image. It's actually a tag given to part of a chain. This is 
commonly used to differentiate between parts of the chain responsible for VM 
images and templates. Due to bad code a lot of the possible combinations are 
not supported but that is the intention.
imgUUID being a tag means that it serves 3 purposes depending on the verb that 
uses it.
1) In some verbs it used as a useless sanity check to make sure the volume is 
tagged with this sdUUID.
   This I imagine was done because someone didn't fully comprehend how and why 
you do sanity checks.
   This means that in some verbs you can just remove it (if you are actually 
changing anything)
2) In some verbs it's meant to distinct the volume from it's original chain 
(creating a template). At that point it's actually now being invented by the 
caller.
3) Operations that act on the whole chain, if volUUID is there is for the same 
useless sanity check and can be removed.

What you need to get out of this is that most of the time you can use less IDs 
just by removing useless imgUUID or volUUID args.
Further more, you need to understand that they are not hierarchical. imgUUID is 
a tag on the volume. similar to user for a file.

As for domain IDs, because the caller can choose to reuse imgUUIDs and volUUIDs 
on different domains and some flows actually depend on that.
To make things simpler some verbs should be split up so how you specify that 
target volID doesn't affect the actual command.

This means that copyImage() and createTemplate() should be split to:
copyImage(dstDomain, srcDomain, imgUUID)
createTemplate(dstDomain, dstImgUUID, srcDomain, srcImgUUID)

That being said, I'm personally still against an indeterminate storage API 
because of engine adoption problems.
But if you want to fix the current interface. Packing up the IDs to a single 
ID wouldn't work and is logically wrong.
What you need to do is remove redundant arguments and split up verbs that do 
more then one thing.

- Original Message -
 From: Adam Litke a...@us.ibm.com
 To: Saggi Mizrahi smizr...@redhat.com
 Cc: vdsm-devel vdsm-de...@fedorahosted.org, Ayal Baron 
 aba...@redhat.com, Barak Azulay
 bazu...@redhat.com, ybronhei ybron...@redhat.com
 Sent: Monday, December 3, 2012 5:46:31 PM
 Subject: Re: object instancing in the new VDSM API
 
 On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote:
  Currently the suggested scheme treats everything as instances and
  object have methods.
  This puts instancing as the responsibility of the API bindings.
  I suggest changing it to the way json was designed with namespaces
  and methods.
  
  For example instead for the api being:
  
  vm = host.getVMsList()[0]
  vm.getInfo()
  
  the API should be:
  
  vmID = host.getVMsList()[0]
  api.VMsManager.getVMInfo(vmID)
  
  And it should be up to decide how to wrap everything in objects.
 
 For VMs, your example looks nice, but for today's Volumes it's not so
 nice.  To
 properly identify a Volume, we must pass the storage pool id, storage
 domain id,
 image id, and volume id.  If we are working with two Volumes, we
 would need 8
 parameters unless we optimize for context and assume that the storage
 pool uuid
 is the same for both volumes, etc.  The problem with that
 optimization is that
 we require clients to understand internal implementation details.
 
 How should the StorageDomain.getVolumes API