On 26/01/12 23:42, Saggi Mizrahi wrote:
> 
> 
> ----- Original Message -----
>> From: "Livnat Peer" <lp...@redhat.com>
>> To: "Saggi Mizrahi" <smizr...@redhat.com>
>> Cc: "Adam Litke" <a...@us.ibm.com>, engine-de...@ovirt.org, 
>> vdsm-devel@lists.fedorahosted.org
>> Sent: Thursday, January 26, 2012 3:16:32 PM
>> Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
>>
>> On 26/01/12 21:21, Saggi Mizrahi wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Adam Litke" <a...@us.ibm.com>
>>>> To: "Saggi Mizrahi" <smizr...@redhat.com>
>>>> Cc: "Livnat Peer" <lp...@redhat.com>, engine-de...@ovirt.org,
>>>> vdsm-devel@lists.fedorahosted.org
>>>> Sent: Thursday, January 26, 2012 1:58:40 PM
>>>> Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management
>>>> API
>>>>
>>>> On Thu, Jan 26, 2012 at 10:00:57AM -0500, Saggi Mizrahi wrote:
>>>>> <snip>
>>>>> Again trying to sum up and address all comments
>>>>>
>>>>> Clear all:
>>>>> ==========
>>>>> My opinion is still to not implement it.
>>>>> Even though it might generate a bit more traffic, premature
>>>>> optimization is bad and there are other ways we can improve
>>>>> VDSM command overhead without doing this.
>>>>>
>>>>> In any case this argument is redundant because my intention
>>>>> (as Litke pointed out) is to have a lean API.
>>>>> An API call is something you have to support across versions;
>>>>> this call implemented in the engine is something that no one has
>>>>> to support and can change/evolve easily.
>>>>>
>>>>> As a rule, if an API call C can be implemented by doing A + B
>>>>> then C is redundant.
>>>>>
>>>>> List of connections as args:
>>>>> ============================
>>>>> Sorry I forgot to respond about that. I'm not as strongly opposed
>>>>> to the idea as the other things you suggested. It'll just make
>>>>> implementing the persistence logic in VDSM significantly more
>>>>> complicated as I will have to commit multiple connection
>>>>> information to disk in an all or nothing mode. I can create a
>>>>> small sqlitedb to do that or do some directory tricks and exploit
>>>>> FS rename atomicity but I'd rather not.
>>>>
>>>> I would be strongly opposed to introducing a sqlite database into
>>>> vdsm just to
>>>> enable "convenience mode" for this API.  Does the operation really
>>>> need to be
>>>> atomic?  Why not just perform each connection sequentially and
>>>> return
>>>> a list of
>>>> statuses? Is the only motivation for allowing a list of parameters
>>>> to reduce
>>>> the number of API calls between engine and vdsm?  If so, the same
>>>> argument
>>>> Saggi makes above applies here.
>>>
>>> I try to have VDSM expose APIs that are simple to predict: a
>>> command can either succeed or fail.
>>> The problem is not actually validating the connections. The problem
>>> is that once I concluded that they are all OK I need to persist to
>>> disk the information that will allow me to reconnect if VDSM
>>> happens to crash. If I naively save them one by one I could get in
>>> a state where only some of the connections persisted before the
>>> operation failed. So I have to somehow put all this in a
>>> transaction.
>>>
>>> I don't have to use sqlite. I could also put all the persistence
>>> information in a new dir for every call named <UUID>.tmp. Once I
>>> have written everything down I rename the directory to just <UUID>
>>> and fsync it. The rename is guaranteed by POSIX to be atomic. For
>>> unmanage, I move all the persistence information from the
>>> directories they sit in to a new dir named <UUID>, rename it to
>>> <UUID>.tmp, fsync it and then remove it.
>>>
>>> This all just looks like more trouble than it's worth to me.
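
For illustration, the directory-rename scheme described above could be sketched roughly as follows. This is only a sketch under assumptions, not VDSM code: the function names, the one-JSON-file-per-CID layout and the requirement that CIDs are safe to use as file names are all made up for the example.

```python
import json
import os
import uuid


def _fsync_dir(path):
    """fsync a directory so entries created or renamed in it reach the disk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def persist_batch(base_dir, connections):
    """Persist all connections of one manage call, or none of them.

    `connections` maps a CID to a JSON-serializable dict (hypothetical format).
    Returns the UUID naming the committed directory.
    """
    batch_id = str(uuid.uuid4())
    tmp_dir = os.path.join(base_dir, batch_id + ".tmp")
    final_dir = os.path.join(base_dir, batch_id)

    os.mkdir(tmp_dir)
    for cid, info in connections.items():
        with open(os.path.join(tmp_dir, cid), "w") as f:
            json.dump(info, f)
            f.flush()
            os.fsync(f.fileno())
    _fsync_dir(tmp_dir)

    # Commit point: the rename either happens completely or not at all.
    os.rename(tmp_dir, final_dir)
    _fsync_dir(base_dir)
    return batch_id


def load_committed(base_dir):
    """Return all committed connections, ignoring leftover *.tmp directories."""
    result = {}
    for name in os.listdir(base_dir):
        path = os.path.join(base_dir, name)
        if name.endswith(".tmp") or not os.path.isdir(path):
            continue  # an uncommitted batch from a crashed call
        for cid in os.listdir(path):
            with open(os.path.join(path, cid)) as f:
                result[cid] = json.load(f)
    return result
```

On restart, anything still named <UUID>.tmp belongs to a call that never committed and can simply be deleted.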
>>>
>>
>>
>> I agree with Adam, I don't think the operation should be atomic,
>> having
>> only some of the connections persisted is a perfectly valid outcome
>> if
>> the API returns a list of statuses.
> What if it doesn't return at all?
> The only reason a manage will fail is if the URI is broken, so I
> assume 99% of the issued manage commands will succeed.
> My problem is with VDSM crashing mid-operation. The operation will appear to
> fail, but when VDSM comes back some of the connections will have been
> persisted, so it will reconnect them. Because the client's manage call
> failed, it doesn't expect the CIDs to be in the list. This will cause
> ambiguity when finding an already registered CID at runtime.
> 

I think that if VDSM did not return at all, it is a reasonable expectation
to use the status verb to find the connections' status (managed or not).
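
To illustrate, a minimal sketch of that client-side flow, assuming a non-atomic manage that accepts several connections and a status verb that lists the managed CIDs. `FakeVdsm` is a stand-in written for this example; its methods, return values and the use of `TimeoutError` are assumptions, not the actual VDSM API.

```python
class FakeVdsm:
    """Stand-in for a VDSM host; hypothetical, for illustration only."""

    def __init__(self, crash_after=None):
        self._managed = {}              # CID -> URI, the "persisted" state
        self._crash_after = crash_after

    def manage(self, connections):
        """Manage connections one by one and return a status per CID.

        If crash_after is set, simulate VDSM dying after that many
        connections were persisted, so the caller gets no answer.
        """
        statuses = {}
        for count, (cid, uri) in enumerate(connections.items()):
            if self._crash_after is not None and count == self._crash_after:
                raise TimeoutError("no response from VDSM")
            self._managed[cid] = uri
            statuses[cid] = "OK"
        return statuses

    def status(self):
        """Return the currently managed connections (CID -> URI)."""
        return dict(self._managed)


def manage_and_reconcile(vdsm, connections):
    """Return the set of requested CIDs that VDSM actually manages."""
    try:
        statuses = vdsm.manage(connections)
        return {cid for cid, state in statuses.items() if state == "OK"}
    except TimeoutError:
        # manage never returned: ask VDSM what it really has
        managed = vdsm.status()
        return {cid for cid in connections if cid in managed}
```

With this flow the client never has to guess: whatever status reports after the failure is the truth, and the CIDs that are missing can be retried or given up on.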

General comment: it would help if the manage verb returned a dedicated error
code indicating that the CID is already managed.
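
A small sketch of why such a code helps the client, assuming hypothetical result codes; the enum, its values and the `manage` stand-in below are invented for the example and are not existing VDSM error codes.

```python
from enum import Enum


class ManageResult(Enum):
    """Hypothetical result codes for the manage verb."""
    OK = 0
    ALREADY_MANAGED = 1
    BAD_URI = 2


def manage(registry, cid, uri):
    """Stand-in for the manage verb; `registry` maps CID -> URI."""
    if cid in registry:
        return ManageResult.ALREADY_MANAGED
    if "://" not in uri:
        return ManageResult.BAD_URI
    registry[cid] = uri
    return ManageResult.OK


def ensure_managed(registry, cid, uri):
    """Retry-safe wrapper: a repeated manage of a known CID counts as success."""
    result = manage(registry, cid, uri)
    return result in (ManageResult.OK, ManageResult.ALREADY_MANAGED)
```

With a dedicated code, the engine can blindly re-issue manage after a failed or unanswered call and tell "already there" apart from a real failure such as a broken URI.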



>>
>>
>>>>
>>>>> The demands are not without base. I would like to keep the code
>>>>> simple under the hood at the price of a few more calls. You would
>>>>> like to make fewer calls and keep the code simpler on your side.
>>>>> There isn't a real way to settle this.
>>>>> If anyone on the list has pros and cons for either way I'd be
>>>>> happy to hear them.
>>>>> If no compelling arguments arise I will let Ayal call this one.
>>>>>
>>>>> Transient connections:
>>>>> ======================
>>>>> The problem you are describing, as I understand it, is that VDSM
>>>>> did not respond and not that the API client did not respond.
>>>>> Again, this can happen for a number of reasons, and in most of
>>>>> them VDSM might not be aware that there is actually a problem
>>>>> (network issues).
>>>>>
>>>>> This relates to the EOL policy. I agree we have to find a good
>>>>> way
>>>>> to define an automatic EOL for resources. I have made my
>>>>> suggestion. Out of the scope of the API.
>>>>>
>>>>> In the meantime cleaning stale connections is trivial and I have
>>>>> made it clear in a previous email how to go about it in a
>>>>> simple non-intrusive way. Clean hosts on host connect, and on
>>>>> every poll if you find connections that you don't like. This
>>>>> should keep things squeaky clean.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Livnat Peer" <lp...@redhat.com>
>>>>>> To: "Saggi Mizrahi" <smizr...@redhat.com>
>>>>>> Cc: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org
>>>>>> Sent: Thursday, January 26, 2012 5:22:42 AM
>>>>>> Subject: Re: [Engine-devel] [RFC] New Connection Management API
>>>>>>
>>>>>> On 25/01/12 23:35, Saggi Mizrahi wrote:
>>>>>>> <SNIP>
>>>>>>> This mail was getting way too long.
>>>>>>>
>>>>>>> About the clear all verb.
>>>>>>> No.
>>>>>>> Just loop, find the connections YOU OWN and clean them. Just
>>>>>>> because you don't want to support multiple clients to the VDSM
>>>>>>> API doesn't mean the engine shouldn't behave like a proper
>>>>>>> citizen.
>>>>>>> It's the same reason why VDSM tries not to mess with system
>>>>>>> resources it didn't initiate.
>>>>>>
>>>>>>
>>>>>> There is a big difference: VDSM living in hybrid mode with other
>>>>>> workloads on the host is a valid use case, having more than one
>>>>>> concurrent manager for VDSM is not.
>>>>>> Generating a disconnect request for each connection does not
>>>>>> seem like the right API to me. Again, think of the simple flow of
>>>>>> moving a host from one data center to another: the engine needs
>>>>>> to disconnect all storage domains (each domain can have a couple
>>>>>> of connections associated with it).
>>>>>>
>>>>>> I am giving an example from the engine use cases as it is the
>>>>>> main user of VDSM ATM, but I am sure it will be relevant to any
>>>>>> other user of VDSM.
>>>>>>
>>>>>>>
>>>>>>> ------------------------
>>>>>>>
>>>>>>> As I see it the only point of conflict is the so called
>>>>>>> non-persisted connections.
>>>>>>> I will call them transient connections from now on.
>>>>>>>
>>>>>>> There are 2 use cases being discussed
>>>>>>> 1. Wait until a connection is made, if it fails don't retry and
>>>>>>> automatically unmanage.
>>>>>>> 2. If the caller of the API forgets or fails to unmanage a
>>>>>>> connection.
>>>>>>>
>>>>>>
>>>>>> Actually I was not discussing #2 at all.
>>>>>>
>>>>>>> Your suggestion as I understand it:
>>>>>>> Transient connections are:
>>>>>>>      - Connections that VDSM will only try to connect to once
>>>>>>>      and will not reconnect to in case of disconnect.
>>>>>>
>>>>>> yes
>>>>>>
>>>>>>>
>>>>>>> My problem with this definition is that it does not specify the
>>>>>>> "end of life" of the connection.
>>>>>>> Meaning it solves only use case 1.
>>>>>>
>>>>>> since this is the only use case i had in mind, it is what i was
>>>>>> looking for.
>>>>>>
>>>>>>> If all is well, and it usually is, VDSM will not invoke a
>>>>>>> disconnect.
>>>>>>> So the caller would have to call unmanage if the connection
>>>>>>> succeeded at the end of the flow.
>>>>>>
>>>>>> agree.
>>>>>>
>>>>>>> Now, if you are already calling unmanage if connection
>>>>>>> succeeded
>>>>>>> you can just call it anyway.
>>>>>>
>>>>>> Not exactly. An example I gave earlier in the thread was that
>>>>>> VDSM hangs or has some other error and the engine cannot initiate
>>>>>> unmanage. Instead, let's assume the host is fenced (self-fence or
>>>>>> external fence does not matter); in this scenario the engine will
>>>>>> not issue unmanage.
>>>>>>
>>>>>>>
>>>>>>> instead of doing: (with your suggestion)
>>>>>>> ----------------
>>>>>>> manage
>>>>>>> wait until succeeds or lastError has value
>>>>>>> try:
>>>>>>>   do stuff
>>>>>>> finally:
>>>>>>>   unmanage
>>>>>>>
>>>>>>> do: (with the canonical flow)
>>>>>>> ---
>>>>>>> manage
>>>>>>> try:
>>>>>>>   wait until succeeds or lastError has value
>>>>>>>   do stuff
>>>>>>> finally:
>>>>>>>   unmanage
>>>>>>>
>>>>>>> This is simpler to do than having another connection type.
>>>>>>
>>>>>> You are assuming the engine can communicate with VDSM and there
>>>>>> are
>>>>>> scenarios where it is not feasible.
>>>>>>
>>>>>>>
>>>>>>> Now that we got that out of the way let's talk about the 2nd use
>>>>>>> case.
>>>>>>
>>>>>> Since I did not ask VDSM to clean up after the (engine) user and
>>>>>> you don't want to do it, I am not sure we need to discuss this.
>>>>>>
>>>>>> If you insist we can start the discussion on who should
>>>>>> implement the cleanup mechanism, but I'm afraid I have no strong
>>>>>> arguments for VDSM to do it, so I'd rather not go there ;)
>>>>>>
>>>>>>
>>>>>> You dropped from the discussion my request for supporting a list
>>>>>> of connections for the manage and unmanage verbs.
>>>>>>
>>>>>>> API client died in the middle of the operation and unmanage was
>>>>>>> never called.
>>>>>>>
>>>>>>> Your suggested definition means that unless there was a problem
>>>>>>> with the connection VDSM will still have this connection
>>>>>>> active.
>>>>>>> The engine will have to clean it anyway.
>>>>>>>
>>>>>>> The problem is, VDSM has no way of knowing that a client died,
>>>>>>> forgot or is thinking really hard and will continue on in about
>>>>>>> 2
>>>>>>> minutes.
>>>>>>
>>>>>>>
>>>>>>> "Connections that live until they die" is a lifecycle that is
>>>>>>> hard to define and work with. Solving this problem is
>>>>>>> theoretically simple.
>>>>>>>
>>>>>>> Have clients hold some sort of session token and force the
>>>>>>> client
>>>>>>> to update it at a specified interval. You could bind resources
>>>>>>> (like domains, VMs, connections) to that session token so when
>>>>>>> it
>>>>>>> expires VDSM auto cleans the resources.
>>>>>>>
>>>>>>> This kind of mechanism is out of the scope of this API change.
>>>>>>> Furthermore, I think that this mechanism should sit in the
>>>>>>> engine since the session might actually contain resources from
>>>>>>> multiple hosts and resources that are not managed by VDSM.
>>>>>>>
>>>>>>> In GUI flows specifically the user might do actions that don't
>>>>>>> even touch the engine, and forcing it to refresh the engine
>>>>>>> token is simpler than having it refresh the VDSM token.
>>>>>>>
>>>>>>> I understand that engine currently has no way of tracking a
>>>>>>> user session. This, as I said, is also true in the case of VDSM.
>>>>>>> We can start to argue about which project should implement the
>>>>>>> session semantics. But as I see it, it's not relevant to the
>>>>>>> connection management API.
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> vdsm-devel mailing list
>>>>> vdsm-devel@lists.fedorahosted.org
>>>>> https://fedorahosted.org/mailman/listinfo/vdsm-devel
>>>>
>>>> --
>>>> Adam Litke <a...@us.ibm.com>
>>>> IBM Linux Technology Center
>>>>
>>>>
>>
>>

