I have begun work on changing how API clients control storage connections
when interacting with VDSM.

Currently there are 2 API calls:
connectStorageServer() - Will connect to the storage target if the host is not 
already connected to it.
disconnectStorageServer() - Will disconnect from the storage target if the host 
is connected to it.

This API is very simple but is inappropriate when multiple clients and flows 
try to access the same storage.

This is currently solved by trying to synchronize things inside rhevm, which
is hard and convoluted. It also causes problems for other clients using the
VDSM API.

Another problem is error recovery. Currently ovirt-engine (OE) has no way of
monitoring the connections on all the hosts, and if a connection disappears
it is OE's responsibility to reconnect.

I suggest a different concept where VDSM 'manages' the connections. VDSM 
receives a manage request with the connection information and from that point 
forward VDSM will try to keep this connection alive. If the connection fails 
VDSM will automatically try and recover.

Every manage request will also have a connection ID (CID). This CID will be
used when the same client asks to unmanage the connection.
When multiple manage requests are received for the same connection, each of
them must have its own unique CID. By internally mapping CIDs to actual
connections, VDSM can properly disconnect when no CID is addressing the
connection. This allows each client, and even each flow, to have its own CID,
effectively eliminating connect/disconnect races.
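
The CID bookkeeping described above can be pictured with a small in-memory
sketch. This is only an illustration of the reference-counting idea, not the
VDSM implementation: the class name ConnectionRegistry and its methods are
hypothetical, and the real code would also have to persist the mapping and
perform the actual connect and disconnect.

```python
class ConnectionRegistry:
    """Toy model of CID-to-connection bookkeeping (hypothetical names)."""

    def __init__(self):
        self._cid_to_uri = {}   # CID -> connection URI
        self._uri_to_cids = {}  # connection URI -> set of CIDs referencing it

    def manage(self, uri, cid):
        """Register a CID for a connection; reject bad or duplicate CIDs."""
        if "/" in cid:
            raise ValueError("CID must not contain '/'")
        if cid in self._cid_to_uri:
            raise KeyError("CID already in use: %s" % cid)
        self._cid_to_uri[cid] = uri
        self._uri_to_cids.setdefault(uri, set()).add(cid)

    def unmanage(self, cid):
        """Drop a CID; return True if no CID references the connection."""
        uri = self._cid_to_uri.pop(cid)  # raises KeyError for unknown CIDs
        cids = self._uri_to_cids[uri]
        cids.discard(cid)
        if not cids:
            del self._uri_to_cids[uri]
            return True   # last reference gone: safe to disconnect
        return False      # another client or flow still needs the connection
```

With two flows managing the same target, only the second unmanage call
reports that the connection can be dropped, so neither flow can pull the
storage out from under the other.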

The change from (dis)connect to (un)manage also changes the semantics of the
calls significantly.
Whereas connectStorageServer would have returned when the storage was either
connected or had failed to connect, manageStorageServer will return once VDSM
has registered the CID. This means that the connection might not be active
immediately, as VDSM is still trying to connect. The connection might remain
down for a long time if the storage target is down or is having issues.

This allows for VDSM to receive the manage request even if the storage is 
having issues and recover as soon as it's operational without user intervention.

In order for the client to query the current state of the connections I propose
getStorageServerList(). This will return a mapping of CID to connection
status. The status contains the connection info (excluding credentials),
whether the connection is active, whether the connection is managed (unmanaged
connections are returned with transient IDs), and, if the connection is down,
the last error information.

The same actual connection can appear multiple times, once for each CID.

For cases where an operation requires a connection to be active, a user can
poll the status of the CID. The user can then choose to poll for a certain
amount of time or until an error appears in the error field of the status.
This gives either a timeout or a "try once" semantic, depending on the flow's
needs.
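
A client-side polling helper along these lines could look as follows. It is a
sketch under assumptions: get_status_list stands in for whatever binding the
client uses to call the proposed status query, wait_for_connection is a
hypothetical name, and the 'connected' and 'lastError' keys follow the example
return value in this proposal, whose key names may still change.

```python
import time


def wait_for_connection(get_status_list, cid, timeout=None, interval=1.0):
    """Poll a CID until it is connected.

    With a timeout, keep polling until the deadline passes.
    Without one ("try once"), give up at the first reported error.
    Returns True if the connection became active, False otherwise.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status_list()[cid]
        if status["connected"]:
            return True
        if deadline is None:
            # "try once": the first recorded error ends the wait
            if status["lastError"] != 0:
                return False
        elif time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Keeping the policy on the client side means VDSM does not need to know which
of the two semantics a given flow wants.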

All connections that have been managed persist across VDSM restarts and will
be managed until a corresponding unmanage command has been issued.

There is no concept of temporary connections, as "temporary" is flow dependent
and VDSM can't accommodate every interpretation of "temporary". An ad-hoc
mechanism can be built using the CID field. For instance, a client can manage
a connection with the CID "ENGINE_FLOW101_CON1". If the flow gets interrupted,
the client can clean up all CIDs that carry that flow's ID.
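
As a sketch of that ad-hoc cleanup, assuming the client names its CIDs with a
per-flow prefix: cleanup_flow, get_status_list and unmanage are hypothetical
stand-ins for the client's bindings to the proposed calls, and the 'managed'
key follows the example return value in this proposal.

```python
def cleanup_flow(get_status_list, unmanage, flow_prefix):
    """Unmanage every managed CID that starts with flow_prefix.

    Returns the sorted list of CIDs that were unmanaged.
    """
    removed = []
    for cid, status in get_status_list().items():
        # Unmanaged connections carry transient IDs; leave them alone.
        if status.get("managed") and cid.startswith(flow_prefix):
            unmanage(cid)
            removed.append(cid)
    return sorted(removed)
```

Including the trailing separator in the prefix (for example
"ENGINE_FLOW101_") avoids accidentally matching a different flow whose ID
merely begins with the same digits.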

I think this API gives safety, robustness, and implementation freedom.


Nitty Gritty:

manageStorageServer
===================
Synopsis:
manageStorageServer(uri, connectionID):

Parameters:
uri - a uri pointing to a storage target (eg: nfs://server:export, 
iscsi://host/iqn;portal=1)
connectionID - string with any char except "/".

Description:
Tells VDSM to start managing the connection. From this moment on VDSM will try 
and have the connection available when needed. VDSM will monitor the connection 
and will automatically reconnect on failure.

Returns:
Success code if VDSM was able to manage the connection.
It usually just verifies that the arguments are sane and that the CID is not 
already in use.
This doesn't mean the host is connected.
----
unmanageStorageServer
=====================
Synopsis:
unmanageStorageServer(connectionID):

Parameters:
connectionID - string with any char except "/".

Description:
Tells VDSM to stop managing the connection. VDSM will try to disconnect from
the storage target if this is the last CID referencing the storage connection.

Returns:
Success code if VDSM was able to unmanage the connection.
It will return an error if the CID is not registered with VDSM. Disconnect 
failures are not reported. Active unmanaged connections can be tracked with
getStorageServerList().
----
getStorageServerList
====================
Synopsis:
getStorageServerList()

Description:
Returns a list of all managed and unmanaged connections. Unmanaged
connections have temporary IDs that are not guaranteed to be consistent
across calls.

Results:
A mapping between CIDs and the status.
Example return value (actual key names may differ):

{'conA': {'connected': True, 'managed': True, 'lastError': 0,
          'connectionInfo': {
              'remotePath': 'server:/export',
              'retrans': 3,
              'version': 4}},
 'iscsi_session_34': {'connected': False, 'managed': False, 'lastError': 339,
                      'connectionInfo': {
                          'hostname': 'dandylopn',
                          'portal': 1}}}
_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel
