Andrew Fogarty created LIVY-782:
-----------------------------------

             Summary: New APIs for Livy session PUT
                 Key: LIVY-782
                 URL: https://issues.apache.org/jira/browse/LIVY-782
             Project: Livy
          Issue Type: New Feature
          Components: API, Server
            Reporter: Andrew Fogarty


h2. Problem description

Livy currently has POST APIs for creating sessions:
 * To create a batch session, a client must submit a POST request to “/batches”.
 * To create an interactive session, a client must submit a POST request to “/sessions”.

Both APIs generate a unique session ID which is returned to the client as part 
of the response payload.

These APIs are not idempotent. That is, if either the request or the response
is lost in transit, the client has no way to tell whether the session was
created. The only way to retry is to submit another POST, which could start a
second, duplicate job.

For example, suppose a client submits a POST to create a new batch session. 
Livy receives the request and starts the batch session with ID=12. When Livy 
sends the response, assume it is lost in transit due to some networking issue. 
The client never receives the response, so it does not know if the batch 
started correctly and does not have an ID to query the status of the batch 
session.
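To make the failure concrete, here is a toy simulation (a sketch, not Livy code) of a server that allocates a fresh ID on every POST and a network that drops the first response. `ToyLivyServer`, `post_with_retry`, and the `lost_responses` parameter are hypothetical names used only for illustration.

```python
import itertools


class ToyLivyServer:
    """Hypothetical in-memory stand-in for a non-idempotent POST /batches handler."""

    def __init__(self):
        self._next_id = itertools.count()
        self.sessions = {}  # session ID -> request payload

    def post_batch(self, payload):
        # Every POST allocates a fresh session ID, even when the payload repeats.
        session_id = next(self._next_id)
        self.sessions[session_id] = payload
        return {"id": session_id}


def post_with_retry(server, payload, lost_responses):
    """Retry the POST until a response arrives; lost_responses simulates network loss."""
    while True:
        response = server.post_batch(payload)
        if lost_responses > 0:
            lost_responses -= 1  # the server acted, but the reply never arrived
            continue
        return response


server = ToyLivyServer()
response = post_with_retry(server, {"file": "job.jar"}, lost_responses=1)
print(response, len(server.sessions))  # {'id': 1} 2  (the retry started a duplicate)
```

The client ends up holding ID 1 while the batch with ID 0 keeps running unnoticed.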

This document contains two proposed solutions for this idempotence problem. 
These solutions introduce APIs for creating sessions in an idempotent manner. 
Neither solution makes changes to existing APIs.
h2. Suggested solution

This proposed solution introduces 1 new API:
 * PUT(“/\{session type}/”) -> Session

This API is described below.  *Note:* ‘->’ indicates the call “returns”.
h3. API: PUT(“/\{session type}/”) -> Session – Create session with given request ID header

This new API is a PUT to create a new session (batch or interactive) identified
by a client-supplied request ID. It is very similar to the existing POST API to
create a session and expects the request payload to be a CreateBatchRequest or
CreateInteractiveRequest as appropriate.

The difference between this PUT API and the existing session POST API is that 
requests to this API must contain a “requestId” header with a GUID value.  If 
the requestId is not provided, then PUT will fail with an error. This requestId 
is saved as an optional field on the metadata object (BatchRecoveryMetadata or 
InteractiveRecoveryMetadata) stored in the SessionStore.
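A minimal sketch of the header check, assuming the server rejects a missing or malformed `requestId` by raising an error. The proposal does not specify the error type or HTTP status, so `InvalidRequestIdError` and `parse_request_id` are hypothetical.

```python
import uuid
from typing import Mapping


class InvalidRequestIdError(ValueError):
    """Hypothetical error: the requestId header is missing or is not a GUID."""


def parse_request_id(headers: Mapping[str, str]) -> uuid.UUID:
    """Extract and validate the requestId header required by the proposed PUT API."""
    raw = headers.get("requestId")
    if raw is None:
        raise InvalidRequestIdError("PUT requires a requestId header")
    try:
        # uuid.UUID accepts the canonical 8-4-4-4-12 hex form (and a few variants).
        return uuid.UUID(raw)
    except ValueError as exc:
        raise InvalidRequestIdError(f"requestId is not a GUID: {raw!r}") from exc


request_id = parse_request_id({"requestId": "123e4567-e89b-12d3-a456-426614174000"})
print(request_id)  # 123e4567-e89b-12d3-a456-426614174000
```

This sketch looks the header up case-sensitively in a plain dict; HTTP header names are case-insensitive, so a real handler would read it through its framework's header API.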

When creating the session, before storing the metadata object in the 
SessionStore, we query the SessionStore to see if some session already exists 
with that requestId. If a session with the requestId already exists, then we 
return that session instead of creating a new one.  If there is no existing 
session with that requestId, then we create the session normally.
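The lookup-then-create step can be sketched with an in-memory store. `RecoveryMetadata`, `ToySessionStore`, and `IdempotentSessionCreator` are hypothetical stand-ins for the real metadata classes and SessionStore, and session IDs come from a simple counter.

```python
import itertools
import threading
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveryMetadata:
    """Stand-in for BatchRecoveryMetadata / InteractiveRecoveryMetadata."""
    session_id: int
    request_id: str  # proposed as optional; always set in this PUT-only sketch


@dataclass
class ToySessionStore:
    """In-memory stand-in for the SessionStore, indexed by requestId."""
    by_request_id: Dict[str, RecoveryMetadata] = field(default_factory=dict)


class IdempotentSessionCreator:
    def __init__(self, store: ToySessionStore):
        self._store = store
        self._ids = itertools.count()
        self._lock = threading.Lock()  # makes lookup-then-create atomic

    def put_session(self, request_id: str) -> RecoveryMetadata:
        with self._lock:
            existing = self._store.by_request_id.get(request_id)
            if existing is not None:
                return existing  # repeated PUT: return the first session
            metadata = RecoveryMetadata(next(self._ids), request_id)
            self._store.by_request_id[request_id] = metadata
            return metadata


creator = IdempotentSessionCreator(ToySessionStore())
first = creator.put_session("123e4567-e89b-12d3-a456-426614174000")
retry = creator.put_session("123e4567-e89b-12d3-a456-426614174000")
print(first is retry, first.session_id)  # True 0
```

The lock is an assumption of this sketch: if the lookup and the write are not atomic, two concurrent retries with the same requestId could both miss the lookup and create two sessions.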
h3. Example

This solution solves the idempotence problem by ensuring that repeated calls to
PUT with the same requestId return the session created by the first call. If a
client sends a PUT but does not receive a response, it can retry the request
with the same requestId. If the session has not started yet, it will start; if
it has already started, its session object will be returned to the client.
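From the client side, the key point is that the requestId is generated once and reused on every retry. This sketch assumes a lost response surfaces as Python's built-in `ConnectionError`; `create_session_idempotently` and `FlakyIdempotentServer` are hypothetical, and the server stub simply drops its first reply.

```python
import uuid
from typing import Callable, Dict


def create_session_idempotently(send_put: Callable[[Dict[str, str], dict], dict],
                                payload: dict, max_attempts: int = 3) -> dict:
    """Send PUT /{session type}/ with one requestId, retrying on lost responses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the requestId once, before the first attempt, and reuse it on every retry.
    headers = {"requestId": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send_put(headers, payload)
        except ConnectionError:
            if attempt == max_attempts:
                raise


class FlakyIdempotentServer:
    """Hypothetical server: deduplicates on requestId and drops its first reply."""

    def __init__(self, replies_to_drop: int = 1):
        self.sessions: Dict[str, dict] = {}
        self.replies_to_drop = replies_to_drop

    def put(self, headers: Dict[str, str], payload: dict) -> dict:
        request_id = headers["requestId"]
        if request_id not in self.sessions:  # first delivery creates the session
            self.sessions[request_id] = {"id": len(self.sessions), **payload}
        if self.replies_to_drop > 0:
            self.replies_to_drop -= 1
            raise ConnectionError("response lost in transit")
        return self.sessions[request_id]


server = FlakyIdempotentServer()
session = create_session_idempotently(server.put, {"kind": "spark"})
print(session, len(server.sessions))  # {'id': 0, 'kind': 'spark'} 1
```

Even though the first reply was lost, the retry returns the same session and only one session exists on the server.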
h2. Alternative solution

This alternative introduces two new APIs:
 # POST(“/\{session type}/id”) -> \{sessionId: Int, putKey: GUID}
 # PUT(“/\{session type}/\{session id}”) -> Session

Both are described below.
h3. API 1: POST(“/\{session type}/id”) -> \{sessionId: Int, putKey: GUID} – Generates a new unique sessionId

The first API is a POST to generate a new unique session ID for the given 
session type (batch or interactive).

This API would:
 # Increment the existing session ID counter.
 # Store a new value \{“/putkey/\{session type}/\{session id}” -> putKey} in the
SessionStore.
 # Return \{sessionId: Int, putKey: GUID} payload.

The returned payload contains the session ID as well as the “putKey”, a GUID
that the second API uses to validate the session ID. We call this the “putKey”
because it is a unique key identifying the PUT request. We store the mapping
from session ID to putKey in the SessionStore so that the second API can
validate that a provided session ID matches its putKey.
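A minimal in-memory sketch of API 1. It assumes one ID counter per session type and uses a plain dict in place of the SessionStore; `ToyIdAllocator` is a hypothetical name, and the putKey is a random (version 4) UUID.

```python
import threading
import uuid
from typing import Dict, Union


class ToyIdAllocator:
    """Hypothetical in-memory sketch of POST /{session type}/id."""

    def __init__(self):
        self.store: Dict[str, str] = {}     # stand-in for the SessionStore
        self.next_ids: Dict[str, int] = {}  # assumed: one counter per session type
        self._lock = threading.Lock()       # keeps increment-and-store atomic

    def allocate(self, session_type: str) -> Dict[str, Union[int, str]]:
        with self._lock:
            # 1. Increment the session ID counter for this session type.
            session_id = self.next_ids.get(session_type, 0)
            self.next_ids[session_type] = session_id + 1
            # 2. Mint a putKey and record /putkey/{session type}/{session id} -> putKey.
            put_key = str(uuid.uuid4())
            self.store[f"/putkey/{session_type}/{session_id}"] = put_key
        # 3. Return the payload the client needs for the subsequent PUT.
        return {"sessionId": session_id, "putKey": put_key}
```

For example, `ToyIdAllocator().allocate("batches")` returns session ID 0 together with a fresh GUID, and leaves the matching `/putkey/batches/0` entry in the store for API 2 to check.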
h3. API 2: PUT(“/\{session type}/\{session id}”) -> Session – Create a session with the given session ID

The second API is a PUT to create a new session (batch or interactive) for the 
given session ID. This new API is very similar to the existing POST API to 
create a session and expects the request payload to be a CreateBatchRequest or 
CreateInteractiveRequest as appropriate. CreateBatchRequest and
CreateInteractiveRequest would each gain a new, optional putKey field.

This API would:
 # Validate that the provided session ID matches the putKey by reading the
“/putkey/\{session type}/\{session id}” value from the SessionStore.
 ## If no putKey is provided, or the session ID does not match the putKey, then
we fail the request. This ensures that the provided session ID was generated by
the first API, and that a client is not using a session ID it has no permission
to use.
 # Follow the usual code path to create a session, but pass down the session
ID and the putKey.
 #* For this feature, we would change that code path in BatchSession and
InteractiveSession. Before saving the session metadata record to the
SessionStore, we check whether a record with this ID already exists in the
SessionStore. If it does, we return that session instead of creating a new
one.
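API 2's validation and deduplication can be sketched as follows, again with dicts standing in for the SessionStore. `ToyPutHandler` and `PutKeyMismatch` are hypothetical; the constant-time comparison via `hmac.compare_digest` is a choice of this sketch (the putKey acts as a capability), not something the proposal requires.

```python
import hmac
from typing import Dict, Optional


class PutKeyMismatch(Exception):
    """Raised when the putKey is missing or does not match the reserved session ID."""


class ToyPutHandler:
    """In-memory sketch of PUT /{session type}/{session id}."""

    def __init__(self, putkeys: Dict[str, str]):
        self.putkeys = putkeys  # "/putkey/{type}/{id}" -> putKey, as written by API 1
        self.sessions: Dict[str, dict] = {}  # "/{type}/{id}" -> session record

    def put_session(self, session_type: str, session_id: int,
                    put_key: Optional[str], payload: dict) -> dict:
        # 1. Validate the putKey against the value stored for this session ID.
        expected = self.putkeys.get(f"/putkey/{session_type}/{session_id}")
        if put_key is None or expected is None or not hmac.compare_digest(
                put_key.encode(), expected.encode()):
            raise PutKeyMismatch(f"invalid putKey for {session_type} session {session_id}")
        # 2. If a record with this ID already exists, return it instead of creating one.
        record_key = f"/{session_type}/{session_id}"
        if record_key not in self.sessions:
            self.sessions[record_key] = {"id": session_id, "putKey": put_key, **payload}
        return self.sessions[record_key]


handler = ToyPutHandler({"/putkey/batches/7": "k-7"})
first = handler.put_session("batches", 7, "k-7", {"file": "job.jar"})
again = handler.put_session("batches", 7, "k-7", {"file": "job.jar"})
print(first is again, len(handler.sessions))  # True 1
```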

h3. Example

With these new APIs, a client can get a valid session ID before submitting 
their batch or interactive session to Livy. 

The sequence would be:
 # Call POST(“/\{session type}/id”) to get a new valid session ID and putKey.
 # Call PUT(“/\{session type}/\{session id}”) to start a new session with that
valid session ID.
 # If for some reason the client does not receive a response, it can use the
session ID to query Livy for the session's status, or it can re-submit the same
PUT request. When a request is re-submitted:
 ## If the session has not started yet, it will start.
 ## If the session has already started, its session object will be returned to
the client.
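The client sequence can be sketched as a single function. The three callables (`post_id`, `put_session`, `get_session`) are hypothetical wrappers around the POST, PUT, and status-query requests, and a lost response is again assumed to surface as `ConnectionError`. On a lost response this sketch first queries the status and re-submits the PUT only if the session is not found; because the PUT is idempotent, re-submitting directly would also be safe.

```python
from typing import Callable, Optional


def submit_with_reserved_id(post_id: Callable[[], dict],
                            put_session: Callable[[int, str], dict],
                            get_session: Callable[[int], Optional[dict]],
                            max_attempts: int = 3) -> dict:
    """Reserve a session ID, then PUT the session until it is acknowledged."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Step 1: POST /{session type}/id returns a reserved session ID and its putKey.
    reserved = post_id()
    session_id, put_key = reserved["sessionId"], reserved["putKey"]
    for attempt in range(1, max_attempts + 1):
        try:
            # Step 2: PUT /{session type}/{session id} starts the session.
            return put_session(session_id, put_key)
        except ConnectionError:
            # Step 3: the response was lost, but the ID is known, so query its status.
            existing = get_session(session_id)
            if existing is not None:
                return existing
            if attempt == max_attempts:
                raise
```

The function returns whichever session object it obtains first, either from the PUT itself or from the status query.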



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
