[jira] [Comment Edited] (LIVY-718) Support multi-active high availability in Livy

2019-12-31 Thread Yiheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006069#comment-17006069
 ] 

Yiheng Wang edited comment on LIVY-718 at 12/31/19 12:28 PM:
-

bq. When a server fails, its sessions become unavailable until other servers 
are designated to handle them. This was not acceptable behavior, at least for 
clusters that I worked with in my previous job.

[~meisam] Currently, Livy only supports single node failure recover. Do you use 
Livy in that cluster? If so, how do you handle the downtime?


was (Author: yihengw):
bq. When a server fails, its sessions become unavailable until other servers 
are designated to handle them. This was not acceptable behavior, at least for 
clusters that I worked with in my previous job.

[~meisam] Currently, Livy only supports single node failure recover. Do you use 
Livy in that cluster? If so, would you like to share your solution?

> Support multi-active high availability in Livy
> --
>
> Key: LIVY-718
> URL: https://issues.apache.org/jira/browse/LIVY-718
> Project: Livy
>  Issue Type: Epic
>  Components: RSC, Server
>Reporter: Yiheng Wang
>Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make Livy service more fault-tolerant 
> and scalable.
> There're already some proposals in the community for high availability. But 
> they're not so complete or just for active-standby high availability. So we 
> propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (LIVY-718) Support multi-active high availability in Livy

2019-12-30 Thread Marco Gaido (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005512#comment-17005512
 ] 

Marco Gaido edited comment on LIVY-718 at 12/30/19 6:29 PM:


bq. IIRC JDBC had a REST and an RPC mode. The RPC mode might not be HA without 
a fat client but perhaps the REST mode could. Does Hive JDBC support HA on the 
Hive Thrift server? Then maybe the hive JDBC client now supports server side 
transitions. If not, then we may have the caveat that HA won't work for such 
connections. I am not super familiar with the JDBC client.

Actually, JDBC has either HTTP or binary mode, but it relates only to the 
communication between the Livy server and the client and it is unrelated to how 
the Livy server interacts with Spark (ie. via RPC).


was (Author: mgaido):
> IIRC JDBC had a REST and an RPC mode. The RPC mode might not be HA without a 
> fat client but perhaps the REST mode could. Does Hive JDBC support HA on the 
> Hive Thrift server? Then maybe the hive JDBC client now supports server side 
> transitions. If not, then we may have the caveat that HA won't work for such 
> connections. I am not super familiar with the JDBC client.

Actually, JDBC has either HTTP or binary mode, but it relates only to the 
communication between the Livy server and the client and it is unrelated to how 
the Livy server interacts with Spark (ie. via RPC).

> Support multi-active high availability in Livy
> --
>
> Key: LIVY-718
> URL: https://issues.apache.org/jira/browse/LIVY-718
> Project: Livy
>  Issue Type: Epic
>  Components: RSC, Server
>Reporter: Yiheng Wang
>Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make Livy service more fault-tolerant 
> and scalable.
> There're already some proposals in the community for high availability. But 
> they're not so complete or just for active-standby high availability. So we 
> propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)