[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006078#comment-17006078 ]

Yiheng Wang commented on LIVY-718:
----------------------------------

[~bikassaha] Compared to the designated-server solution, I think the stateless-server solution gains availability at the cost of scalability. In this context, one concern is memory: we observed that when the number of running sessions grows to 400~500, the Livy server process consumes about 2 GB of memory.

Another concern is that Livy uses long-lived connections between the server and the Spark drivers. Say there are M servers and N sessions. In the designated-server solution there are N connections; in the stateless solution there are M x N connections. I'm afraid this may add a lot of overhead to RPC communication (e.g. serialization, routing).

> Support multi-active high availability in Livy
> ----------------------------------------------
>
> Key: LIVY-718
> URL: https://issues.apache.org/jira/browse/LIVY-718
> Project: Livy
> Issue Type: Epic
> Components: RSC, Server
> Reporter: Yiheng Wang
> Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high
> availability in Livy.
> Currently, Livy only supports single-node recovery. This is not sufficient in
> some production environments. In our scenario, the Livy server serves many
> notebook and JDBC services. We want to make the Livy service more fault-tolerant
> and scalable.
> There are already some proposals in the community for high availability, but
> they are either incomplete or only cover active-standby high availability. So we
> propose a multi-active high availability design to achieve the following
> goals:
> # One or more servers will serve client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active
> servers.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
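To make the two concerns in the comment above concrete, here is a back-of-the-envelope calculation in Python. It assumes that every server able to handle a session keeps one long-lived RPC connection to that session's Spark driver, and it treats the reported ~2 GB as if it were spent entirely on session state, so the per-session figure is a rough upper bound rather than a measurement. The cluster size (3 servers) is a hypothetical example, not a number from the thread.

```python
# Back-of-the-envelope comparison of the two HA designs discussed above.
# Assumption (not from the design doc): every server that may handle a
# session keeps one long-lived RPC connection to that session's driver.


def designated_connections(num_servers: int, num_sessions: int) -> int:
    """Designated-server design: each session has exactly one owning server."""
    # num_servers is unused on purpose: the count does not depend on it.
    return num_sessions


def stateless_connections(num_servers: int, num_sessions: int) -> int:
    """Stateless design: any server may handle any session, so every server
    keeps a connection to every driver."""
    return num_servers * num_sessions


def memory_per_session_mb(total_gb: float, num_sessions: int) -> float:
    """Rough upper bound on server memory per session (1 GB = 1024 MB).

    Treats the whole process footprint as session state and ignores the
    server's baseline memory, so the real per-session cost is lower.
    """
    return total_gb * 1024 / num_sessions


if __name__ == "__main__":
    m, n = 3, 500  # hypothetical cluster: 3 Livy servers, 500 sessions
    print(designated_connections(m, n))  # 500
    print(stateless_connections(m, n))  # 1500
    print(memory_per_session_mb(2.0, 500))  # 4.096
    print(memory_per_session_mb(2.0, 400))  # 5.12
```

The connection count in the stateless design grows linearly with M, so adding servers to scale out also multiplies the number of driver connections, each with its own serialization and routing cost.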
[jira] [Comment Edited] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006069#comment-17006069 ]

Yiheng Wang edited comment on LIVY-718 at 12/31/19 12:28 PM:
------------------------------------------------------------

bq. When a server fails, its sessions become unavailable until other servers are designated to handle them. This was not acceptable behavior, at least for clusters that I worked with in my previous job.

[~meisam] Currently, Livy only supports single-node failure recovery. Do you use Livy in that cluster? If so, how do you handle the downtime?

was (Author: yihengw):
bq. When a server fails, its sessions become unavailable until other servers are designated to handle them. This was not acceptable behavior, at least for clusters that I worked with in my previous job.

[~meisam] Currently, Livy only supports single-node failure recovery. Do you use Livy in that cluster? If so, would you like to share your solution?
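The objection quoted in the comment above (`bq. When a server fails, its sessions become unavailable until other servers are designated to handle them.`) is about the gap between a server failure and the moment its sessions get new owners. A minimal Python sketch of that window follows, using a hypothetical in-memory `SessionRegistry` that is not a Livy class. The least-loaded placement and the explicit `reassign_orphans()` recovery step are illustrative assumptions, not the mechanism in the design document; a real deployment would presumably keep the owner map in shared storage and detect failures automatically.

```python
from collections import Counter
from typing import Dict, Optional, Set


class SessionRegistry:
    """Hypothetical registry mapping each session to its designated server."""

    def __init__(self, servers: Set[str]) -> None:
        self.live: Set[str] = set(servers)
        self.owner: Dict[int, str] = {}

    def _least_loaded(self) -> str:
        if not self.live:
            raise RuntimeError("no live servers")
        # Count sessions per live server; sessions on dead servers are ignored.
        load = Counter({s: 0 for s in self.live})
        for server in self.owner.values():
            if server in self.live:
                load[server] += 1
        # Break ties by server name so the assignment is deterministic.
        return min(self.live, key=lambda s: (load[s], s))

    def create_session(self, session_id: int) -> str:
        server = self._least_loaded()
        self.owner[session_id] = server
        return server

    def route(self, session_id: int) -> Optional[str]:
        """Server that handles the session, or None while it has no live owner."""
        server = self.owner.get(session_id)
        return server if server in self.live else None

    def mark_failed(self, server: str) -> None:
        # From this point, sessions owned by `server` are unroutable...
        self.live.discard(server)

    def reassign_orphans(self) -> Dict[int, str]:
        # ...until this recovery step designates a new owner for each of them.
        moved: Dict[int, str] = {}
        for session_id, server in sorted(self.owner.items()):
            if server not in self.live:
                new_owner = self._least_loaded()
                self.owner[session_id] = new_owner
                moved[session_id] = new_owner
        return moved


if __name__ == "__main__":
    registry = SessionRegistry({"a", "b", "c"})
    print([registry.create_session(i) for i in range(6)])  # ['a', 'b', 'c', 'a', 'b', 'c']
    registry.mark_failed("b")
    print(registry.route(1))  # None: session 1 is unavailable
    print(registry.reassign_orphans())  # {1: 'a', 4: 'c'}
    print(registry.route(1))  # a
```

Between `mark_failed()` and `reassign_orphans()`, `route()` returns `None` for sessions 1 and 4. That interval is the unavailability the quoted comment objects to; its length depends on how quickly a failure is detected and the orphaned sessions are reassigned.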
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006069#comment-17006069 ]

Yiheng Wang commented on LIVY-718:
----------------------------------

bq. When a server fails, its sessions become unavailable until other servers are designated to handle them. This was not acceptable behavior, at least for clusters that I worked with in my previous job.

[~meisam] Currently, Livy only supports single-node failure recovery. Do you use Livy in that cluster? If so, would you like to share your solution?