[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451797#comment-15451797 ]

Sean Owen commented on SPARK-17313:
---

You don't have to run the driver on a laptop or other development machine. You can run it on a cluster machine even in client mode; BTW, that's the same kind of machine it would run on in any YARN container. Supporting the shell in yarn-cluster mode is not the same as using Livy, but Livy serves a similar purpose. Why not use Livy? If that addresses this use case, then I'd close this.

> Support spark-shell on cluster mode
> ---
>
>                 Key: SPARK-17313
>                 URL: https://issues.apache.org/jira/browse/SPARK-17313
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Mahmoud Elgamal
>
> The main issue with the current spark shell is that the driver runs on the
> user's machine. If the driver's resource requirements exceed the user
> machine's capacity, the spark shell becomes useless. If we add cluster mode
> (YARN or Mesos) to the spark shell via some sort of proxy, where the user's
> machine only hosts a REST client talking to the driver running on the
> cluster, the shell will be more powerful.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
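As a rough illustration of the Livy route suggested in the comment above, the sketch below builds the three HTTP requests an interactive client would send to Livy's REST API: create a session (`POST /sessions`), submit a statement (`POST /sessions/{id}/statements`), and poll its result (`GET /sessions/{id}/statements/{statementId}`). It only constructs `urllib.request.Request` objects and sends nothing. The host name `livy-host`, the helper function names and the session/statement ids are hypothetical, and port 8998 is assumed to be the Livy server's default.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; 8998 is assumed to be Livy's default port.
LIVY_URL = "http://livy-host:8998"
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(base_url: str, kind: str = "spark") -> urllib.request.Request:
    """POST /sessions starts an interactive session; kind "spark" asks for a Scala interpreter."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions", data=body, headers=JSON_HEADERS, method="POST"
    )


def submit_statement_request(base_url: str, session_id: int, code: str) -> urllib.request.Request:
    """POST /sessions/{id}/statements runs a code snippet inside the remote driver."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )


def poll_statement_request(base_url: str, session_id: int, statement_id: int) -> urllib.request.Request:
    """GET /sessions/{id}/statements/{statementId} fetches a statement's state and output."""
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements/{statement_id}", method="GET"
    )


# Against a real server, each request would be sent with urllib.request.urlopen(req)
# and the JSON reply read with json.load(response). Here we only build them.
start = create_session_request(LIVY_URL)
run = submit_statement_request(LIVY_URL, session_id=0, code="sc.parallelize(1 to 10).sum()")
poll = poll_statement_request(LIVY_URL, session_id=0, statement_id=0)
for req in (start, run, poll):
    print(req.get_method(), req.full_url)
```

A shell front end built on this would loop: read a line, submit it, poll until the statement reports that it has finished, then print its output. The driver, and all interpreter state, stays on the cluster.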
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451785#comment-15451785 ]

Sameh El-Ansary commented on SPARK-17313:
---

1- WHY? (The driver is not doing much work)
True, but in many development/education situations, a cloud-deployed cluster is shared between students/developers working on shared large data sets with a large number of executors. Running the driver on the development machine (even for exploratory purposes) can be problematic, either because of memory limits on the workstation or because many YARN executors are talking to one driver over a remote connection.

2- HOW? (Interacting with the local shell when the driver is in the cluster)
Short answer: through Livy or something similar, https://github.com/cloudera/livy

More details:

    Dummy REPL+RestClient ---REST---> RestServer+Driver
    (workstation)                     (cluster node)

Since YARN allocates nodes dynamically on the cluster, and the workstation usually does not have network access to every cluster node, one needs a proxy server, which would typically run on the YARN application master. The architecture would thus be:

    Dummy REPL+RestClient ---REST---> Proxy Server ---REST---> RestServer+Driver
    (workstation)                     (App Master Node)        (cluster node)

Livy provides the RestClient, the Proxy Server and the RestServer+Driver. Adding a Livy RestClient to spark-shell would make it capable of operating in yarn-cluster mode.

3- Zeppelin or spark-shell
Using Zeppelin or spark-shell for exploratory work is a matter of taste IMHO. The simplicity of the shell and the power of Zeppelin are each needed at different times, depending on the case at hand.

4- Yarn-cluster on Zeppelin
True, yarn-cluster mode was not supported in Zeppelin for a long time, but it has recently been added using Livy (https://github.com/apache/zeppelin/pull/827), to which the submitter of this issue contributed a bit. Similarly, it would be good to support it from spark-shell.
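The three-tier layout in point 2 can be modelled in a few lines, with plain method calls standing in for the REST hops and JSON strings standing in for HTTP bodies. This is a toy under stated assumptions, not Livy's actual design: the class names are hypothetical, and a Python `eval`/`exec` namespace stands in for the Scala interpreter a real driver would host. What it shows is that all interpreter state lives in the driver, so the workstation only sends text and prints replies.

```python
import json


class DriverServer:
    """Hypothetical stand-in for RestServer+Driver on a cluster node; state lives here."""

    def __init__(self) -> None:
        self.namespace: dict = {}
        self.statement_count = 0

    def handle(self, request_json: str) -> str:
        code = json.loads(request_json)["code"]
        try:
            # Expressions produce a printable result...
            output = repr(eval(code, self.namespace))
        except SyntaxError:
            # ...while statements such as assignments only update the driver's state.
            exec(code, self.namespace)
            output = ""
        self.statement_count += 1
        return json.dumps({"id": self.statement_count, "output": output})


class Proxy:
    """Hypothetical stand-in for the proxy on the application master node: the workstation
    talks only to the proxy, which forwards to whichever node ended up hosting the driver."""

    def __init__(self) -> None:
        self.drivers: dict = {}

    def register(self, session_id: str, driver: DriverServer) -> None:
        self.drivers[session_id] = driver

    def forward(self, session_id: str, request_json: str) -> str:
        return self.drivers[session_id].handle(request_json)


class ReplClient:
    """Hypothetical stand-in for the "dummy REPL + RestClient" on the workstation.
    It holds only a session id, never interpreter state."""

    def __init__(self, proxy: Proxy, session_id: str) -> None:
        self.proxy = proxy
        self.session_id = session_id

    def run(self, code: str) -> str:
        reply = self.proxy.forward(self.session_id, json.dumps({"code": code}))
        return json.loads(reply)["output"]


proxy = Proxy()
proxy.register("session-1", DriverServer())
shell = ReplClient(proxy, "session-1")
shell.run("total = sum(range(10))")
print(shell.run("total"))
```

Because the client knows only a session id, YARN is free to place the driver on any node; only the proxy needs to know where it landed.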
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451428#comment-15451428 ]

Sean Owen commented on SPARK-17313:
---

I don't think this is what the OP means. This is about 'yarn-client' vs 'yarn-cluster' mode in Spark. The shell has to run in yarn-client mode because you're presenting an interactive shell to a user on machine X, so the driver needs to run on machine X. Otherwise, in yarn-cluster mode, it's possible to submit the Spark job from machine X but have the driver start within the cluster. You can't use Zeppelin with 'yarn-cluster' mode right now either.
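The client/cluster distinction in the comment above can be written down as a small rule. The sketch below is a hypothetical helper (the function name, the `interactive` flag and `my_job.py` are illustrative) that assembles a `spark-submit` command line from the real `--master yarn` and `--deploy-mode client|cluster` options, and refuses the one combination the comment rules out: an interactive session in cluster mode. As far as I know, spark-submit applies a comparable check to Spark's own shells.

```python
from __future__ import annotations


def spark_submit_args(app: str, deploy_mode: str, interactive: bool) -> list[str]:
    """Build a spark-submit command line for YARN (hypothetical helper).

    "client" runs the driver on the submitting machine (machine X);
    "cluster" starts the driver inside a YARN container instead.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    if interactive and deploy_mode == "cluster":
        # The prompt is shown on machine X, so the driver backing it must run there too.
        raise ValueError("an interactive shell needs its driver on the user's machine")
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]


print(" ".join(spark_submit_args("my_job.py", "cluster", interactive=False)))
```

Under this rule, running the shell in client mode from a cluster machine, rather than a laptop, still satisfies the constraint, since the prompt and the driver share a machine.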
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450194#comment-15450194 ]

Ewan Leith commented on SPARK-17313:

I think Apache Zeppelin and Spark Notebook both cover this requirement better than the Spark shell ever will? The installation requirements for either are fairly minimal, and both give you all sorts of additional benefits over the raw shell.
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449684#comment-15449684 ]

Sean Owen commented on SPARK-17313:
---

The problem is: how are you going to interact with a shell on your local machine when the driver is somewhere else? It's not impossible, but it's not clear it's worthwhile. The driver in general isn't doing much work, or shouldn't be; the shell is more for exploration than production.