[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451797#comment-15451797
 ] 

Sean Owen commented on SPARK-17313:
---

You don't have to run the driver on a laptop or other development machine. Even 
in client mode you can run it on a cluster machine, which is the same kind of 
machine it would run on inside any YARN container.

Supporting the shell in yarn-cluster mode is not the same as using Livy, but it 
serves a similar purpose. Why not use Livy? If that addresses this use case, 
then I'd close this.
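
Concretely, launching the shell from a cluster gateway/edge node keeps the 
driver inside the cluster network while staying in client mode. A minimal 
sketch, assuming a YARN gateway node with Spark installed; the flags are 
standard spark-shell options and the resource sizes are illustrative:

```shell
# Build the spark-shell invocation you would run on a gateway node; the driver
# then runs on that node, close to the executors. Sizes are examples only.
SPARK_SHELL_CMD="spark-shell --master yarn --deploy-mode client --driver-memory 8g --num-executors 20 --executor-memory 4g"

# On the gateway node this command would start the interactive shell:
#   $SPARK_SHELL_CMD
echo "$SPARK_SHELL_CMD"
```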

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, spark-shell becomes unusable. If we added a cluster 
> mode (YARN or Mesos) for spark-shell via some sort of proxy, where the user 
> machine hosts only a REST client to the driver running on the cluster, the 
> shell would be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-31 Thread Sameh El-Ansary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451785#comment-15451785
 ] 

Sameh El-Ansary commented on SPARK-17313:
-

1- WHY? (The driver is not doing much work)
True, but in many development/education situations a cloud-deployed cluster is 
shared among students/developers working on large shared data sets with many 
executors. Running the driver on the development machine (even for exploratory 
purposes) can be problematic because of memory pressure on the workstation, or 
because many YARN executors end up talking to one driver over a remote 
connection.

2- HOW? About interacting with the local machine's shell when the driver is in 
the cluster.
Short answer: through Livy or similar:
https://github.com/cloudera/livy

More details:

Dummy REPL+RestClient  --REST-->  RestServer+Driver
     (workstation)                  (cluster node)

Since YARN allocates nodes dynamically on the cluster, and the workstation is 
usually not open to all cluster nodes, one needs a proxy server, which would 
typically run on the YARN application master. Thus the architecture would be:

Dummy REPL+RestClient  --REST-->  Proxy Server  --REST-->  RestServer+Driver
     (workstation)              (App Master node)            (cluster node)

Livy provides the RestClient, Proxy Server and RestServer+Driver. Adding a Livy 
RestClient to spark-shell would make it capable of operating in yarn-cluster 
mode.
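
The hops above boil down to plain REST calls. A minimal sketch of the request 
bodies a Livy-backed shell would send, following Livy's documented REST API; 
the host/port and the sample Scala snippet are placeholders:

```shell
# Placeholder Livy endpoint; in the proxy setup above this would be reached
# through the proxy on the YARN application master node.
LIVY="http://livy-host:8998"

# Body for POST $LIVY/sessions: ask Livy for an interactive Scala session;
# Livy starts the driver inside the cluster (yarn-cluster if so configured).
CREATE_SESSION='{"kind":"spark"}'

# Body for POST $LIVY/sessions/<id>/statements: one line typed at the REPL.
RUN_LINE='{"code":"sc.parallelize(1 to 100).sum()"}'

# A real client would send these with curl, e.g. (not executed here):
#   curl -s -H 'Content-Type: application/json' \
#        -d "$CREATE_SESSION" "$LIVY/sessions"
echo "$CREATE_SESSION"
echo "$RUN_LINE"
```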

3- Zeppelin or spark-shell
Using Zeppelin or spark-shell for exploratory work is a matter of taste, IMHO. 
The simplicity of the shell and the power of Zeppelin are each needed at 
different times, depending on the case at hand.

4- Yarn-cluster on Zeppelin
True, yarn-cluster was not supported in Zeppelin for a long time, but it has 
recently been added using Livy 
(https://github.com/apache/zeppelin/pull/827), where the submitter of this 
issue contributed a bit.
Similarly, it would be good to support that from spark-shell.




[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-31 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451428#comment-15451428
 ] 

Sean Owen commented on SPARK-17313:
---

I don't think this is what the OP means. This is about 'yarn-client' vs 
'yarn-cluster' mode in Spark. The shell has to run in yarn-client mode because 
you're presenting an interactive shell to a user on machine X, so the driver 
needs to run on machine X. In yarn-cluster mode, by contrast, you can submit 
the Spark job from machine X but have the driver start within the cluster. 
You can't use Zeppelin with 'yarn-cluster' mode right now either.




[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Ewan Leith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450194#comment-15450194
 ] 

Ewan Leith commented on SPARK-17313:


I think Apache Zeppelin and Spark Notebook both cover this requirement better 
than the Spark shell ever will. The installation requirements for either are 
fairly minimal, and they give you all sorts of additional benefits over the 
raw shell.




[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449684#comment-15449684
 ] 

Sean Owen commented on SPARK-17313:
---

The problem is: how are you going to interact with a shell on your local 
machine when the driver is somewhere else? It's not impossible, but it's not 
clear it's worthwhile. The driver is in general not doing much work, or 
shouldn't be; the shell is more for exploration than production.
