Hi Jeff,

What is the expected behavior with standalone cluster mode? Should we see separate driver processes in the cluster (one per user), or multiple SparkSubmit processes?
I was trying to dig into the Zeppelin code and didn't see where Zeppelin does the spark-submit to the cluster. Can you please point to it?

Thanks,
Ankit

> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> ZEPPELIN-2898 is for yarn-cluster mode. And Zeppelin has an integration test
> for yarn mode, so that is guaranteed to work. But there is no test for
> standalone, so I'm not sure about the behavior of standalone mode.
>
> On Wed, Mar 14, 2018 at 8:06 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
>> https://github.com/apache/zeppelin/pull/2577 mentions yarn-cluster in its
>> title, so I assume it's only yarn-cluster.
>> Never used standalone-cluster myself.
>>
>> Which distro of Hadoop do you use?
>> Cloudera desupported standalone in CDH 5.5 and will remove it in CDH 6.
>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>>
>> --
>> Ruslan Dautkhanov
>>
>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz
>>> <jhonderson2...@gmail.com> wrote:
>>>
>>> Does this new feature work only for yarn-cluster, or for Spark standalone
>>> too?
>>>
>>> On Tue, Mar 13, 2018 at 18:34, Ruslan Dautkhanov <dautkha...@gmail.com>
>>> wrote:
>>>
>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged at the end of
>>>> September, so I'm not sure you have that.
>>>>
>>>> Check out
>>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235 for how to
>>>> set this up.
>>>>
>>>> --
>>>> Ruslan Dautkhanov
>>>>
>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz
>>>>> <jhonderson2...@gmail.com> wrote:
>>>>>
>>>>> Hi Zeppelin users!
>>>>>
>>>>> I am working with Zeppelin pointing to a Spark standalone cluster. I am
>>>>> trying to figure out a way to make Zeppelin run the Spark driver outside
>>>>> of the client process that submits the application.
>>>>>
>>>>> According to the documentation
>>>>> (http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>>
>>>>> "For standalone clusters, Spark currently supports two deploy modes. In
>>>>> client mode, the driver is launched in the same process as the client
>>>>> that submits the application. In cluster mode, however, the driver is
>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>> client process exits as soon as it fulfills its responsibility of
>>>>> submitting the application without waiting for the application to finish."
>>>>>
>>>>> The problem is that, even when I set the properties for the Spark
>>>>> standalone cluster with deploy mode set to cluster, the driver still
>>>>> runs on the Zeppelin machine (according to the Spark UI executors page).
>>>>> These are the properties I am setting for the Spark interpreter:
>>>>>
>>>>> master: spark://<master-name>:7077
>>>>> spark.submit.deployMode: cluster
>>>>> spark.executor.memory: 16g
>>>>>
>>>>> Any ideas would be appreciated.
>>>>>
>>>>> Thank you
>>>>>
>>>>> Details:
>>>>> Spark version: 2.1.1
>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
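
For comparison, outside Zeppelin the same standalone cluster-mode submission would be done directly with spark-submit, roughly as below. This is only a sketch to illustrate the deploy mode being discussed; the master hostname, class name, and jar path are placeholders, not values from the thread:

```shell
# Hypothetical direct submission to a standalone master in cluster deploy mode.
# In cluster mode the driver should be launched on one of the Worker processes,
# and this client process exits once the application is submitted.
spark-submit \
  --master spark://<master-name>:7077 \
  --deploy-mode cluster \
  --conf spark.executor.memory=16g \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

If a submission like this also leaves the driver on the submitting machine, the issue is likely in Spark standalone itself rather than in how Zeppelin passes the interpreter properties.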