Re: Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-26 Thread Serega Sheypak
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html You don't have to rely on a single NN. You can specify a kind of "NN HA alias", and the underlying HDFS client will connect to whichever NN is currently active. Thanks for pointing out HADOOP_CONF_DIR, seems
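
For illustration, a minimal sketch of the HA-alias approach from a Spark job, assuming a hypothetical nameservice "clusterA" with NameNodes nn1/nn2 on example hosts; the property keys follow the HDFS HA with QJM guide linked above. Normally these settings live in hdfs-site.xml under HADOOP_CONF_DIR, but they can also be set on the Hadoop configuration Spark uses:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("ha-alias-sketch").getOrCreate()
    val conf = spark.sparkContext.hadoopConfiguration

    // Declare the logical nameservice and its two NameNodes.
    conf.set("dfs.nameservices", "clusterA")
    conf.set("dfs.ha.namenodes.clusterA", "nn1,nn2")
    conf.set("dfs.namenode.rpc-address.clusterA.nn1", "nn1.example.com:8020")
    conf.set("dfs.namenode.rpc-address.clusterA.nn2", "nn2.example.com:8020")
    // The client fails over to whichever NameNode is currently active.
    conf.set("dfs.client.failover.proxy.provider.clusterA",
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

    // Paths use the alias instead of a concrete NameNode host.
    val df = spark.read.csv("hdfs://clusterA/data/events.csv")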

Re: What does zeppelin try to do during web application startup?

2017-03-26 Thread Иван Шаповалов
As part of application startup, Helium downloads and installs the npm & node versions it needs. This greatly increases startup time and may be one of the reasons. 2017-03-26 14:51 GMT+03:00 Serega Sheypak: > Hi, I'm trying to run Zeppelin 0.8.0-SNAPSHOT in

Re: Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-26 Thread Jianfeng (Jeff) Zhang
What do you mean by non-reliable? If you want to read/write two Hadoop clusters in one program, I am afraid this is the only way. It is impossible to specify multiple HADOOP_CONF_DIR directories on one JVM classpath; only one default configuration will be used. Best Regard, Jeff Zhang From: Serega
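
A minimal sketch of the approach described above, reading from and writing to two clusters in one program by using fully qualified hdfs:// URIs; the hostnames and paths are hypothetical, and spark is the SparkSession Zeppelin provides:

    // One default HADOOP_CONF_DIR on the classpath, but each path names its cluster explicitly.
    val ordersA = spark.read.parquet("hdfs://nn-cluster-a.example.com:8020/warehouse/orders")
    val ordersB = spark.read.parquet("hdfs://nn-cluster-b.example.com:8020/warehouse/orders")

    // Combine the two and write the result back to cluster A.
    ordersA.union(ordersB)
      .write.mode("overwrite")
      .parquet("hdfs://nn-cluster-a.example.com:8020/warehouse/orders_merged")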

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread Jianfeng (Jeff) Zhang
I verified it on the master branch; it works for me. Set it on the interpreter setting page as in the screenshot attached to the original message. Best Regard, Jeff Zhang From: RUSHIKESH RAUT > Reply-To:
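
A sketch of the setting the screenshot presumably shows, assuming the Spark interpreter and 4g as an example value:

    Interpreter page -> spark -> edit -> add property
        name:  SPARK_DRIVER_MEMORY
        value: 4g
    Save, then restart the interpreter.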

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread RUSHIKESH RAUT
Thanks Jianfeng, but I am still not able to solve the issue. I have set it to 4g, but still no luck. Can you please explain how I can set the SPARK_DRIVER_MEMORY property? Also, I have read that the GC overhead limit exceeded error occurs when heap memory is insufficient. So how can I

What does zeppelin try to do during web application startup?

2017-03-26 Thread Serega Sheypak
Hi, I'm trying to run Zeppelin 0.8.0-SNAPSHOT in Docker. Startup takes forever. It starts in seconds when launched on the host, but not in a Docker container. I suspect the Docker container has a poorly configured network and some part of Zeppelin tries to reach a remote resource. SLF4J: See

Re: Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-26 Thread Serega Sheypak
I know it, thanks, but it's not a reliable solution. 2017-03-26 5:23 GMT+02:00 Jianfeng (Jeff) Zhang: > You can try to specify the namenode address for the hdfs file, e.g. > spark.read.csv("hdfs://localhost:9009/file") > Best Regard, > Jeff Zhang > From: Serega

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread Jianfeng (Jeff) Zhang
This is a bug in Zeppelin: spark.driver.memory won't take effect because, as of now, it isn't passed to Spark through the --conf parameter. See https://issues.apache.org/jira/browse/ZEPPELIN-1263 The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter setting page. Best Regard, Jeff Zhang From:
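
An alternative sketch of the same workaround outside the interpreter UI, assuming Zeppelin launches the Spark interpreter through spark-submit and that conf/zeppelin-env.sh is sourced at launch; 4g is only an example value:

    # conf/zeppelin-env.sh
    export SPARK_SUBMIT_OPTIONS="--driver-memory 4g"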

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread Eric Charles
You don't have to set spark.driver.memory with -X... options, but simply with a memory size. Look at http://spark.apache.org/docs/latest/configuration.html: spark.driver.memory (default 1g) is the amount of memory to use for the driver process, i.e. where SparkContext is initialized (e.g. 1g, 2g). Note: In client
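
For comparison, a sketch of how the same memory size is normally passed when the driver JVM has not started yet, i.e. on a plain spark-submit command line rather than through Zeppelin; the class and jar names are hypothetical:

    spark-submit --driver-memory 4g --class com.example.ReportJob report-job.jar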