Hi


I am running Zeppelin in yarn-client mode.

My Hadoop cluster runs remotely on CDH 5.4.0 (Spark 1.3.0), and Spark
runs on YARN.



Zeppelin installation steps:

git clone https://github.com/apache/incubator-zeppelin

mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -Pyarn -DskipTests



I added the lines below to conf/zeppelin-env.sh:



export MASTER=yarn-client

export HADOOP_CONF_DIR=/home/ubuntu/hadoop/





When I run this sample program:



%spark

val textFile = sc.textFile("hdfs://master:8020/user/prateek/bigdata.csv", 1)

textFile.count



it shows this result:

textFile: org.apache.spark.rdd.RDD[String] = hdfs://master:8020/user/prateek/bigdata.csv MapPartitionsRDD[1] at textFile at <console>:23

res0: Long = 114955604



The Zeppelin application entry also appears in the resource manager (
http://ip-address:8080/).



I observe the following after running the sample program:



   - The Zeppelin web page shows the status as finished, but the resource
   manager always shows the application status as running.





[image: Inline image 1]




   - If I run another sample program, such as:

%spark



val textFile = sc.textFile("hdfs://master:8020/user/ubuntu/Amalgam_row.csv")

textFile.count



then no new application entry appears in the resource manager, yet
Zeppelin executes the program and shows the result.
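
For reference, the application state that the resource manager reports can also be checked from the command line instead of the web page. This is only a minimal sketch: it assumes the `yarn` CLI from the same Hadoop distribution is on the PATH and that HADOOP_CONF_DIR points at the cluster's client configuration. The function names `list_running_apps` and `app_status` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the `yarn` CLI is installed on this machine and
# that HADOOP_CONF_DIR points at the remote cluster's client configuration.

# List the applications the resource manager currently reports as RUNNING.
list_running_apps() {
    yarn application -list -appStates RUNNING
}

# Print the status report for a single application.
# $1 is an application ID as shown in the list output above.
app_status() {
    yarn application -status "$1"
}
```

Calling `list_running_apps` before and after each paragraph run would show whether the second program appears as a new application or runs inside the existing one.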




*Are the scenarios above Zeppelin's default behavior, or am I doing
something wrong? Please advise.*



Regards

Prateek
