I have a 6-node cluster and 1 edge node for access. The edge node has Python
2.7 + NLTK + other libraries + the Hadoop client and Zeppelin installed. All
Hadoop nodes have Python 2.6 and no additional libraries.

Zeppelin is running, and my Python code (with NLTK) runs fine under the
pyspark interpreter. It must be running locally, as I have not distributed
the Python libraries to the other nodes yet. I don't see any errors in my
YARN logs either.
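My working theory, illustrated with a made-up snippet (the strings and
variable names are placeholders; sc is the context Zeppelin provides): with
master set to yarn-client, the driver runs on the edge node under
/usr/local/bin/python2.7, so NLTK calls outside any RDD transformation never
leave the edge node. Only the functions passed to map/filter/etc. are shipped
to the workers' Python 2.6.

import nltk  # found because the driver is the edge node's Python 2.7

# Driver-side: runs only on the edge node, so the import above suffices.
print(nltk.edit_distance("kitten", "sitting"))

# Executor-side: this lambda is pickled and executed by the workers'
# Python 2.6, which should fail with ImportError since nltk is absent there.
rdd = sc.parallelize(["kitten", "mitten", "bitten"])
print(rdd.map(lambda w: nltk.edit_distance(w, "sitting")).collect())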

This is my interpreter setup (pasted below). Can you please tell me how this is working?

Also, if it is working locally, how do I distribute it over multiple nodes?
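For the second question, here is my rough understanding of one way to do it,
written as a sketch (nltk_deps.zip and its path are placeholder names I made
up, and I have not tested this; NLTK's corpora/data and the workers' Python
2.6 may still need separate handling):

# Ship a zip of the pure-Python libraries with the job, reusing the
# sc that Zeppelin provides; executors add the zip to their sys.path.
sc.addPyFile("/home/abhi/nltk_deps.zip")  # placeholder path/archive

def dist(w):
    import nltk  # imported on the executor, resolved from the shipped zip
    return nltk.edit_distance(w, "sitting")

# Now the function runs on the workers with nltk importable.
print(sc.parallelize(["kitten", "mitten"]).map(dist).collect())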


Thanks,

Abhi

spark interpreter: %spark (default), %pyspark, %sql, %dep

Properties:

name                                      value
args
master                                    yarn-client
spark.app.name                            Zeppelin-App
spark.cores.max                           4
spark.executor.memory                     1024m
zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo                    local-repo
zeppelin.pyspark.python                   /usr/local/bin/python2.7
zeppelin.spark.concurrentSQL              true
zeppelin.spark.maxResult                  1000
zeppelin.spark.useHiveContext             true
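PS: if the answer is that the executors need their own interpreter and
libraries, I am guessing I would add properties like these on the same screen
(values are my guesses, not tested):

spark.executorEnv.PYSPARK_PYTHON          /usr/local/bin/python2.7
spark.yarn.dist.archives                  (archive of the Python libraries, if one is needed)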


-- 
Abhi Basu
