I have a 6-node cluster and one edge node for access. The edge node has Python 2.7, NLTK, other libraries, the Hadoop client, and Zeppelin installed. All Hadoop nodes have Python 2.6 and no additional libraries.
Running Zeppelin, my Python code (with NLTK) works fine under the %pyspark interpreter. It must be running locally, since I have not distributed the Python libraries to the other nodes yet; I don't see any errors in my YARN logs either. Can you please tell me how this is working? Also, if it is working locally, how do I distribute it over multiple nodes?

Here is my interpreter setup:

spark    %spark (default), %pyspark, %sql, %dep

Properties (name / value):
args
master                                    yarn-client
spark.app.name                            Zeppelin-App
spark.cores.max                           4
spark.executor.memory                     1024m
zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo                    local-repo
zeppelin.pyspark.python                   /usr/local/bin/python2.7
zeppelin.spark.concurrentSQL              true
zeppelin.spark.maxResult                  1000
zeppelin.spark.useHiveContext             true

Thanks,
Abhi

-- Abhi Basu
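P.S. To see where the code actually executes, here is a small diagnostic I can run in a %pyspark paragraph. It is only a sketch: it assumes the `sc` SparkContext that Zeppelin's pyspark interpreter provides, and `worker_python` is just a helper name I made up for this check.

import sys

# Python used by the driver (runs on the Zeppelin/edge node)
print("driver:    %s %s" % (sys.executable, sys.version.split()[0]))

def worker_python(_):
    # Runs inside an executor; reports that executor's Python
    import sys
    return "%s %s" % (sys.executable, sys.version.split()[0])

# Force a tiny job onto the cluster and collect the distinct
# interpreter paths/versions seen by the executors
versions = (sc.parallelize(range(8), 8)
              .map(worker_python)
              .distinct()
              .collect())
print("executors: %s" % versions)

If the executor lines show /usr/local/bin/python2.7, the work is staying on the edge node; if they show the cluster nodes' Python 2.6, the job really is being distributed.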