I also saw an example you posted regarding %dep and Python:

    %dep
    z.load("org.apache.spark:spark-streaming-kafka_2.10:1.5.1")

That example works even if you remove the %dep paragraph entirely. The import

    from pyspark.streaming.kafka import KafkaUtils

will always resolve, likely because it is already part of the Spark assembly. Give it a try: reset the interpreter and just run the following, with no z.load(...):

    %pyspark
    from pyspark.streaming.kafka import KafkaUtils
    from pyspark.streaming import StreamingContext

So I am still looking for a real-world example of an external dependency loaded in %dep that demonstrates best practice for %pyspark dependency loading. I'll stay tuned and continue to dig around a bit. Next step is to start over and try a no-frills basic install with z-manager.

Jeff
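One way to check whether a jar loaded through %dep actually reaches %pyspark is to probe the shared JVM's classpath. This is only a minimal sketch, not a confirmed best practice from this thread; the elasticsearch-hadoop coordinates and the Class.forName probe are illustrative assumptions:

    %dep
    z.reset()
    // Any Maven artifact that is not already in the Spark assembly would do here.
    z.load("org.elasticsearch:elasticsearch-hadoop:2.1.0.Beta2")

    %pyspark
    # Probe the shared JVM (the same one %spark uses) for a class from the
    # %dep-loaded jar; this raises a Py4JJavaError wrapping a
    # ClassNotFoundException if the jar never made it onto the classpath.
    sc._jvm.java.lang.Class.forName("org.elasticsearch.hadoop.cfg.ConfigurationOptions")

If the second paragraph returns instead of raising, the jar is visible to %pyspark through the same SparkContext JVM.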
From: moon soo Lee
Reply-To: <users@zeppelin.incubator.apache.org>
Date: Thursday, October 29, 2015 at 8:00 PM
To: <users@zeppelin.incubator.apache.org>
Subject: Re: pyspark with jar

Hi,

Thanks for the question. Actually, %pyspark runs in the same JVM process that %spark runs in, and it shares a single SparkContext instance (although %pyspark also runs an additional Python process). Libraries loaded from %dep should be available in %pyspark, too.

The interpreter property 'spark.home' is a little confusing alongside SPARK_HOME. At the moment, defining SPARK_HOME in conf/zeppelin-env.sh is recommended instead of spark.home.

Best,
moon

On Fri, Oct 30, 2015 at 2:44 AM Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:

That's a good pointer. The question still stands: how do you load libraries (jars) for %pyspark? It's clear how to do it for %spark (Scala) via %dep. I'm looking for the equivalent of:

    ./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar

From: Matt Sochor
Reply-To: <users@zeppelin.incubator.apache.org>
Date: Thursday, October 29, 2015 at 3:19 PM
To: <users@zeppelin.incubator.apache.org>
Subject: Re: pyspark with jar

I actually *just* figured it out. Zeppelin has sqlContext "already created and exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html). So when I do "sqlContext = SQLContext(sc)" I overwrite sqlContext, and then Zeppelin cannot see this new sqlContext. Anyway, for anyone out there experiencing this problem: do NOT initialize sqlContext yourself and it works fine.

On Thu, Oct 29, 2015 at 6:10 PM Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:

In Zeppelin, what is the equivalent of adding jars in a pyspark call, such as running pyspark with the elasticsearch-hadoop jar?

    ./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar

My assumption is that loading something like this inside a %dep is pointless, since those dependencies would only live in the %spark Scala world (the Spark JVM), and in Zeppelin, pyspark spawns a separate process.

Also, how is the interpreter's "spark.home" used? How is it different from the "SPARK_HOME" in zeppelin-env.sh?

And finally, how are args used in the interpreter? What uses them?

Thank you.
Jeff

--
Best regards,
Matt Sochor
Data Scientist
Mobile Defense
Mobile +1 215 307 7768
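Regarding the question above about an equivalent of passing --jars to pyspark: in Zeppelin builds that launch the Spark interpreter through spark-submit (that is, when SPARK_HOME is set in conf/zeppelin-env.sh), one commonly suggested route is SPARK_SUBMIT_OPTIONS. This is a sketch under that assumption; whether it is available depends on the Zeppelin version, and the paths are illustrative:

    # conf/zeppelin-env.sh
    export SPARK_HOME=/opt/spark
    export SPARK_SUBMIT_OPTIONS="--jars /path/to/jars/elasticsearch-hadoop-2.1.0.Beta2.jar"

With this in place the jar should end up on the classpath of the single JVM that %spark and %pyspark share, which is the same effect --jars has on a plain ./bin/pyspark invocation.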
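And on the sqlContext point earlier in the thread: a minimal illustration, assuming the stock Zeppelin Spark interpreter where sc and sqlContext are already created and exposed:

    %pyspark
    # Use the sqlContext Zeppelin already exposes; re-creating it with
    # sqlContext = SQLContext(sc) would shadow the instance Zeppelin knows
    # about, which is the problem Matt describes above.
    df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
    df.show()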