Update - I have it working now.
Loading elasticsearch-hadoop via %dep and then using %pyspark works.

Tried it with Spark 1.3.1 via z-manager using a vanilla install.

Thanks again for the pointers.  I was originally trying to use Zeppelin 0.5.0 
with Spark 1.4.

The version that I have working via z-manager looks like a Zeppelin 0.6.0 
snapshot build, with Spark 1.3.1 and Hadoop 2.4.0.
With:

%dep
z.load("org.elasticsearch:elasticsearch-hadoop:2.2.0-beta1")
z.load("org.elasticsearch::elasticsearch-spark:2.2.0-beta1”)

Best
Jeff



From:  moon soo Lee
Reply-To:  <users@zeppelin.incubator.apache.org>
Date:  Thursday, October 29, 2015 at 8:00 PM
To:  <users@zeppelin.incubator.apache.org>
Subject:  Re: pyspark with jar

Hi,

Thanks for the question.

Actually, %pyspark runs in the same JVM process as %spark, and it shares a single 
SparkContext instance (although %pyspark also spawns an additional Python process).
Libraries loaded via %dep should therefore be available in %pyspark, too.
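
As a quick sanity check (the class name below is only an example), something 
loaded through %dep should be reachable from %pyspark via the shared JVM gateway:

%pyspark
# A class loaded via %dep in the Scala interpreter is visible from pyspark,
# because both interpreters share one JVM.
print(sc._jvm.org.elasticsearch.hadoop.mr.EsInputFormat)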

The interpreter property 'spark.home' is a little confusing alongside SPARK_HOME.
At the moment, defining SPARK_HOME in conf/zeppelin-env.sh is recommended 
rather than setting spark.home.
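
For example, in conf/zeppelin-env.sh (the path is only an illustration):

# conf/zeppelin-env.sh
export SPARK_HOME=/opt/spark-1.3.1-bin-hadoop2.4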

Best,
moon

On Fri, Oct 30, 2015 at 2:44 AM Jeff Steinmetz <jeffrey.steinm...@gmail.com> 
wrote:
That’s a good pointer.
The question still stands: how do you load libraries (jars) for %pyspark?

It’s clear how to do it for %spark (Scala) via %dep.

Looking for the equivalent of:

./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar


From:  Matt Sochor
Reply-To:  <users@zeppelin.incubator.apache.org>
Date:  Thursday, October 29, 2015 at 3:19 PM
To:  <users@zeppelin.incubator.apache.org>
Subject:  Re: pyspark with jar

I actually *just* figured it out.  Zeppelin has sqlContext "already created and 
exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).

So when I do "sqlContext = SQLContext(sc)" I overwrite that sqlContext, and 
Zeppelin can no longer see the one I created.

Anyway, for anyone out there hitting this problem: do NOT re-initialize 
sqlContext, and it works fine.
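
To make it concrete (the table and data below are made up), just use the injected 
sqlContext directly:

%pyspark
# Don't re-create the context; it shadows the sqlContext Zeppelin already injects:
#   from pyspark.sql import SQLContext
#   sqlContext = SQLContext(sc)

# Use the injected sqlContext instead; "demo" is just a made-up table name.
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.registerTempTable("demo")
sqlContext.sql("SELECT id, label FROM demo").show()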

On Thu, Oct 29, 2015 at 6:10 PM Jeff Steinmetz <jeffrey.steinm...@gmail.com> 
wrote:
In Zeppelin, what is the equivalent of adding jars to a pyspark call?

For example, running pyspark with the elasticsearch-hadoop jar:

./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar

My assumption is that loading something like this inside %dep is pointless, 
since those dependencies would only live in the %spark Scala world (the Spark 
JVM).  In Zeppelin, pyspark spawns a separate process.

Also, how is the interpreter's "spark.home" property used?  How is it different 
from SPARK_HOME in zeppelin-env.sh?
And finally, how are args used in the interpreter?  (What uses them?)

Thank you.
Jeff
-- 
Best regards,

Matt Sochor
Data Scientist
Mobile Defense

