You can load the elasticsearch-hadoop dependency directly from the Elasticsearch Maven
repository if you like.
I’m using the snapshot builds, since they fix a few issues I’ve been
testing with Costin at Elastic recently.

In your interpreter, you will want to set a new property, es.nodes, with a
comma-separated list of your Elasticsearch IP addresses or hostnames.
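
(For reference, if you were wiring this up in a standalone Spark app rather than
Zeppelin, the same property goes on the SparkConf before the context is created.
A minimal sketch; the hostnames and app name are placeholders:)

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("es-spark-example")          // placeholder app name
  .set("es.nodes", "es-node-1,es-node-2")  // comma-separated ES hosts or IPs
  .set("es.port", "9200")                  // default Elasticsearch HTTP port
val sc = new SparkContext(conf)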

For example, you can do this (assuming you want to use the native
elasticsearch-spark approach, which is preferred over the Hadoop readers/writers):


%dep

z.addRepo("Sonatype 
snapshot").url("https://oss.sonatype.org/content/repositories/snapshots";).snapshot
z.load("org.elasticsearch::elasticsearch-spark:2.2.0.BUILD-SNAPSHOT")



%spark

import org.elasticsearch.spark._ // brings esRDD/esJsonRDD onto the SparkContext

val query = "{ some ES-style query string here }"

val jsonRDD = sc.esJsonRDD("evo-session-reader/session", query) // returns the original JSON; if you omit query, it assumes match_all
val mapRDD = sc.esRDD("evo-session-reader/session", query)      // returns documents as Map[String, AnyRef]
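
Since the end goal mentioned below is DataFrames: the same connector ships a
Spark SQL integration, so something along these lines should work (a sketch,
using the same example index/type as above):

%spark

import org.elasticsearch.spark.sql._

// Pulls the index/type into a DataFrame; the connector derives the schema
// from the Elasticsearch mapping. The query argument is optional here too.
val df = sqlContext.esDF("evo-session-reader/session")
df.printSchema()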







Best
Jeff Steinmetz


On 11/10/15, 1:45 PM, "SiS" <n...@cht3.com> wrote:

>Hi everybody,
>
>As I’m new to Spark and Zeppelin, I hope my question is in the right place.
>I played around with Zeppelin and Spark and tried to load data by connecting
>to an Elasticsearch cluster.
>But to be honest, I have no clue how to set up Zeppelin or the notebook to
>use the elasticsearch-hadoop/Spark library (jar) so that I’m able to connect
>using pyspark.
>Do I have to copy the jar somewhere into the Zeppelin folders?
>
>My plan is to transfer an index/type from Elasticsearch to DataFrames in
>Spark.
>
>Could somebody give me a short explanation of how to set this up, or point
>me to the right documentation?
>
>Any help would be appreciated.
>
>Thanks a lot
>Sven
