You can load the Hadoop dependency directly from the Elasticsearch Maven repository if you like. I'm using the snapshot builds, since they fix a few issues that I've been testing with Costin @ Elastic recently.

In your interpreter settings, you will want to set a new property, es.nodes, listing your comma-separated Elasticsearch IP addresses or host names.
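If you build the SparkContext yourself instead of letting Zeppelin create it, the equivalent is to set es.nodes on the SparkConf, since the connector picks up any es.* property from there. A minimal sketch of that standalone case, assuming elasticsearch-spark is on the classpath (the app name and host names are hypothetical placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Hypothetical hosts, replace with your own cluster addresses.
val conf = new SparkConf()
  .setAppName("es-zeppelin-example")
  .set("es.nodes", "es-host1.example.com,es-host2.example.com")
  .set("es.port", "9200")
val sc = new SparkContext(conf)

// Any es.* setting on the conf is picked up by the esRDD / esJsonRDD calls shown below.
val sessions = sc.esRDD("evo-session-reader/session")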
For example, assuming you want to use the native elasticsearch-spark approach (which is preferred over the Hadoop readers/writers):

%dep
z.addRepo("Sonatype snapshot").url("https://oss.sonatype.org/content/repositories/snapshots").snapshot
z.load("org.elasticsearch::elasticsearch-spark:2.2.0.BUILD-SNAPSHOT")

%spark
val query = "{ some ES style query string here }"

// returns the original JSON; if you omit query, it assumes match_all
val RDD = sc.esJsonRDD("evo-session-reader/session", query)

// returns a Map[String, AnyRef]
val RDD2 = sc.esRDD("evo-session-reader/session", query)

(For reading an index/type straight into a DataFrame, see the sketch after the quoted message below.)

Best,
Jeff Steinmetz

On 11/10/15, 1:45 PM, "SiS" <n...@cht3.com> wrote:

>Hi Everybody,
>
>Since I'm new to Spark and Zeppelin, I hope my question is in the right
>place here.
>I played around with Zeppelin and Spark and tried to load data by connecting
>to an Elasticsearch cluster.
>But to be honest, I have no clue how to set up Zeppelin or the notebook to
>use the elasticsearch/hadoop/spark library (jar) so that I can connect using
>pyspark.
>Do I have to copy the jar somewhere into the Zeppelin folders?
>
>My plan is to transfer an index/type from Elasticsearch to DataFrames in
>Spark.
>
>Could somebody give me a short explanation of how to set this up, or point
>me to the right documentation?
>
>Any help would be appreciated.
>
>Thanks a lot
>Sven
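For the DataFrame part of the question: elasticsearch-hadoop also exposes a Spark SQL data source, so an index/type can be read straight into a DataFrame. A minimal sketch, assuming elasticsearch-spark 2.1+ is loaded as above (the host name is a hypothetical placeholder; the index/type is the one from the example above):

%spark
// Read an index/type through the elasticsearch-spark SQL data source.
val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-host1.example.com")
  .load("evo-session-reader/session")

df.printSchema()

Since this goes through the standard data source API, the same call shape should also work from %pyspark: sqlContext.read.format("org.elasticsearch.spark.sql").load("index/type").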