So I'm giving a talk at the Spark Summit on using Spark & Elasticsearch, but for now, if you want to see a simple demo which uses Elasticsearch for geo input, you can take a look at my quick & dirty implementation with TopTweetsInALocation ( https://github.com/holdenk/elasticsearchspark/blob/master/src/main/scala/com/holdenkarau/esspark/TopTweetsInALocation.scala ). This approach uses the ESInputFormat, which avoids the difficulty of having to manually create Elasticsearch clients.
This approach might not work for your data, e.g. if you need to create a query for each record in your RDD. If that's the case, you could instead look at using mapPartitions and setting up your Elasticsearch connection inside of that, so you can then re-use the client for all of the queries on each partition. This approach avoids having to serialize the Elasticsearch connection, because it will be local to your function.

Hope this helps!

Cheers,

Holden :)

On Tue, Jun 24, 2014 at 4:28 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> It's not used as the default serializer because of some issues with
> compatibility & the requirement to register the classes.
>
> Which part are you getting as non-serializable? You need to serialize
> that class if you are sending it to Spark workers inside a map, reduce,
> mapPartitions, or any of the operations on an RDD.
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
> On Wed, Jun 25, 2014 at 4:52 AM, Peng Cheng <pc...@uow.edu.au> wrote:
>
>> I'm afraid persisting a connection across two tasks is a dangerous act,
>> as they can't be guaranteed to be executed on the same machine. Your ES
>> server may think it's a man-in-the-middle attack!
>>
>> I think it's possible to invoke a static method that gives you a
>> connection from a local 'pool', so nothing will sneak into your closure,
>> but it's too complex and there should be a better option.
>>
>> Never used Kryo before; if it's that good, perhaps we should use it as
>> the default serializer.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/ElasticSearch-enrich-tp8209p8222.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>

--
Cell : 425-233-8271
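P.S. A minimal sketch of the mapPartitions pattern described above. `SimpleEsClient` here is a hypothetical stand-in for a real Elasticsearch client (e.g. a TransportClient or REST client), just to show the shape: the client is created once per partition, inside the function shipped to the workers, so it never needs to be serialized, and it is reused for every record in that partition.

```scala
// Hypothetical stand-in for a real Elasticsearch client; in real code
// this would open a network connection to your ES cluster.
class SimpleEsClient(host: String) {
  def query(q: String): String = s"result-for-$q"  // pretend ES lookup
}

// The function you'd pass to rdd.mapPartitions: it receives all records
// of one partition as an iterator, builds a single client on the worker,
// and reuses that client for every record.
def queryPartition(records: Iterator[String]): Iterator[String] = {
  val client = new SimpleEsClient("localhost")  // created worker-side
  records.map(r => client.query(r))             // one client, many queries
}

// In Spark this would be wired up as: rdd.mapPartitions(queryPartition)
```

Because `queryPartition` only closes over what it builds itself, Spark never tries to serialize the client, which is exactly the non-serializable-class problem being discussed upthread.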