First off, I'd recommend using the latest es-hadoop beta (2.1.0.Beta3) or, even better, the dev build [1]. Second, use the native Java/Scala API [2], since both configuration and performance are easier there. Third, when your input is JSON, tell es-hadoop/Spark so: the connector can work with both objects (the default) and raw JSON.
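A minimal PySpark sketch of the raw-JSON idea (the index name "shahid/hcp_id", localhost settings, and sample records are placeholders, not the poster's real data): each value is serialized to a JSON string up front, so the connector passes it through instead of trying to map Python lists/dicts onto Writables.

```python
import json

# Placeholder records; in a real job these come from the grouped RDD.
docs = [
    (1, {'SOURCES': [{'key': 'value'}]}),
    (2, {'SOURCES': [{'key': 'value'}]}),
]

# Serialize each value to a raw JSON string. This sidesteps errors like
# "Data of type java.util.ArrayList cannot be used", which come from the
# connector trying to convert nested collections to Writables.
json_values = [(None, json.dumps(doc)) for _, doc in docs]

print(json_values[0][1])

# With a SparkContext `sc`, the write itself would then look roughly like
# (hedged sketch -- exact valueClass/conf may vary by es-hadoop version):
#
# sc.parallelize(json_values).saveAsNewAPIHadoopFile(
#     path='-',
#     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.apache.hadoop.io.Text",
#     conf={
#         "es.nodes": "localhost",
#         "es.port": "9200",
#         "es.resource": "shahid/hcp_id",
#         "es.input.json": "true",  # values are raw JSON strings
#     })
```

The key switch is "es.input.json": "true", which tells the connector not to interpret the values as objects.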
It just so happens the es-hadoop connector documents the above here [3] :). Hope this helps,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-native
[3] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-json

On 2/10/15 6:58 PM, shahid ashraf wrote:
thanks costin
i'm grouping data together based on id in json, and the rdd contains:

rdd = (1, {'SOURCES': [{n no. of key/valu}]}), (2, {'SOURCES': [{n no. of key/valu}]}), (3, {'SOURCES': [{n no. of key/valu}]}), (4, {'SOURCES': [{n no. of key/valu}]})

rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "shahid/hcp_id"
    })

spark-1.1.0-bin-hadoop1
java version "1.7.0_71"
elasticsearch-1.4.2
elasticsearch-hadoop-2.1.0.Beta2.jar

On Tue, Feb 10, 2015 at 10:05 PM, Costin Leau <costin.l...@gmail.com> wrote:

Sorry, but there's too little information in this email to make any kind of assessment. Can you please describe what you are trying to do, what versions of Elasticsearch and es-spark you are using, and ideally post a snippet of code? What does your RDD contain?

On 2/10/15 6:05 PM, shahid wrote:

INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 (TID 9, ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6) on executor ip-10-80-15-145.ec2.internal: org.apache.spark.SparkException (Data of type java.util.ArrayList cannot be used) [duplicate 1]
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 2.0 (TID 10, ip-10-80-15-145.ec2.internal, PROCESS_LOCAL, 1025 bytes)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Exception-when-trying-to-use-EShadoop-connector-and-writing-rdd-to-ES-tp21579.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Costin

--
with Regards
Shahid Ashraf
--
Costin