First off, I'd recommend using the latest es-hadoop beta (2.1.0.Beta3) or, even
better, the dev build [1].
Second, use the native Java/Scala API [2], since both configuration and
performance are better there.
Third, when you are using JSON input, tell es-hadoop/Spark about it: the
connector can work with either objects (the default) or raw JSON.

As it happens, the es-hadoop documentation describes exactly this [3] :).
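
To sketch what the third point looks like from PySpark (the setup this thread uses): the documents and the "index/type" resource below are placeholders, but "es.input.json" is the setting meant here.

# Raw-JSON mode: the connector indexes each string verbatim instead of
# converting objects, so the value class is Text rather than a Writable map.
json_docs = sc.parallelize(['{"field": "value"}', '{"field": "other"}'])
json_docs.map(lambda d: (None, d)).saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.io.Text",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "index/type",    # placeholder index/type
        "es.input.json": "yes"
    })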

Hope this helps,

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-native
[3] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-json

On 2/10/15 6:58 PM, shahid ashraf wrote:
Thanks Costin,

I am grouping data together based on id in JSON, and the RDD contains:

rdd = (1, {'SOURCES': [{<n key/value pairs>}]}), (2, {'SOURCES': [{<n key/value pairs>}]}),
      (3, {'SOURCES': [{<n key/value pairs>}]}), (4, {'SOURCES': [{<n key/value pairs>}]})
rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "shahid/hcp_id"
    })
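
Following the raw-JSON suggestion above, one way around the ArrayList error quoted below might be to serialize each grouped record to a JSON string before the save. A minimal, untested sketch; the json.dumps step and the switch to Text with es.input.json are assumptions based on that advice:

import json

# Serialize each (id, dict) pair to a raw JSON string so the connector
# never has to map nested Python lists (the java.util.ArrayList in the
# error below) onto LinkedMapWritable.
json_rdd = rdd.map(lambda kv: (None, json.dumps(kv[1])))
json_rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.io.Text",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "shahid/hcp_id",
        "es.input.json": "yes"
    })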


spark-1.1.0-bin-hadoop1
java version "1.7.0_71"
elasticsearch-1.4.2
elasticsearch-hadoop-2.1.0.Beta2.jar


On Tue, Feb 10, 2015 at 10:05 PM, Costin Leau <costin.l...@gmail.com> wrote:

    Sorry, but there's too little information in this email to make any kind of assessment.
    Can you please describe what you are trying to do, which versions of Elasticsearch
    and es-spark you are using, and potentially post a snippet of code?
    What does your RDD contain?


    On 2/10/15 6:05 PM, shahid wrote:

        INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 (TID 9, ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
        15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6) on executor ip-10-80-15-145.ec2.internal: org.apache.spark.SparkException (Data of type java.util.ArrayList cannot be used) [duplicate 1]
        15/02/10 15:54:08 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 2.0 (TID 10, ip-10-80-15-145.ec2.internal, PROCESS_LOCAL, 1025 bytes)



        --
        View this message in context:
        http://apache-spark-user-list.1001560.n3.nabble.com/Exception-when-trying-to-use-EShadoop-connector-and-writing-rdd-to-ES-tp21579.html
        Sent from the Apache Spark User List mailing list archive at Nabble.com.


    --
    Costin




--
with Regards
Shahid Ashraf

--
Costin
