Hi, I'm trying to write a JavaPairRDD into Elasticsearch 1.7 using Spark 1.2.1 and elasticsearch-hadoop 2.0.2.
```java
JavaPairRDD<NullWritable, Text> output = ...
final JobConf jc = new JobConf(output.context().hadoopConfiguration());
jc.set("mapred.output.format.class", "org.elasticsearch.hadoop.mr.EsOutputFormat");
jc.setOutputCommitter(FileOutputCommitter.class);
jc.set(ConfigurationOptions.ES_RESOURCE_WRITE, "streaming-chck/checks");
jc.set(ConfigurationOptions.ES_NODES, "localhost:9200");
FileOutputFormat.setOutputPath(jc, new Path("-"));
output.saveAsHadoopDataset(jc);
```

I am getting the following error:

```
16/03/19 13:54:51 ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 279)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [MapperParsingException[Malformed content, must start with an object]]]; Bailing out..
	at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
	at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
	at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
	at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:189)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:293)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:280)
	at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1072)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1051)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
```
It seems I don't have the correct Writable here, or I'm missing some setting for writing JSON-serialized messages as Strings/Text. Could anybody help?

Thanks,
Jakub
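One setting I suspect may be missing: as far as I understand, elasticsearch-hadoop expects MapWritable values by default and only treats Text values as ready-made JSON when es.input.json is set to true, which would explain the "must start with an object" error. Here is a sketch of the configuration change I'm considering (same index and nodes as above; I haven't verified this against my cluster yet):

```java
// Sketch, not verified: tell elasticsearch-hadoop the Text values are
// already serialized JSON, so it should not expect MapWritable objects.
final JobConf jc = new JobConf(output.context().hadoopConfiguration());
jc.setOutputFormat(EsOutputFormat.class); // typed setter instead of the "mapred.output.format.class" string
jc.set(ConfigurationOptions.ES_RESOURCE_WRITE, "streaming-chck/checks");
jc.set(ConfigurationOptions.ES_NODES, "localhost:9200");
jc.set(ConfigurationOptions.ES_INPUT_JSON, "true"); // i.e. "es.input.json"
output.saveAsHadoopDataset(jc);
```

If es.input.json is the right switch for this version of the connector, the key of the pair should then be ignored and the Text value sent as the document body.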