Hi,

I am trying to write a JavaPairRDD into Elasticsearch 1.7 from Spark 1.2.1, using elasticsearch-hadoop 2.0.2:
    JavaPairRDD<NullWritable, Text> output = ...
    final JobConf jc = new JobConf(output.context().hadoopConfiguration());
    jc.set("mapred.output.format.class", "org.elasticsearch.hadoop.mr.EsOutputFormat");
    jc.setOutputCommitter(FileOutputCommitter.class);
    jc.set(ConfigurationOptions.ES_RESOURCE_WRITE, "streaming-chck/checks");
    jc.set(ConfigurationOptions.ES_NODES, "localhost:9200");
    FileOutputFormat.setOutputPath(jc, new Path("-"));
    output.saveAsHadoopDataset(jc);
I am getting the following error:
    16/03/19 13:54:51 ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 279)
    org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [MapperParsingException[Malformed content, must start with an object]]]; Bailing out..
        at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:199)
        at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:165)
        at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
        at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:189)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:293)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.close(EsOutputFormat.java:280)
        at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1072)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1051)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
It seems that I don't have the correct Writable here, or that I am missing a setting for writing JSON-serialized messages as Strings/Text.
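My current guess (untested) is that the connector does not know the Text values already contain complete JSON documents, so it serializes each value as a plain string, which Elasticsearch rejects with "must start with an object". If I read the elasticsearch-hadoop docs correctly, `es.input.json` is the setting that controls this; a sketch of what I expect the fix to look like:

```java
// Untested guess: tell elasticsearch-hadoop that the Text values are already
// serialized JSON documents, so it should index them as-is instead of
// treating them as string fields.
jc.set(ConfigurationOptions.ES_INPUT_JSON, "true"); // key: "es.input.json"
```

If that is the wrong knob, I'd also be interested in whether the value type should be something other than Text.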
Could anybody help?
Thanks
Jakub