On Tue, Dec 9, 2014 at 11:32 AM, Mohamed Lrhazi <[email protected]> wrote:

> While trying simple examples of PySpark code, I systematically get these
> failures when I try this. I don't see any prior exceptions in the output...
> How can I debug further to find the root cause?
>
>     es_rdd = sc.newAPIHadoopRDD(
>         inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
>         keyClass="org.apache.hadoop.io.NullWritable",
>         valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
>         conf={
>             "es.resource": "en_2014/doc",
>             "es.nodes": "rap-es2",
>             "es.query": """{"query":{"match_all":{}},"fields":["title"], "size": 100}"""
>         }
>     )
>
>     titles = es_rdd.map(lambda d: d[1]['title'][0])
>     counts = titles.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)
>     output = counts.collect()
>
> ...
> 14/12/09 19:27:20 INFO BlockManager: Removing broadcast 93
> 14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_93
> 14/12/09 19:27:20 INFO MemoryStore: Block broadcast_93 of size 2448 dropped from memory (free 274984768)
> 14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 93
> 14/12/09 19:27:20 INFO BlockManager: Removing broadcast 92
> 14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_92
> 14/12/09 19:27:20 INFO MemoryStore: Block broadcast_92 of size 163391 dropped from memory (free 275148159)
> 14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 92
> 14/12/09 19:27:20 INFO BlockManager: Removing broadcast 91
> 14/12/09 19:27:20 INFO BlockManager: Removing block broadcast_91
> 14/12/09 19:27:20 INFO MemoryStore: Block broadcast_91 of size 163391 dropped from memory (free 275311550)
> 14/12/09 19:27:20 INFO ContextCleaner: Cleaned broadcast 91
> 14/12/09 19:27:30 ERROR Executor: Exception in task 0.0 in stage 67.0 (TID 72)
> java.lang.UnsupportedOperationException
>         at java.util.AbstractMap.put(AbstractMap.java:203)
>         at java.util.AbstractMap.putAll(AbstractMap.java:273)
>         at
> org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.setCurrentValue(EsInputFormat.java:373)
>         at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.setCurrentValue(EsInputFormat.java:322)
>         at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.next(EsInputFormat.java:299)
>         at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.nextKeyValue(EsInputFormat.java:227)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:138)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:913)
>         at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
>         at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:969)
>         at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:339)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1364)
>         at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
> 14/12/09 19:27:30 INFO TaskSetManager: Starting task 2.0 in stage 67.0 (TID 74, localhost, ANY, 26266 bytes)
> 14/12/09 19:27:30 INFO Executor: Running task 2.0 in stage 67.0 (TID 74)
> 14/12/09 19:27:30 WARN TaskSetManager: Lost task 0.0 in stage 67.0 (TID 72, localhost): java.lang.UnsupportedOperationException

It looks like a bug in elasticsearch-hadoop's EsInputFormat. The stack trace shows that the task failed while reading records from EsInputFormat to feed the Python mapper: setCurrentValue() calls putAll() on the value map, which falls through to java.util.AbstractMap.put() and throws UnsupportedOperationException.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
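For what it's worth, java.util.AbstractMap.put() is specified to throw UnsupportedOperationException unless a subclass overrides it, so any putAll() on a map class that only implements the read side fails exactly like the trace above. Here is a minimal Python analogue of that failure shape (ShardValue is a made-up stand-in for illustration, not the real LinkedMapWritable): a Mapping subclass that implements only the read methods accepts lookups but rejects mutation.

```python
from collections.abc import Mapping

# Read-only map: like java.util.AbstractMap, Python's abstract Mapping
# provides no mutation hook, so writes fail unless a subclass adds one.
class ShardValue(Mapping):  # hypothetical stand-in for LinkedMapWritable
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

v = ShardValue({"title": ["Example"]})
print(v["title"])            # reads work: ['Example']
try:
    v["title"] = ["boom"]    # writes do not: no __setitem__ defined
except TypeError as e:
    print("write rejected:", e)
```

The Java exception type differs (UnsupportedOperationException vs. Python's TypeError), but the cause is the same: the record reader hands back a map object that was never meant to be mutated.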
