Looking at ExternalSorter.scala line 192, I suspect some input record has a null key.
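If that is the case, a quick way to confirm would be to count null keys in the input before the shuffle. This is only a sketch: `pairs` stands in for whatever pair RDD feeds the failing stage in your job, and the helper name is made up.

import org.apache.spark.rdd.RDD

// Sketch only: `pairs` stands in for the pair RDD feeding the failing shuffle stage.
def countAndDropNullKeys[K, V](pairs: RDD[(K, V)]): RDD[(K, V)] = {
  // Count records whose key is null.
  val nullKeys = pairs.filter { case (k, _) => k == null }.count()
  println("records with null key: " + nullKeys)
  // Drop them (or better, fix whatever produces them upstream) before the reduce/group stage.
  pairs.filter { case (k, _) => k != null }
}

A non-zero count would at least tell you whether the null-key hypothesis holds.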
189      while (records.hasNext) {
190        addElementsRead()
191        kv = records.next()
192        map.changeValue((getPartition(kv._1), kv._1), update)

On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:

> Looking at ExternalSorter.scala line 192
>
> 189      while (records.hasNext) {
>            addElementsRead()
>            kv = records.next()
>            map.changeValue((getPartition(kv._1), kv._1), update)
>            maybeSpillCollection(usingMap = true)
>          }
>
> On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru <saurabh.g...@gmail.com>
> wrote:
>
>> I am seeing the following exception in my Spark cluster every few days in
>> production.
>>
>> 2016-03-12 05:30:00,541 - WARN TaskSetManager - Lost task 0.0 in stage
>> 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal):
>> java.lang.NullPointerException
>>         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> I have debugged on a local machine but haven't been able to pinpoint the
>> cause of the error. Does anyone know why this might occur? Any suggestions?
>>
>> Thanks,
>> Saurabh