Hi Ashwani,
Please verify the input data: make sure the data being processed is
valid and free of null values and unexpected data types.
If the data undergoes complex transformations before sorting, review
those transformations and verify that they don't introduce
inconsistencies or null values.
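For instance, here is a quick pre-sort null check. This is only a plain
Python sketch of the idea, with made-up sample data; in Spark you would
typically do the equivalent with df.na.drop(subset=[...]) or
df.filter(col(...).isNotNull()) before the sort:

```python
# Sketch only: illustrates why null sort keys are a problem.
# In Python 3, as in many sorters, None cannot be compared to
# other values, so sorting on a null key raises a TypeError.
records = [("a", 3), ("b", None), ("c", 1)]  # hypothetical sample data

def drop_null_keys(rows):
    """Drop rows whose sort key (second field) is null, then sort."""
    return sorted((r for r in rows if r[1] is not None),
                  key=lambda r: r[1])

print(drop_null_keys(records))  # [('c', 1), ('a', 3)]
```

The same principle applies after each transformation step: filter (or
explicitly handle) nulls in any column you later sort or join on.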

Regards,
Guru

On Mon, Nov 11, 2024 at 6:06 PM Ashwani Pundir
<ashwani.pund...@gmail.com> wrote:
>
> Dear Spark Devs,
>
> I am reaching out because I am struggling to find the root cause of the
> exception below.
> (I have spent almost four days trying to figure out the root cause of this
> issue. I searched various forums, including Stack Overflow, but did not
> find anyone reporting this kind of error.)
> -----------------------------------------------------------------
> 2024-11-11 13:56:43,932  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] DEBUG - o.a.s.s.e.c. Compressor for [AggAllLevel]: 
> org.apache.spark.sql.execution.columnar.compression.PassThrough$Encoder@766c1b92,
>  ratio: 1.0
> 2024-11-11 13:56:43,936  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] WARN  - o.a.s.s.BlockManager Putting block rdd_66_0 failed 
> due to exception java.lang.NullPointerException: Cannot invoke 
> "org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.numRecords()"
>  because "this.inMemSorter" is null.
> 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] WARN  - o.a.s.s.BlockManager Block rdd_66_0 could not be 
> removed as it was not found on disk or in memory
> 2024-11-11 13:56:43,937  [dispatcher-BlockManagerMaster] DEBUG - 
> o.a.s.s.BlockManagerMasterEndpoint Updating block info on master rdd_66_0 for 
> BlockManagerId(driver, ibpri011662.emeter.com, 17623, None)
> 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] DEBUG - o.a.s.s.BlockManagerMaster Updated info of block 
> rdd_66_0
> 2024-11-11 13:56:43,937  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] DEBUG - o.a.s.s.BlockManager Told master about block rdd_66_0
> 2024-11-11 13:56:43,941  [Executor task launch worker for task 0.0 in stage 
> 28.0 (TID 1007)] ERROR - o.a.s.e.Executor Exception in task 0.0 in stage 28.0 
> (TID 1007)
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.numRecords()"
>  because "this.inMemSorter" is null
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:478)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:138)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.sort_addToSorter_0$(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.smj_findNextJoinRows_0$(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>  ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
> ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_findNextJoinRows_0$(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>  ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
> ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_findNextJoinRows_0$(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source) ~[?:?]
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:119)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$2.hasNext(InMemoryRelation.scala:288)
>  ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$6.hasNext(Iterator.scala:477) 
> ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificColumnarIterator.hasNext(Unknown
>  Source) ~[?:?]
> at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
> ~[em-spark-assembly.jar!/:?]
> at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:139)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.serializer.SerializerManager.dataSerializeStream(SerializerManager.scala:177)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$6(BlockManager.scala:1635)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$6$adapted(BlockManager.scala:1633)
>  ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.storage.DiskStore.put(DiskStore.scala:88) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1633)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
>  ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[em-spark-assembly.jar!/:?]
> at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[em-spark-assembly.jar!/:?]
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [em-spark-assembly.jar!/:?]
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  [?:?]
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  [?:?]
> at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
> --------------------------------------------------------
> I need your help understanding when this issue can occur.
> Please let me know if any further information is needed from my
> side. (Attaching the full logs for reference.)
>
> PS: Spark is running in local mode.
>
> Regards,
> Ashwani
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
