HyukjinKwon opened a new pull request #25551: [SPARK-28839][CORE] Avoids NPE in 
context cleaner when shuffle service is on
URL: https://github.com/apache/spark/pull/25551
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to avoid to thrown NPE at context cleaner when shuffle 
service is on - it is kind of a small followup in 
https://github.com/apache/spark/pull/24817
   
   Seems like it sets `null` for `shuffleIds` to track when the service is on. 
Later, `removeShuffle` tries to remove an element at `shuffleIds` which leads 
to NPE.
   
   See the code path below:
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/SparkContext.scala#L584
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L125
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L190
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L220-L230
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L353-L357
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L347
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L400-L406
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L475
   
   
https://github.com/apache/spark/blob/cbad616d4cb0c58993a88df14b5e30778c7f7e85/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L427
   
   ### Why are the changes needed?
   
   This is a bug fix.
   
   
   ### Does this PR introduce any user-facing change?
   
   It prevents the exception:
   
   ```
   19/08/21 06:44:01 ERROR AsyncEventQueue: Listener ExecutorMonitor threw an 
exception
   java.lang.NullPointerException
        at 
org.apache.spark.scheduler.dynalloc.ExecutorMonitor$Tracker.removeShuffle(ExecutorMonitor.scala:479)
        at 
org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2(ExecutorMonitor.scala:408)
        at 
org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2$adapted(ExecutorMonitor.scala:407)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at 
org.apache.spark.scheduler.dynalloc.ExecutorMonitor.cleanupShuffle(ExecutorMonitor.scala:407)
        at 
org.apache.spark.scheduler.dynalloc.ExecutorMonitor.onOtherEvent(ExecutorMonitor.sc
   ```
   
   
   ### How was this patch test
   
   Unittest was added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to