eejbyfeldt commented on PR #37206:
URL: https://github.com/apache/spark/pull/37206#issuecomment-1494171167

   We are also seeing this failure on Spark 3.3.1 with Scala 2.13 on Ubuntu 
22.04.
   
   I used one of the Spark applications hitting this issue to debug further where 
we deserialize `AccumulatorV2` instances, which is what makes us susceptible to 
the race, and I found a case that does not seem to involve any client code. 
Here is my understanding:
   
   If the `Task` is a `ShuffleMapTask`, it will deserialize the `rddAndDep` as 
part of `runTask`:
   
   
https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala#L85
   
   which happens after the `Task` deserialization itself. This, combined with a 
`ShuffleDependency`
   
   
https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/Dependency.scala#L85
   
    with a `ShuffleWriteProcessor` that comes from 
    
    
https://github.com/apache/spark/blob/v3.3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L411-L422
    
    means our `ShuffleDependency` contains a reference to the map `metrics: 
Map[String, SQLMetric]`, which causes the `SQLMetric`s to be registered while 
the task is already running, and we have our race condition.
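
   The ordering above can be sketched in plain Scala (hypothetical stub names 
such as `SQLMetricStub` and `DependencyStub`, not Spark's actual classes): the 
metric carried by the dependency is only registered once `runTask` reads 
`rddAndDep`, i.e. after the task has already started, which is exactly the 
window in which registration can race with accumulator handling on another 
thread.

   ```scala
   import scala.collection.mutable

   // Minimal single-threaded sketch of the ordering only; the real race needs
   // a second thread touching the registry while registration happens.
   object LateRegistrationSketch {
     val registry = mutable.ListBuffer.empty[String] // stands in for the accumulator registry
     val events   = mutable.ListBuffer.empty[String]

     // Stand-in for SQLMetric: registering it mutates shared state.
     class SQLMetricStub(name: String) {
       def register(): Unit = { registry += name; events += s"registered $name" }
     }

     // Stand-in for ShuffleDependency: carries metrics that register on "deserialization".
     class DependencyStub(val metrics: Map[String, SQLMetricStub]) {
       def deserialize(): Unit = metrics.values.foreach(_.register())
     }

     def runTask(dep: DependencyStub): Unit = {
       events += "task started" // the Task itself was deserialized earlier
       dep.deserialize()        // rddAndDep is only read here, inside runTask
       events += "task finished"
     }

     def main(args: Array[String]): Unit = {
       val metric = new SQLMetricStub("shuffle write time")
       runTask(new DependencyStub(Map("shuffle write time" -> metric)))
       println(events.mkString(", "))
     }
   }
   ```

   Running it shows the metric is registered strictly after "task started", 
matching the description above.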


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

