eejbyfeldt commented on PR #37206: URL: https://github.com/apache/spark/pull/37206#issuecomment-1494171167
We are also seeing this failure on Spark 3.3.1 with Scala 2.13 on Ubuntu 22.04. I used one of the Spark applications hitting this issue to debug further where we deserialize an `AccumulatorV2` in a way that makes us susceptible to the race, and I found an example that does not seem to involve any client code. Here is my understanding: if the `Task` is a `ShuffleMapTask`, it deserializes the rddAndDep pair as part of `runTask` (https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala#L85), which happens after the `Task` deserialization itself. Combine this with a `ShuffleDependency` (https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/Dependency.scala#L85) whose `ShuffleWriteProcessor` comes from https://github.com/apache/spark/blob/v3.3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L411-L422, and the `ShuffleDependency` holds a reference to the map `metrics: Map[String, SQLMetric]`. Deserializing it therefore registers each `SQLMetric` while the task is already running, and we have our race condition.
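To make the mechanism concrete, here is a minimal, self-contained sketch (not Spark code; `Registry` and `Metric` are hypothetical stand-ins for the task's accumulator registry and `SQLMetric`). It shows how Java serialization's `readObject` hook can register an accumulator as a side effect of deserialization, so deserializing a captured metric mid-task mutates shared state after the task has already started:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  IOException, ObjectInputStream, ObjectOutputStream}
import java.util.{ArrayList => JArrayList}

// Stand-in for the per-task accumulator registry (not thread-safe,
// like a plain buffer another thread may be iterating concurrently).
object Registry {
  val accums = new JArrayList[String]()
}

// Stand-in for an AccumulatorV2/SQLMetric that re-registers itself
// on deserialization.
class Metric(val name: String) extends Serializable {
  @throws[IOException]
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    Registry.accums.add(name) // registration as a deserialization side effect
  }
}

object RaceSketch extends App {
  // Serialize a "dependency" that captures a metric, as the
  // ShuffleWriteProcessor built in ShuffleExchangeExec captures
  // the metrics map.
  val bytes = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(new Metric("shuffle.write.time"))
    oos.close()
    bos.toByteArray
  }

  // The task body deserializes the dependency while already running
  // (as ShuffleMapTask.runTask does with rddAndDep), so registration
  // happens concurrently with anything else reading Registry.accums.
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  ois.readObject()
  ois.close()

  println(Registry.accums) // the metric was registered mid-"task"
}
```

If a heartbeat or reporting thread iterates the registry at the same moment, this late registration is exactly the unsynchronized read/write overlap described above.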
