ankurdave opened a new pull request, #43466:
URL: https://github.com/apache/spark/pull/43466
### What changes were proposed in this pull request?
`CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run
Spark SQL queries in parallel. They incorrectly assume that each parallel
operation will inherit the main thread’s active SparkSession. This is only true
when these parallel operations run in freshly-created threads. However, when
other code has already run some parallel operations before Spark was started,
then there may be existing threads that do not have an active SparkSession. In
that case, these tests fail with NullPointerExceptions when creating SparkPlans
or running SQL queries.
The fix is to use the existing method `ThreadUtils.parmap()`. This method
creates fresh threads that inherit the current active SparkSession, and it
propagates the Spark ThreadLocals.
This PR also adds a scalastyle warning against use of ParVector.
### Why are the changes needed?
This change makes `CastSuiteBase` and `ExpressionInfoSuite` less brittle to
future changes that may run parallel operations during test startup.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Reproduced the test failures by running a ParVector operation before Spark
starts. Verified that this PR fixes the test failures in this condition.
```scala
protected override def beforeAll(): Unit = {
// Run a ParVector operation before initializing the SparkSession. This
starts some Scala
// execution context threads that have no active SparkSession. These
threads will be reused for
// later ParVector operations, reproducing SPARK-45616.
new ParVector((0 until 100).toVector).foreach { _ => }
super.beforeAll()
}
```
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]