ankurdave opened a new pull request, #43466:
URL: https://github.com/apache/spark/pull/43466

   ### What changes were proposed in this pull request?
   `CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run 
Spark SQL queries in parallel. They incorrectly assume that each parallel 
operation will inherit the main thread’s active SparkSession. This is only true 
when these parallel operations run in freshly created threads. However, if 
other code has already run parallel operations before Spark was started, the 
shared execution context may already hold threads that have no active 
SparkSession, and those threads are reused. In that case, these tests fail with 
NullPointerExceptions when creating SparkPlans or running SQL queries.
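
The failure mode can be reproduced without Spark. The sketch below is illustrative only: `activeSession` is a hypothetical stand-in for the active-session slot (an `InheritableThreadLocal`), and a one-thread `java.util.concurrent` pool stands in for the reused execution context threads. A thread created before the value is set never sees it, while a thread created afterwards inherits it.

```scala
import java.util.concurrent.{Callable, Executors}

object InheritanceDemo {
  // Hypothetical stand-in for the active-session slot; not Spark code.
  val activeSession = new InheritableThreadLocal[String]

  /** Returns (value seen by a pre-existing pool thread, value seen by a fresh thread). */
  def observe(): (String, String) = {
    activeSession.remove()
    val pool = Executors.newFixedThreadPool(1)
    try {
      // Warm up the pool BEFORE the "session" exists, so its only worker
      // thread is created with nothing to inherit.
      pool.submit(new Runnable { def run(): Unit = () }).get()

      activeSession.set("session-1")

      // The same worker thread is reused here; inheritance happened at
      // thread creation, so it still sees no value.
      val seenByReused = pool.submit(new Callable[String] {
        def call(): String = activeSession.get()
      }).get()

      // A thread created after set() copies the parent's value.
      var seenByFresh: String = null
      val fresh = new Thread(new Runnable {
        def run(): Unit = { seenByFresh = activeSession.get() }
      })
      fresh.start()
      fresh.join()

      (seenByReused, seenByFresh)
    } finally {
      pool.shutdown()
      activeSession.remove()
    }
  }

  def main(args: Array[String]): Unit = println(observe())
}
```

In the failing tests, the `null` seen by the reused thread corresponds to the missing active SparkSession that leads to the NullPointerException.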
   
   The fix is to use the existing method `ThreadUtils.parmap()`. This method 
creates fresh threads that inherit the current active SparkSession, and it 
propagates the Spark ThreadLocals.
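
As a rough illustration of why fresh threads help, here is a minimal sketch of a parallel map that builds a new pool on every call. `parmapSketch` and `activeSession` are hypothetical names made up for this example; this is not the implementation of `ThreadUtils.parmap`, only the general idea under the assumption that the relevant state lives in an `InheritableThreadLocal`.

```scala
import java.util.concurrent.{Callable, Executors}

object FreshPoolSketch {
  // Hypothetical stand-in for the active-session slot; not Spark code.
  val activeSession = new InheritableThreadLocal[String]

  // Hypothetical helper: runs f over `in` on a pool created for this call only.
  // Worker threads are constructed inside submit(), on the calling thread, so
  // they inherit the caller's current inheritable thread-locals.
  def parmapSketch[I, O](in: Seq[I], maxThreads: Int)(f: I => O): Seq[O] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    try {
      // toVector forces strict submission of every task before waiting.
      val futures = in.toVector.map { x =>
        pool.submit(new Callable[O] { def call(): O = f(x) })
      }
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

Because no thread outlives the call, there is no stale worker left over from before the session was set, which is the property the tests rely on.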
   
   This PR also adds a scalastyle warning against use of `ParVector`.
   
   
   ### Why are the changes needed?
   This change makes `CastSuiteBase` and `ExpressionInfoSuite` more robust 
against future changes that may run parallel operations during test startup.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Reproduced the test failures by running a `ParVector` operation before Spark 
starts. Verified that this PR fixes the test failures under this condition.
   
   ```scala
     import scala.collection.parallel.immutable.ParVector

     protected override def beforeAll(): Unit = {
       // Run a ParVector operation before initializing the SparkSession. This starts some Scala
       // execution context threads that have no active SparkSession. These threads will be
       // reused for later ParVector operations, reproducing SPARK-45616.
       new ParVector((0 until 100).toVector).foreach { _ => }

       super.beforeAll()
     }
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   



