Yicong-Huang opened a new pull request, #54291:
URL: https://github.com/apache/spark/pull/54291
### What changes were proposed in this pull request?
This PR fixes a flaky `NumberFormatException` in
`SQLExecution.withNewExecutionId0` by avoiding a redundant re-read of
`EXECUTION_ROOT_ID_KEY` from local properties.
The current code checks if `EXECUTION_ROOT_ID_KEY` is null, sets it if so,
then re-reads it assuming it is non-null:
```scala
if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
  sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
  sc.addJobTag(executionIdJobTag(sparkSession, executionId))
}
val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong // crashes here
```
The fix reads the property once and uses the value directly:
```scala
val existingRootId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY)
val rootExecutionId = if (existingRootId != null) {
  existingRootId.toLong
} else {
  sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
  sc.addJobTag(executionIdJobTag(sparkSession, executionId))
  executionId
}
```
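To see the difference between the two patterns outside of Spark, here is a minimal standalone sketch. It is not the Spark code: a plain `java.util.Properties` stands in for the local properties, and `RootExecutionIdSketch`, `RootIdKey`, `resolveWithReRead`, `resolveReadOnce`, and the `betweenReads` hook are hypothetical names introduced for illustration. The hook only simulates the property being absent at the second read; this PR description does not state what causes that in the real failure.

```scala
import java.util.Properties

object RootExecutionIdSketch {
  // Hypothetical key, for illustration only (not Spark's EXECUTION_ROOT_ID_KEY).
  val RootIdKey = "sketch.execution.root.id"

  // Check-then-re-read: the final lookup assumes the property is still set.
  // `betweenReads` is an assumed hook that lets a caller simulate the
  // property disappearing between the null check and the second read.
  def resolveWithReRead(
      props: Properties,
      executionId: Long,
      betweenReads: () => Unit = () => ()): Long = {
    if (props.getProperty(RootIdKey) == null) {
      props.setProperty(RootIdKey, executionId.toString)
    }
    betweenReads()
    // Throws NumberFormatException if the lookup returns null.
    props.getProperty(RootIdKey).toLong
  }

  // Read once: the result never depends on a second lookup.
  def resolveReadOnce(props: Properties, executionId: Long): Long =
    Option(props.getProperty(RootIdKey)) match {
      case Some(existing) => existing.toLong
      case None =>
        props.setProperty(RootIdKey, executionId.toString)
        executionId
    }
}
```

With an empty `Properties`, `resolveReadOnce(props, 7L)` returns `7` and stores `"7"` under the key; a later call with a different `executionId` returns the stored root id instead. `resolveWithReRead` behaves the same way unless the property is missing at the second lookup, in which case `.toLong` is applied to `null`.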
### Why are the changes needed?
Under high-concurrency scenarios involving nested thread pools (e.g.,
`CrossValidator(parallelism=4)` with `OneVsRest(parallelism=2)` running from a
Python `ThreadPoolExecutor`), the re-read of `EXECUTION_ROOT_ID_KEY` at line
115 can return null, causing `NumberFormatException: Cannot parse null string`.
CI failure:
https://github.com/Yicong-Huang/spark/actions/runs/21961599500/attempts/1
(Attempt 1 failed, attempt 2 passed. Failed test:
`test_save_load_pipeline_estimator` in `CrossValidatorIOPipelineTests`)
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests. The flaky test `test_save_load_pipeline_estimator` in
`CrossValidatorIOPipelineTests` exercises the concurrent scenario that triggers
this bug.
### Was this patch authored or co-authored using generative AI tooling?
No