anishshri-db commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644845829
> Looks like CI is failing. Could you please look into this? Feels like it
> might be related.
Yes, it looks like this exposes another race condition.
In this case, the maintenance task calls `close` without holding any lock, so
it goes ahead and clears `db` while the task for the partition might still be
executing. That task then fails with an NPE when it tries to access `db` to
read a property (see `getDBProperty` in the trace below).
```
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED]
Query [id = 2fb452cb-0813-4e7f-8212-1ed26d4b9488, runId =
18c3030d-4767-4d6e-b349-7fa74c6867d1] terminated with exception: Job aborted
due to stage failure: Task 0 in stage 295.0 failed 1 times, most recent
failure: Lost task 0.0 in stage 295.0 (TID 868) (localhost executor driver):
java.lang.NullPointerException
at org.apache.spark.sql.execution.streaming.state.RocksDB.getDBProperty(RocksDB.scala:575)
at org.apache.spark.sql.execution.streaming.state.RocksDB.metrics(RocksDB.scala:491)
at org.apache.spark.sql.execution.streaming.state.RocksDB.$anonfun$commit$12(RocksDB.scala:395)
at org.apache.spark.internal.Logging.logInfo(Logging.scala:60)
at org.apache.spark.internal.Logging.logInfo$(Logging.scala:59)
```
Updated the code to require `close` to acquire and release the DB instance lock.
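
For illustration only, here is a minimal sketch of that locking discipline. It is not the actual change in this PR: `GuardedStore`, `FakeDb`, `instanceLock`, and `isClosed` are hypothetical names, and a plain `ReentrantLock` is assumed as a stand-in for the DB instance lock. The point is just that `close` and the task's commit path go through the same lock, so `db` cannot be cleared while a commit is still reading from it.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical stand-in for the native DB handle; not a RocksDB API.
final class FakeDb {
  def getProperty(name: String): String = s"${name}=0"
}

// Hypothetical wrapper that only illustrates the locking discipline.
final class GuardedStore {
  // Assumption: a ReentrantLock plays the role of the DB instance lock.
  private val instanceLock = new ReentrantLock()
  private var db: FakeDb = new FakeDb

  // Task path: holds the lock for the whole commit, including the
  // property read that previously hit a null `db`.
  def commit(): String = {
    instanceLock.lock()
    try {
      if (db == null) throw new IllegalStateException("store is closed")
      db.getProperty("num-keys")
    } finally {
      instanceLock.unlock()
    }
  }

  // Maintenance path: takes the same lock before clearing `db`, so it
  // waits for any in-flight commit instead of racing with it.
  def close(): Unit = {
    instanceLock.lock()
    try {
      db = null
    } finally {
      instanceLock.unlock()
    }
  }

  def isClosed: Boolean = {
    instanceLock.lock()
    try {
      db == null
    } finally {
      instanceLock.unlock()
    }
  }
}
```

With this shape, a commit that runs after `close` fails with an explicit "store is closed" error instead of an NPE partway through metrics collection.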
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]