[GitHub] [spark] anishshri-db commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

via GitHub Thu, 20 Jul 2023 17:55:53 -0700


anishshri-db commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644845829


   > Looks like CI is failing. Could you please look into this? Feels like it 
might be related.
   
   Yes, it looks like this exposes another race condition.
   
   In this case, the maintenance task calls `close` without holding any lock, 
so it goes ahead and clears `db`. But the task for the partition might still be 
executing, and it then fails with an NPE when it tries to access `db` to get a 
DB property.
   
   ```
   org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 2fb452cb-0813-4e7f-8212-1ed26d4b9488, runId = 18c3030d-4767-4d6e-b349-7fa74c6867d1] terminated with exception: Job aborted due to stage failure: Task 0 in stage 295.0 failed 1 times, most recent failure: Lost task 0.0 in stage 295.0 (TID 868) (localhost executor driver): java.lang.NullPointerException
        at org.apache.spark.sql.execution.streaming.state.RocksDB.getDBProperty(RocksDB.scala:575)
        at org.apache.spark.sql.execution.streaming.state.RocksDB.metrics(RocksDB.scala:491)
        at org.apache.spark.sql.execution.streaming.state.RocksDB.$anonfun$commit$12(RocksDB.scala:395)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:60)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:59)
   ```
   
   Updated the code so that `close` must acquire and release the DB instance lock.
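   
   For illustration only, here is a minimal sketch of the idea, not the actual 
change in the PR. `FakeNativeDb`, `GuardedDb`, `withLock`, `getDbProperty` and 
`isClosed` are hypothetical names, and the sketch takes the lock per call, 
which is a simplification made for the example. The point is that `close` and 
the reader go through the same lock, so `db` cannot be cleared while a reader 
is in the middle of using it.
   
   ```scala
   import java.util.concurrent.locks.ReentrantLock

   // Hypothetical stand-in for a native DB handle; not the real RocksDB class.
   final class FakeNativeDb {
     def getProperty(name: String): String = s"$name=0"
     def close(): Unit = ()
   }

   // Hypothetical wrapper: readers and close() share a single lock.
   final class GuardedDb {
     private val lock = new ReentrantLock()
     private var db: FakeNativeDb = new FakeNativeDb

     // Run `body` while holding the lock, always releasing it afterwards.
     private def withLock[T](body: => T): T = {
       lock.lock()
       try body finally lock.unlock()
     }

     // Task-side read: `db` cannot be cleared while this holds the lock.
     // Returns None instead of throwing an NPE if the handle is already closed.
     def getDbProperty(name: String): Option[String] = withLock {
       Option(db).map(_.getProperty(name))
     }

     // Maintenance-side close: waits for any in-flight reader, then clears `db`.
     def close(): Unit = withLock {
       if (db != null) {
         db.close()
         db = null
       }
     }

     def isClosed: Boolean = withLock(db == null)
   }
   ```
   
   With this shape, a `close` issued from another thread blocks until the 
reader releases the lock, and a read after `close` sees an empty result rather 
than a null handle.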


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

