shrirangmhalgi commented on code in PR #56332:
URL: https://github.com/apache/spark/pull/56332#discussion_r3364758245


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##########
@@ -149,7 +149,10 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
     private val newVersion = version + 1
     @volatile private var state: STATE = UPDATING
     private val finalDeltaFile: Path = deltaFile(newVersion)
-    private lazy val deltaFileStream = fm.createAtomic(finalDeltaFile, overwriteIfPossible = true)
+    private lazy val deltaFileStream = {
+      createBaseDirIfNotExists()

Review Comment:
   Nit: `baseDirChecked` is `@volatile` but not atomically guarded: two concurrent tasks on the same partition can both enter the `if (!baseDirChecked)` block and issue parallel `mkdirs` calls. On HDFS this is idempotent, but on some object stores a `mkdirs` racing against an already-created directory marker can fail.

   Maybe consider `AtomicBoolean.compareAndSet(false, true)` here.
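   A minimal sketch of the suggested pattern, assuming the flag lives on the provider and treating `mkdirs` as an injected function. `BaseDirGuard`, `BaseDirGuardDemo`, and the simulated `mkdirs` are hypothetical names for illustration, not part of `HDFSBackedStateStoreProvider`. It shows `compareAndSet(false, true)` letting exactly one of several concurrent callers issue `mkdirs`, and resetting the flag if that call fails so a later caller can retry:

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Hypothetical stand-in for the provider; `mkdirs` is injected so the race
// can be observed without a real file system.
class BaseDirGuard(mkdirs: () => Unit) {
  // Replaces a `@volatile private var baseDirChecked: Boolean`.
  private val baseDirChecked = new AtomicBoolean(false)

  def createBaseDirIfNotExists(): Unit = {
    // compareAndSet flips false -> true for exactly one caller; every other
    // caller gets `false` back and skips mkdirs.
    if (baseDirChecked.compareAndSet(false, true)) {
      var succeeded = false
      try {
        mkdirs()
        succeeded = true
      } finally {
        // If mkdirs threw, clear the flag so a later caller can retry.
        if (!succeeded) baseDirChecked.set(false)
      }
    }
  }
}

object BaseDirGuardDemo {
  // Starts `threads` tasks at the same instant and returns how many times
  // the simulated mkdirs actually ran.
  def run(threads: Int): Int = {
    val calls = new AtomicInteger(0)
    val guard = new BaseDirGuard(() => { calls.incrementAndGet(); Thread.sleep(5) })
    val pool = Executors.newFixedThreadPool(threads)
    val start = new CountDownLatch(1)
    (1 to threads).foreach { _ =>
      pool.submit(new Runnable {
        def run(): Unit = { start.await(); guard.createBaseDirIfNotExists() }
      })
    }
    start.countDown() // release all tasks together to provoke the race
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    calls.get()
  }

  def main(args: Array[String]): Unit =
    println(s"mkdirs calls: ${run(8)}") // prints "mkdirs calls: 1"
}
```

   One caveat on the design: with a bare CAS, the threads that lose the race do not wait for the winner's `mkdirs` to finish, so they may proceed to `createAtomic` before the directory exists. If that ordering matters, a `synchronized` block or a Scala `lazy val` (whose initializer blocks concurrent readers until it completes) would make the losers wait.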
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

