shrirangmhalgi commented on code in PR #56332:
URL: https://github.com/apache/spark/pull/56332#discussion_r3364758245
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##########
@@ -149,7 +149,10 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
private val newVersion = version + 1
@volatile private var state: STATE = UPDATING
private val finalDeltaFile: Path = deltaFile(newVersion)
- private lazy val deltaFileStream = fm.createAtomic(finalDeltaFile, overwriteIfPossible = true)
+ private lazy val deltaFileStream = {
+ createBaseDirIfNotExists()
Review Comment:
Nit: `baseDirChecked` is `@volatile`, but the check-then-set on it is not
atomic, so two concurrent tasks on the same partition can both enter the
`if (!baseDirChecked)` block and issue parallel `mkdirs` calls. On HDFS this
is idempotent, but on some object stores a racing `mkdirs` against an
already-created marker can fail.
Consider `AtomicBoolean.compareAndSet(false, true)` here.
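
A minimal sketch of what that guard could look like, to make the suggestion concrete. `BaseDirGuard`, the `mkdirs` callback, and `BaseDirGuardDemo` are hypothetical stand-ins, not the provider's actual code; only the `baseDirChecked` and `createBaseDirIfNotExists` names come from the PR, and the method body here is assumed:

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for the provider: only the guard logic is shown.
// `mkdirs` abstracts whatever call actually creates the base directory.
class BaseDirGuard(mkdirs: () => Unit) {
  private val baseDirChecked = new AtomicBoolean(false)

  def createBaseDirIfNotExists(): Unit = {
    // Only the single thread that flips false -> true issues mkdirs.
    if (baseDirChecked.compareAndSet(false, true)) {
      try {
        mkdirs()
      } catch {
        case e: Throwable =>
          // Reset the flag so a later call can retry after a failure.
          baseDirChecked.set(false)
          throw e
      }
    }
  }
}

object BaseDirGuardDemo {
  // Runs several threads against one guard and returns how many
  // times the mkdirs callback was actually invoked.
  def countMkdirsCalls(numThreads: Int): Int = {
    val calls = new AtomicInteger(0)
    val guard = new BaseDirGuard(() => { calls.incrementAndGet(); () })
    val threads = (1 to numThreads).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit = guard.createBaseDirIfNotExists()
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    calls.get()
  }
}
```

One caveat with this shape: a thread that loses the `compareAndSet` race returns immediately and may proceed before the winner's `mkdirs` has finished. If callers need the directory to exist on return, a `synchronized` block around the check (or a `lazy val`) would be the safer variant.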
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]