HyukjinKwon commented on code in PR #56722:
URL: https://github.com/apache/spark/pull/56722#discussion_r3466449300


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala:
##########
@@ -78,7 +78,18 @@ class ManifestFileCommitProtocol(jobId: String, path: String)
     if (fileLog.add(batchId, fileStatuses)) {
       logInfo(log"Committed batch ${MDC(BATCH_ID, batchId)}")
     } else {
-      throw new IllegalStateException(s"Race while writing batch $batchId")
+      // Reaching here means `fileLog.add` found this batchId already 
committed to the sink
+      // metadata log at `path`. This is almost always two concurrent 
streaming queries writing
+      // to the same output path: they share a single `_spark_metadata` log 
and cannot coexist.
+      // Log the path + batchId at ERROR so a recurrence in scheduled jobs is 
diagnosable from the
+      // logs alone, without re-reproducing the race.
+      logError(log"Race while writing batch ${MDC(BATCH_ID, batchId)} to the 
file sink metadata " +
+        log"log at ${MDC(PATH, path)}: another writer already committed this 
batch. This usually " +
+        log"means multiple concurrent streaming queries are writing to the 
same output path.")
+      throw new IllegalStateException(
+        s"Race while writing batch $batchId to the file sink metadata log at 
'$path'. Another " +
+        "writer already committed this batch, which usually means multiple 
concurrent streaming " +
+        "queries are writing to the same output path (they share one 
_spark_metadata log).")

Review Comment:
   Good point — done in df7d9a3. Built the message once via the structured 
logging API and reused its rendered text (`errorMsg.message`) for the 
`IllegalStateException`, so the log and the exception can't drift. Thanks for 
the review!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to