dongjoon-hyun commented on code in PR #56722:
URL: https://github.com/apache/spark/pull/56722#discussion_r3464812510
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala:
##########
@@ -78,7 +78,18 @@ class ManifestFileCommitProtocol(jobId: String, path: String)
if (fileLog.add(batchId, fileStatuses)) {
logInfo(log"Committed batch ${MDC(BATCH_ID, batchId)}")
} else {
- throw new IllegalStateException(s"Race while writing batch $batchId")
+ // Reaching here means `fileLog.add` found this batchId already committed to the sink
+ // metadata log at `path`. This is almost always two concurrent streaming queries writing
+ // to the same output path: they share a single `_spark_metadata` log and cannot coexist.
+ // Log the path + batchId at ERROR so a recurrence in scheduled jobs is diagnosable from the
+ // logs alone, without re-reproducing the race.
+ logError(log"Race while writing batch ${MDC(BATCH_ID, batchId)} to the file sink metadata " +
+ log"log at ${MDC(PATH, path)}: another writer already committed this batch. This usually " +
+ log"means multiple concurrent streaming queries are writing to the same output path.")
+ throw new IllegalStateException(
+ s"Race while writing batch $batchId to the file sink metadata log at '$path'. Another " +
+ "writer already committed this batch, which usually means multiple concurrent streaming " +
+ "queries are writing to the same output path (they share one _spark_metadata log).")
Review Comment:
This seems to be almost identical information. Can we share the same error
message for both `logError` and `IllegalStateException`?
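
One possible shape for that, as a minimal sketch rather than the actual patch: build the text once and pass the same value to both the logger and the exception. The names `SharedRaceMessage`, `raceMessage`, and `commit` are hypothetical, and the `add` and `logError` parameters are plain-function stand-ins for `fileLog.add` and Spark's logger so the snippet runs without Spark on the classpath. How this maps onto the structured `log"..."` interpolator and `MDC` keys is left open here.

```scala
object SharedRaceMessage {
  // Hypothetical helper: builds the text once so the log line and the
  // exception message cannot drift apart.
  def raceMessage(batchId: Long, path: String): String =
    s"Race while writing batch $batchId to the file sink metadata log at '$path'. Another " +
      "writer already committed this batch, which usually means multiple concurrent streaming " +
      "queries are writing to the same output path (they share one _spark_metadata log)."

  // Stand-in for the commit branch in the diff. `add` plays the role of
  // `fileLog.add(batchId, fileStatuses)`; `logError` plays the role of the logger.
  def commit(
      batchId: Long,
      path: String,
      add: () => Boolean,
      logError: String => Unit): Unit = {
    if (!add()) {
      val msg = raceMessage(batchId, path)
      logError(msg) // same text goes to the ERROR log ...
      throw new IllegalStateException(msg) // ... and to the exception
    }
  }
}
```

With this shape, a later wording change touches a single string, and anyone grepping the logs for the exception text finds the matching ERROR line.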
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]