HeartSaVioR opened a new pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407
 
 
   ## What changes were proposed in this pull request?
   
   This patch corrects the javadoc explanation of the guarantee provided by ForeachWriter, since it does not actually guarantee the same output for the same `(partitionId, epochId)`. Refer to the description of [SPARK-28650](https://issues.apache.org/jira/browse/SPARK-28650) for more details.
   
   Spark itself still guarantees the same output for the same `epochId` (batch) if the following preconditions are met: 1) the source always provides the same input records for the same offset request, and 2) the query as a whole is idempotent (nondeterministic calculations such as `now()` or `random()` can break this).
   
   Treating a violation of these preconditions as an exceptional case, we can still describe the guarantee in terms of `epochId`, though the guarantee becomes harder to leverage: 1) the ForeachWriter has to implement its own mechanism to track whether all partitions were written successfully for a given `epochId`, and 2) there is little opportunity to benefit from it, since the chance that Spark writes all partitions successfully but then fails to checkpoint the batch is small.
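To make point 1) concrete, here is a minimal sketch of the per-epoch bookkeeping such a writer would need. It is plain Java with no Spark dependency: the class `EpochCompletionTracker` and its methods are hypothetical, not part of Spark. It assumes the sink knows how many partitions each epoch has, and for brevity it keeps state in memory, whereas a real sink would have to keep it in the shared external store, because each partition is written by a separate task. In a real `ForeachWriter`, `markCommitted` would correspond to `close(errorOrNull)` being called with `errorOrNull == null`, using the ids received in `open(partitionId, epochId)`. `open` could return `false` to skip writing once the whole epoch is known to be complete.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not a Spark API): tracks which partitions of each
 * epoch were written successfully. Assumes the partition count per epoch
 * is known; a real sink would persist this state in its external store.
 */
public class EpochCompletionTracker {
    // epochId -> ids of partitions whose writes completed without error
    private final Map<Long, Set<Long>> committed = new HashMap<>();

    /** Record that partitionId finished writing successfully for epochId. */
    public synchronized void markCommitted(long epochId, long partitionId) {
        committed.computeIfAbsent(epochId, k -> new HashSet<>()).add(partitionId);
    }

    /** True once every partition 0..numPartitions-1 has committed for epochId. */
    public synchronized boolean isEpochComplete(long epochId, int numPartitions) {
        Set<Long> done = committed.get(epochId);
        if (done == null) {
            return false;
        }
        for (long p = 0; p < numPartitions; p++) {
            if (!done.contains(p)) { // p is autoboxed to Long
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        EpochCompletionTracker tracker = new EpochCompletionTracker();
        // Epoch 7 has 3 partitions; partition 2 has not committed yet.
        tracker.markCommitted(7L, 0L);
        tracker.markCommitted(7L, 1L);
        System.out.println(tracker.isEpochComplete(7L, 3)); // false
        // Once the retried task for partition 2 succeeds, the epoch is complete.
        tracker.markCommitted(7L, 2L);
        System.out.println(tracker.isEpochComplete(7L, 3)); // true
    }
}
```

Only a fully completed epoch can be skipped safely, because the guarantee holds per `epochId` rather than per `(partitionId, epochId)`.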
   
   Credit to @zsxwing for discovering the broken guarantee.
   
   ## How was this patch tested?
   
   This is a javadoc-only change.
