cloud-fan commented on issue #25795: [SPARK-29037][Core] Spark gives duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-532169891

Thanks @steveloughran for the detailed explanation! It seems very complicated to make Spark support concurrent, object-store-friendly file writes. We should instead ask users to try object-store-first table formats such as Delta and Iceberg. I'd like to narrow down the scope: Spark should at least guarantee data correctness, so duplicated data must not happen. @turboFei @advancedxy is it possible to detect concurrent writes and fail the query?
