cloud-fan commented on issue #25795: [SPARK-29037][Core] Spark gives duplicate result when an application was killed
URL: https://github.com/apache/spark/pull/25795#issuecomment-532169891

Thanks @steveloughran for the detailed explanation! It seems very complicated to make Spark support concurrent, object-store-friendly file writes. We should instead ask users to try object-store-first table formats such as Delta and Iceberg. I'd like to narrow down the scope: Spark should at least guarantee data correctness, so duplicated data must not happen. @turboFei @advancedxy is it possible to detect concurrent writes and fail the query?
