[GitHub] [spark] vanzin commented on a change in pull request #24970: [SPARK-23977][SQL] Support High Performance S3A committers [test-hadoop3.2]

GitBox Thu, 01 Aug 2019 13:36:45 -0700

vanzin commented on a change in pull request #24970: [SPARK-23977][SQL] Support High Performance S3A committers [test-hadoop3.2]
URL: https://github.com/apache/spark/pull/24970#discussion_r309878239


 ##########
 File path: docs/cloud-integration.md
 ##########
 @@ -190,15 +212,50 @@ while they are still being written. Applications can write straight to the monit
 atomic `rename()` operation.
 Otherwise the checkpointing may be slow and potentially unreliable.
 
+## Committing work into cloud storage safely and fast.
+
+As covered earlier, commit-by-rename is dangerous on any object store which
+exhibits eventual consistency (example: S3), and often slower than classic
+filesystem renames.
+
+Some object store connectors provide custom committers to commit tasks and
+jobs without using rename. In versions of Spark built with Hadoop-3.1 or later,
 
 Review comment:
   Nit: write "Hadoop 3.1" here, without the hyphen, rather than "Hadoop-3.1".

