steveloughran commented on a change in pull request #30714:
URL: https://github.com/apache/spark/pull/30714#discussion_r577537838
##########
File path: docs/cloud-integration.md
##########
@@ -176,12 +181,19 @@ different stores and connectors when renaming directories:
| Amazon S3 | s3a | Unsafe | O(data) |
| Azure Storage | wasb | Safe | O(files) |
| Azure Datalake Gen 2 | abfs | Safe | O(1) |
-| Google Cloud Storage | gs | Safe | O(1) |
+| Google Cloud Storage | gs | Mixed | O(files) |
1. As storing temporary files can run up charges, delete
directories called `"_temporary"` on a regular basis.
1. For AWS S3, set a limit on how long multipart uploads can remain
outstanding.
This avoids incurring bills from incomplete uploads.
+1. For Google Cloud, directory rename is file-by-file. Consider using the v2
+committer and only write code which generates idempotent output, including
+filenames, as it is *no more unsafe* than the v1 committer, and faster.
+
+```
+spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
+```
Review comment:
done
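
For anyone landing on this thread later: besides `spark-defaults.conf`, the property above can also be set per-job at submit time. The invocation below is an illustrative sketch; the application jar name is a placeholder, not something from this PR.

```shell
# Enable the v2 FileOutputCommitter algorithm for this job only.
# On stores where directory rename is file-by-file (e.g. GCS via gs://),
# v2 commits task output directly, avoiding the O(files) rename in job
# commit -- safe provided the job's output (filenames included) is
# idempotent, as the doc change above notes.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  my-app.jar   # placeholder application jar
```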
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]