GitHub user alunarbeach opened a pull request:
https://github.com/apache/spark/pull/16339
[SPARK-18917][SQL] Add Skip Partition Check Flag to avoid list all leaf files in append mode
## What changes were proposed in this pull request?
Currently, saving a DataFrame in append mode lists all leaf files in the save
directory. When that directory is in an object store (S3 / Google Cloud
Storage) and has many subfolders due to partitioning, writes take a long time
or fail with read timeouts.
This pull request introduces a skip flag, false by default, that users can
enable to skip the partition check.
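The following minimal Python sketch models the cost on a local filesystem. It assumes that the append-mode check recursively lists every leaf file under the save directory and infers partition columns from `key=value` directory names. Each directory listing stands in for one object-store LIST request. The function names and the `skip_partition_check` parameter are hypothetical. They are not Spark's API and not the name of the flag this PR adds.

```python
import os
import tempfile


def list_leaf_files(root):
    """Recursively collect every file under root.

    Returns (files, list_calls). Each os.scandir() call stands in for one
    LIST request that an object store such as S3 would have to serve.
    """
    files, list_calls = [], 0
    pending = [root]
    while pending:
        directory = pending.pop()
        list_calls += 1
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir():
                    pending.append(entry.path)
                else:
                    files.append(entry.path)
    return files, list_calls


def partition_columns_of(path, root):
    """Read column names from key=value directory segments, e.g. year=2016."""
    relative_dir = os.path.dirname(os.path.relpath(path, root))
    if not relative_dir:
        return []
    return [seg.split("=", 1)[0] for seg in relative_dir.split(os.sep) if "=" in seg]


def check_append_partitioning(root, requested_columns, skip_partition_check=False):
    """Hypothetical append-mode check. Returns the number of LIST calls made."""
    if skip_partition_check:
        # Flag enabled: trust the caller and list no existing files at all.
        return 0
    files, list_calls = list_leaf_files(root)
    existing = partition_columns_of(files[0], root) if files else []
    if existing and existing != list(requested_columns):
        raise ValueError(f"requested partitioning {list(requested_columns)} "
                         f"does not match existing partitioning {existing}")
    return list_calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate a table partitioned by year and month: 12 leaf directories.
        for month in range(1, 13):
            part_dir = os.path.join(root, "year=2016", f"month={month}")
            os.makedirs(part_dir)
            open(os.path.join(part_dir, "part-00000.parquet"), "w").close()

        print(check_append_partitioning(root, ["year", "month"]))        # 14
        print(check_append_partitioning(root, ["year", "month"], True))  # 0
```

In this model, the directory tree has 12 month directories under one year directory, so the check issues 14 listings. On an object store, each listing is a network round trip, and a table with thousands of partitions multiplies that cost. Enabling the flag returns before any listing is done. The trade-off is that a write whose partitioning does not match the existing data is no longer caught.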
## How was this patch tested?
This patch was tested manually.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/alunarbeach/spark spark-18917
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16339.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16339
----
commit 885b721003923f701643ca7d93ef9edddc8a9961
Author: Anbu Cheeralan
Date: 2016-12-19T18:12:11Z
add skip flag to skip partition check
commit 43e599eac828fae630d0ac0acd00255ac6c77ae4
Author: Anbu Cheeralan
Date: 2016-12-19T18:23:03Z
change description
----