dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661344352
@hvanhovell . Thank you for your feedback. The following looks a little
wrong to me because the above optimization was one of the recommendations for
many
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661344352
@hvanhovell . The following is completely wrong because the above optimization
was one of the recommendations for many Hortonworks customers to save their
HDFS
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658831236
BTW, @aokolnychyi . I merged the corner-case test PR,
https://github.com/apache/spark/pull/29118. Could you rebase this PR onto
master? Then, we can discuss
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658828708
@hvanhovell . I agree with you on the following points.
> AFAIK nested ordering can be ignored from a relational algebra point of view.
> Regarding the shuffles.
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706
The biggest factor is the file format, not the Spark side.
For example, in the case above, the ORC files are small because ORC supports a
special encoding when the
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706
No. It depends on the file format rather than on the Spark side.
For example, in the case above, the ORC files are small because ORC supports a
special encoding when the
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475
We use the tricks above to generate small final Parquet/ORC files, don't we?
This may cause a regression in output storage size.
dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475
We use the tricks above to generate small final Parquet/ORC files, don't we?
This PR may cause a regression in output storage size.
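
As a rough illustration of why row ordering can affect output size: columnar formats such as Parquet and ORC can use run-length style encodings, which shrink when equal values sit next to each other. The sketch below is plain Python, not Spark, Parquet, or ORC code. The helper `run_length_encode` and the sample column are hypothetical and exist only for illustration; the actual encodings chosen by a real writer are more involved and are not reproduced here.

```python
import random
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical low-cardinality column, standing in for one column of a table.
random.seed(0)
column = [random.choice(["a", "b", "c", "d"]) for _ in range(10_000)]

# Encode the column in its original (random) order and in sorted order.
unsorted_runs = run_length_encode(column)
sorted_runs = run_length_encode(sorted(column))

# Sorting groups equal values together, so far fewer runs are needed.
print("runs without sorting:", len(unsorted_runs))
print("runs after sorting:  ", len(sorted_runs))
```

Under these assumptions, removing a sort before the write would leave the data in the first, less compressible shape, which is the kind of storage regression the comment is concerned about.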