[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658810297


   Thanks, @dilipbiswal . You're also right. Initially, it was intentional. To 
be clearer, I will add a comment that refers to SPARK-32318. Technically, if the 
user writes code like `sql("select * from t distribute by 
a").write.orc("/tmp/SPARK-32276")` without an inner ORDER BY, Spark should 
generate a large file.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525


   Actually, the file size check test cases are very ~flaky~ fragile. We hit 
many issues before when we added `Spark Version` metadata to Parquet/ORC/Avro.
   > Do you think it is easy to add a test that checks file size like in the 
description? Or current one is enough?
   
   I believe this one is enough because generating files costs us 
write/read/full execution time in Jenkins and GitHub.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629


   Also, cc @cloud-fan , @HyukjinKwon , @maropu 



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339


   Could you review this, @viirya ? This will protect us from future 
regressions. This part is tricky.


