dongjoon-hyun opened a new pull request #29118:
URL: https://github.com/apache/spark/pull/29118


   ### What changes were proposed in this pull request?
   
   This PR adds a test case to `EliminateSortsSuite` to protect a valid use case: an `ORDER BY` in a subquery whose result is then redistributed with `DISTRIBUTE BY`.
   
   ### Why are the changes needed?
   
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
   
   $ ls -al /tmp/master/
   total 56
   drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
   drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
   -rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```
   
   If the inner `ORDER BY` is removed, the output files grow substantially: each non-empty part file is about 150 KB instead of under 1 KB.
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
   
   $ ls -al /tmp/SPARK-32276/
   total 632
   drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
   drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
   -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```
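
A likely explanation for the size gap, offered here as an assumption rather than something measured in this PR, is that a sorted column is much easier to encode compactly than a shuffled one. The sketch below is not ORC's encoder and does not use Spark. It is a simplified fixed-delta run counter (the `DeltaRunSketch` object and its `countDeltaRuns` helper are hypothetical names) applied to the odd values of `b`, which all share `a = 1` and so land in the same output partition, once in sorted order and once shuffled.

```scala
import scala.util.Random

object DeltaRunSketch {
  // Hypothetical helper, not part of Spark or ORC.
  // Greedily splits the sequence into runs whose neighbouring values
  // differ by a constant step, and returns how many runs are needed.
  // Fewer runs means a (start, step, length) description is shorter.
  def countDeltaRuns(values: IndexedSeq[Int]): Int = {
    var runs = 0
    var i = 0
    while (i < values.length) {
      runs += 1
      if (i + 1 < values.length) {
        // The first two values of a run fix its step.
        val step = values(i + 1) - values(i)
        var j = i + 1
        // Extend the run while the step stays the same.
        while (j + 1 < values.length && values(j + 1) - values(j) == step) j += 1
        i = j + 1
      } else {
        // A single trailing value forms its own run.
        i += 1
      }
    }
    runs
  }

  def main(args: Array[String]): Unit = {
    // Values of b for which a == 1 in the example table.
    val odds = (1 to 100000).filter(_ % 2 == 1).toVector
    val sortedRuns = countDeltaRuns(odds)
    // Fixed seed only to make the sketch repeatable.
    val shuffledRuns = countDeltaRuns(new Random(42).shuffle(odds))
    println(s"sorted runs:   $sortedRuns")
    println(s"shuffled runs: $shuffledRuns")
  }
}
```

In sorted order the 50,000 odd values form a single run with step 2, while any shuffled order needs far more runs. This is only an illustration of why keeping the inner `ORDER BY` can matter for file size; the actual on-disk sizes depend on the ORC writer and the Snappy compression shown in the listings.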
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This only improves the test coverage.
   
   ### How was this patch tested?
   
   Pass the GitHub Action or Jenkins.

