[GitHub] [spark] maropu commented on a change in pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

GitBox Wed, 23 Sep 2020 18:45:46 -0700


maropu commented on a change in pull request #29804:
URL: https://github.com/apache/spark/pull/29804#discussion_r493991276




##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
##########
@@ -1012,4 +1014,43 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-32859: disable unnecessary bucketed table scan based on query plan") {
+    withTable("t1", "t2") {
+      df1.write.format("parquet").bucketBy(8, "i").saveAsTable("t1")
+      df2.write.format("parquet").bucketBy(4, "i").saveAsTable("t2")
+
+      def checkNumBucketedScan(query: String, expectedNumBucketedScan: Int): Unit = {
+        val plan = sql(query).queryExecution.executedPlan
+        val bucketedScan = plan.collect { case s: FileSourceScanExec if s.bucketedScan => s }
+        assert(bucketedScan.length == expectedNumBucketedScan)
+      }
+
+      Seq(
+        ("SELECT * FROM t1 JOIN t2 ON t1.i = t2.i", 1, 2),
+        ("SELECT * FROM t1 JOIN t2 ON t1.i = t2.j", 1, 2),
+        ("SELECT * FROM t1 JOIN t2 ON t1.j = t2.j", 0, 2),
+        ("SELECT SUM(i) FROM t1 GROUP BY i", 1, 1),
+        ("SELECT SUM(i) FROM t1 GROUP BY j", 0, 1),
+        ("SELECT * FROM t1 WHERE i = 1", 1, 1),
+        ("SELECT * FROM t1 WHERE j = 1", 0, 1),

Review comment:
       I have two comments about this test:
    - Could you add more test cases, e.g., queries with multiple joins, tables with multiple bucket columns, ...?
    - Could you split this single test into multiple tests with meaningful titles, e.g., `test("SPARK-32859: disable unnecessary bucketed table scan based on query plan - multiple join test")`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



