c21 commented on a change in pull request #33711:
URL: https://github.com/apache/spark/pull/33711#discussion_r687111865
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala
##########
@@ -258,4 +258,30 @@ abstract class DisableUnnecessaryBucketedScanSuite
}
}
}
+
+ test("Aggregates with no groupby over tables having 1 BUCKET, return
multiple rows") {
+ withTable("t1") {
+ withSQLConf(SQLConf.AUTO_BUCKETED_SCAN_ENABLED.key -> "true") {
+ spark.sql(
+ """
+ | CREATE TABLE t1 (
+ | `id` BIGINT,
+ | `event_date` DATE)
+ | USING PARQUET
+ | CLUSTERED BY (id)
+ | INTO 1 BUCKETS
+ |""".stripMargin)
+ spark.sql(
+ """
+ |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
+ |""".stripMargin)
+ spark.sql(
+ """
+ |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
+ |""".stripMargin)
+ val result = spark.sql("select sum(id) from t1").count()
+ assert(result == 1)
Review comment:
How about checking the result value itself as well, not just the row count?
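For example, something along these lines (just a sketch, assuming the suite mixes in `QueryTest` so `checkAnswer` is in scope, and that the inserted 1.23 / 2.28 are cast down to 1 and 2 for the BIGINT column):
```scala
// Sketch: assert the aggregated value, not just the number of rows.
// Assumes `import org.apache.spark.sql.Row` is available and that the
// inserted decimals become 1L and 2L in the BIGINT `id` column.
checkAnswer(
  spark.sql("SELECT SUM(id) FROM t1"),
  Seq(Row(3L)))
```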
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala
##########
@@ -258,4 +258,30 @@ abstract class DisableUnnecessaryBucketedScanSuite
}
}
}
+
+ test("Aggregates with no groupby over tables having 1 BUCKET, return
multiple rows") {
+ withTable("t1") {
+ withSQLConf(SQLConf.AUTO_BUCKETED_SCAN_ENABLED.key -> "true") {
+ spark.sql(
+ """
+ | CREATE TABLE t1 (
Review comment:
nit: indentation seems off.
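Roughly what the indentation could look like (just a sketch of the formatting nit, not required code):
```scala
// Two-space indent for the spark.sql call inside the withSQLConf block,
// with the stripMargin pipes aligned and no stray leading spaces after them.
spark.sql(
  """
    |CREATE TABLE t1 (
    |  `id` BIGINT,
    |  `event_date` DATE)
    |USING PARQUET
    |CLUSTERED BY (id)
    |INTO 1 BUCKETS
    |""".stripMargin)
```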
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/DisableUnnecessaryBucketedScan.scala
##########
@@ -120,7 +120,7 @@ object DisableUnnecessaryBucketedScan extends Rule[SparkPlan] {
private def hasInterestingPartition(plan: SparkPlan): Boolean = {
plan.requiredChildDistribution.exists {
- case _: ClusteredDistribution | _: HashClusteredDistribution => true
+ case _: ClusteredDistribution | _: HashClusteredDistribution | AllTuples => true
Review comment:
`AllTuples` is only interesting when the underlying file scan operator has only 1 bucket. So I think we also need a way to check that the scan operator has only 1 bucket.
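A rough sketch of the kind of check I mean (the helper name here is made up, not a proposed final implementation):
```scala
import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}

// Hypothetical helper: treat a scan as single-bucket only when its
// relation is bucketed and declares exactly one bucket.
private def isSingleBucketScan(plan: SparkPlan): Boolean = plan match {
  case scan: FileSourceScanExec =>
    scan.relation.bucketSpec.exists(_.numBuckets == 1)
  case _ => false
}
```
so that `AllTuples` only counts as an interesting distribution when the bucketed scan underneath has exactly one bucket.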