c21 commented on a change in pull request #33711:
URL: https://github.com/apache/spark/pull/33711#discussion_r687111865
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala
##########
@@ -258,4 +258,30 @@ abstract class DisableUnnecessaryBucketedScanSuite
}
}
}
+
+ test("Aggregates with no groupby over tables having 1 BUCKET, return
multiple rows") {
+ withTable("t1") {
+ withSQLConf(SQLConf.AUTO_BUCKETED_SCAN_ENABLED.key -> "true") {
+ spark.sql(
+ """
+ | CREATE TABLE t1 (
+ | `id` BIGINT,
+ | `event_date` DATE)
+ | USING PARQUET
+ | CLUSTERED BY (id)
+ | INTO 1 BUCKETS
+ |""".stripMargin)
+ spark.sql(
+ """
+ |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
+ |""".stripMargin)
+ spark.sql(
+ """
+ |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
+ |""".stripMargin)
+ val result = spark.sql("select sum(id) from t1").count()
+ assert(result == 1)
Review comment:
How about checking the result value itself as well, not just the row count?
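For example, something along these lines (just a sketch, assuming the suite mixes in `QueryTest` so `checkAnswer` is in scope, and that the inserted 1.23 / 2.28 are cast down to 1 and 2 for the BIGINT column):
```scala
// Sketch: assert the aggregated value, not just the number of rows.
// Assumes `import org.apache.spark.sql.Row` is available and that the
// inserted decimals become 1L and 2L in the BIGINT `id` column.
checkAnswer(
  spark.sql("SELECT SUM(id) FROM t1"),
  Seq(Row(3L)))
```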
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala
##########
@@ -258,4 +258,30 @@ abstract class DisableUnnecessaryBucketedScanSuite
}
}
}
+
+ test("Aggregates with no groupby over tables having 1 BUCKET, return
multiple rows") {
+ withTable("t1") {
+ withSQLConf(SQLConf.AUTO_BUCKETED_SCAN_ENABLED.key -> "true") {
+ spark.sql(
+ """
+ | CREATE TABLE t1 (
Review comment:
nit: indentation seems off.
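Roughly what the indentation could look like (just a sketch of the formatting nit, not required code):
```scala
// Two-space indent for the spark.sql call inside the withSQLConf block,
// with the stripMargin pipes aligned and no stray leading spaces after them.
spark.sql(
  """
    |CREATE TABLE t1 (
    |  `id` BIGINT,
    |  `event_date` DATE)
    |USING PARQUET
    |CLUSTERED BY (id)
    |INTO 1 BUCKETS
    |""".stripMargin)
```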
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/DisableUnnecessaryBucketedScan.scala
##########
@@ -120,7 +120,7 @@ object DisableUnnecessaryBucketedScan extends Rule[SparkPlan] {
private def hasInterestingPartition(plan: SparkPlan): Boolean = {
plan.requiredChildDistribution.exists {
- case _: ClusteredDistribution | _: HashClusteredDistribution => true
+ case _: ClusteredDistribution | _: HashClusteredDistribution | AllTuples => true
Review comment:
`AllTuples` is only interesting when the underlying file scan operator has only 1 bucket. So I think we also need a way to check that the scan operator has only 1 bucket.
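A rough sketch of the kind of check I mean (the helper name here is made up, not a proposed final implementation):
```scala
import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}

// Hypothetical helper: treat a scan as single-bucket only when its
// relation is bucketed and declares exactly one bucket.
private def isSingleBucketScan(plan: SparkPlan): Boolean = plan match {
  case scan: FileSourceScanExec =>
    scan.relation.bucketSpec.exists(_.numBuckets == 1)
  case _ => false
}
```
so that `AllTuples` only counts as an interesting distribution when the bucketed scan underneath has exactly one bucket.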