maropu commented on a change in pull request #29804:
URL: https://github.com/apache/spark/pull/29804#discussion_r493984368
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -348,20 +352,22 @@ case class FileSourceScanExec(
"DataFilters" -> seqToString(dataFilters),
"Location" -> locationDesc)
- val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
- val numSelectedBuckets = optionalBucketSet.map { b =>
- b.cardinality()
- } getOrElse {
- spec.numBuckets
+ if (bucketedScan) {
+ relation.bucketSpec.map { spec =>
+ val numSelectedBuckets = optionalBucketSet.map { b =>
+ b.cardinality()
+ } getOrElse {
+ spec.numBuckets
+ }
+ metadata += ("SelectedBucketsCount" ->
+ (s"$numSelectedBuckets out of ${spec.numBuckets}" +
+          optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)" }.getOrElse("")))
}
- metadata + ("SelectedBucketsCount" ->
- (s"$numSelectedBuckets out of ${spec.numBuckets}" +
-        optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)" }.getOrElse("")))
- } getOrElse {
- metadata
+ } else if (disableBucketedScan) {
+ metadata += ("DisableBucketedScan" -> "true")
Review comment:
   > It's kind of the reason why there is no bucket scan in this node. The
   > reason can be: 1. the table is not bucketed. 2. the bucket column is not
   > read. 3. the planner decides to disable it as it has no benefits.
   My comment meant that users need to be able to see *why* bucketed scans
   are disabled, as Wenchen pointed out above. Anyway, the follow-up looks
   okay.
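
   For context, here is a minimal, self-contained sketch of the metadata
   branching the diff introduces. The object name, parameters, and the
   "Format" entry are hypothetical illustrations, not the actual
   `FileSourceScanExec` code; the point is only to show which keys appear
   under which conditions, and where a reason string could be surfaced:

   ```scala
   // Hypothetical sketch of the branching in the diff above; names and
   // the placeholder "Format" entry are illustrative, not Spark's code.
   object ScanMetadataSketch {
     def scanMetadata(
         bucketedScan: Boolean,
         disableBucketedScan: Boolean,
         numBuckets: Int,
         selectedBuckets: Option[Int]): Map[String, String] = {
       var metadata = Map("Format" -> "Parquet")
       if (bucketedScan) {
         // When a bucket filter pruned buckets, report how many survived;
         // otherwise all buckets are selected.
         val selected = selectedBuckets.getOrElse(numBuckets)
         metadata += ("SelectedBucketsCount" -> s"$selected out of $numBuckets")
       } else if (disableBucketedScan) {
         // The review asks that the *reason* (table not bucketed, bucket
         // column not read, or planner decision) also be made visible here.
         metadata += ("DisableBucketedScan" -> "true")
       }
       metadata
     }

     def main(args: Array[String]): Unit = {
       // Bucketed scan with 3 of 8 buckets selected.
       println(scanMetadata(bucketedScan = true, disableBucketedScan = false,
         numBuckets = 8, selectedBuckets = Some(3)))
       // Planner disabled the bucketed scan.
       println(scanMetadata(bucketedScan = false, disableBucketedScan = true,
         numBuckets = 8, selectedBuckets = None))
     }
   }
   ```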
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]