maropu commented on a change in pull request #29804:
URL: https://github.com/apache/spark/pull/29804#discussion_r493984368
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -348,20 +352,22 @@ case class FileSourceScanExec(
"DataFilters" -> seqToString(dataFilters),
"Location" -> locationDesc)
- val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
- val numSelectedBuckets = optionalBucketSet.map { b =>
- b.cardinality()
- } getOrElse {
- spec.numBuckets
+ if (bucketedScan) {
+ relation.bucketSpec.map { spec =>
+ val numSelectedBuckets = optionalBucketSet.map { b =>
+ b.cardinality()
+ } getOrElse {
+ spec.numBuckets
+ }
+ metadata += ("SelectedBucketsCount" ->
+ (s"$numSelectedBuckets out of ${spec.numBuckets}" +
+          optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)" }.getOrElse("")))
}
- metadata + ("SelectedBucketsCount" ->
- (s"$numSelectedBuckets out of ${spec.numBuckets}" +
-        optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)" }.getOrElse("")))
- } getOrElse {
- metadata
+ } else if (disableBucketedScan) {
+ metadata += ("DisableBucketedScan" -> "true")
Review comment:
   > It's kind of the reason why there is no bucket scan in this node. The
   > reason can be: 1. the table is not bucketed. 2. the bucket column is not
   > read. 3. the planner decides to disable it as it has no benefits.
   My comment meant that users need to be able to see *why* bucketed scans
   are disabled, as Wenchen pointed out above. Anyway, the follow-up looks
   okay.
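
   For context, here is a minimal, self-contained sketch of the metadata
   branching the diff introduces. The object name, parameters, and the
   "Format" entry are hypothetical illustrations, not the actual
   `FileSourceScanExec` code; the point is only to show which keys appear
   under which conditions, and where a reason string could be surfaced:

   ```scala
   // Hypothetical sketch of the branching in the diff above; names and
   // the placeholder "Format" entry are illustrative, not Spark's code.
   object ScanMetadataSketch {
     def scanMetadata(
         bucketedScan: Boolean,
         disableBucketedScan: Boolean,
         numBuckets: Int,
         selectedBuckets: Option[Int]): Map[String, String] = {
       var metadata = Map("Format" -> "Parquet")
       if (bucketedScan) {
         // When a bucket filter pruned buckets, report how many survived;
         // otherwise all buckets are selected.
         val selected = selectedBuckets.getOrElse(numBuckets)
         metadata += ("SelectedBucketsCount" -> s"$selected out of $numBuckets")
       } else if (disableBucketedScan) {
         // The review asks that the *reason* (table not bucketed, bucket
         // column not read, or planner decision) also be made visible here.
         metadata += ("DisableBucketedScan" -> "true")
       }
       metadata
     }

     def main(args: Array[String]): Unit = {
       // Bucketed scan with 3 of 8 buckets selected.
       println(scanMetadata(bucketedScan = true, disableBucketedScan = false,
         numBuckets = 8, selectedBuckets = Some(3)))
       // Planner disabled the bucketed scan.
       println(scanMetadata(bucketedScan = false, disableBucketedScan = true,
         numBuckets = 8, selectedBuckets = None))
     }
   }
   ```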
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]