[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

GitBox Wed, 01 Jun 2022 06:52:42 -0700


cloud-fan commented on PR #36733:
URL: https://github.com/apache/spark/pull/36733#issuecomment-1143641829


   > What changes were proposed in this pull request?
   > Currently, bucketed scan is disabled if bucket columns are not in scan output. This PR proposes to move the check into DisableUnnecessaryBucketedScan and only disable bucketing when autoBucketedScan is enabled.
   
   The PR description doesn't say so. First, we need a user-facing explanation of the change, i.e. in which cases the bucketed scan will be enabled where it was disabled before this PR. Second, the current approach is not simple enough, and I'm a bit hesitant to merge it. Have you considered https://github.com/apache/spark/pull/36733#issuecomment-1143605953?
   
   Finally, let me make my position clear: Spark should not have correctness bugs under any configuration. IIUC, you want to fix a performance regression, so there shouldn't be any user-facing behavior changes. Please revisit the `Does this PR introduce any user-facing change?` section.
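The rule quoted at the top of this comment can be made concrete with a small model. This is a hypothetical Python sketch, not Spark's code: `ScanInfo`, `use_bucketed_scan_before`, `use_bucketed_scan_after` and `changed_cases` are illustrative names; the two flags are assumed to stand in for `spark.sql.sources.bucketing.enabled` and `spark.sql.sources.bucketing.autoBucketedScan.enabled`; and the model ignores the other conditions under which `DisableUnnecessaryBucketedScan` may turn off a bucketed scan.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ScanInfo:
    # Hypothetical model of a file scan over a bucketed table.
    bucket_columns: frozenset
    output_columns: frozenset


def bucket_columns_in_output(scan: ScanInfo) -> bool:
    # True only if every bucket column survives column pruning.
    return scan.bucket_columns <= scan.output_columns


def use_bucketed_scan_before(scan: ScanInfo, bucketing_enabled: bool) -> bool:
    # Before the PR (as quoted): the scan is not bucketed whenever a bucket
    # column is missing from the scan output, regardless of autoBucketedScan.
    return bucketing_enabled and bucket_columns_in_output(scan)


def use_bucketed_scan_after(
    scan: ScanInfo, bucketing_enabled: bool, auto_bucketed_scan_enabled: bool
) -> bool:
    # After the PR (as proposed): the output check only applies when
    # autoBucketedScan is enabled. Other DisableUnnecessaryBucketedScan
    # conditions are deliberately not modeled here.
    if not bucketing_enabled:
        return False
    if auto_bucketed_scan_enabled:
        return bucket_columns_in_output(scan)
    return True


def changed_cases():
    """List (bucketing, auto, cols_in_output, before, after) where the rules disagree."""
    cases = []
    for bucketing, auto, cols_in_output in product([True, False], repeat=3):
        output = frozenset({"id", "value"}) if cols_in_output else frozenset({"value"})
        scan = ScanInfo(frozenset({"id"}), output)
        before = use_bucketed_scan_before(scan, bucketing)
        after = use_bucketed_scan_after(scan, bucketing, auto)
        if before != after:
            cases.append((bucketing, auto, cols_in_output, before, after))
    return cases


if __name__ == "__main__":
    for case in changed_cases():
        print(case)
```

Under these assumptions, the enumeration yields a single changed case: bucketing enabled, autoBucketedScan disabled, and a bucket column pruned from the scan output, where the scan becomes bucketed after the change. That is the kind of user-facing difference the comment asks the PR description to spell out.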


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
