cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143641829
> What changes were proposed in this pull request?
> Currently, bucketed scan is disabled if bucket columns are not in scan output. This PR proposes to move the check into DisableUnnecessaryBucketedScan and only disable bucketing when autoBucketedScan is enabled.

The PR description doesn't say so.

First, we need a user-facing explanation of the change, i.e. in which cases the bucketed scan will be enabled where it was disabled before this PR.

Second, the current approach is not simple enough and I'm a bit hesitant to merge it. Have you considered https://github.com/apache/spark/pull/36733#issuecomment-1143605953 ?

Finally, let me make my position clear: Spark should not have correctness bugs under any configuration. IIUC you want to fix a performance regression, so there shouldn't be any user-facing behavior changes. Please revisit the `Does this PR introduce any user-facing change?` section.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
