mihailotim-db commented on PR #56417: URL: https://github.com/apache/spark/pull/56417#issuecomment-4855970280
> Thanks for the quick turnaround @yadavay-amzn -- the cleanups all look right: `groupingAnalyticsResolver.resolve` is now called unconditionally (it no-ops when there are no `BaseGroupingSets`), the `skipGroupingExprChecks` parameter is gone so `assertValidAggregation` keeps its original contract, the redundant LCA-path re-validation is removed, and the unsupported cases now have direct assertions in `ExplicitlyUnsupportedResolverFeatureSuite`. > > @mihailotim-db you're right that throwing `ExplicitlyUnsupportedResolverFeature` disables these cases rather than fixing them, and resolving them is the end state we want. But properly resolving ORDER BY / HAVING / LCA over grouping analytics is a fair amount of new machinery, and your e2e fix is the right vehicle for it. Given that, I'd prefer to **land this PR now as an incremental step** and do the real resolution as a follow-up: > > * Today these queries either crash (`UNSUPPORTED_CALL` on `BaseGroupingSets.dataType`) or, for multi-column ROLLUP + HAVING, silently return wrong results ([SPARK-57346](https://issues.apache.org/jira/browse/SPARK-57346)). This PR turns both into an explicit `ExplicitlyUnsupportedResolverFeature`, so in tentative/dual-run mode they fall back to the fixed-point analyzer and produce correct results. That's a strict improvement over the status quo, and it's off-by-default + internal, so the interim throw carries no user-facing risk. > * The throws are self-contained and easy to remove: when the follow-up wires up real resolution, it just deletes the three guards. > > For the follow-up, the shape should be to **resolve the grouping-analytics expressions instead of throwing**, mirroring how the non-analytics ORDER BY/HAVING paths already work: > > * **ORDER BY**: `SortResolver` should detect when the order expressions carry grouping-analytics expressions and route them through `GroupingAnalyticsResolver.handleSortOrderExpressionsWithGroupingAnalytics` -- which already exists but currently has no caller -- rather than bailing out. That needs `resolveOrderExpressions` to report back whether a grouping-analytics expression was seen. > * **HAVING**: `HavingResolver` needs the analogous handler for the filter condition, plus scope marking so missing grouping/aggregate expressions can be pulled from the base `Aggregate` (the same "resolve against the aggregate below" machinery `handleAggregateBelowHaving` already does for the non-analytics case). > * **LCA**: the harder one -- the `Expand`-minted attribute IDs need to stay in sync with the `Project` the LCA rewrite builds, which is the desync the current comment calls out. > > @mihailotim-db does landing this as an interim step conflict with your e2e fix, or is it fine to merge now and let your PR remove the guards when it lands? If it's close, we can also just wait -- but I lean toward taking the incremental correctness win now. @cloud-fan I would like to understand how we encounter this error in the first place, given that single-pass analyzer is not yet enabled (until https://github.com/apache/spark/pull/56468 is merged) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
