mihailotim-db commented on PR #56417:
URL: https://github.com/apache/spark/pull/56417#issuecomment-4855970280

   > Thanks for the quick turnaround @yadavay-amzn -- the cleanups all look 
right: `groupingAnalyticsResolver.resolve` is now called unconditionally (it 
no-ops when there are no `BaseGroupingSets`), the `skipGroupingExprChecks` 
parameter is gone so `assertValidAggregation` keeps its original contract, the 
redundant LCA-path re-validation is removed, and the unsupported cases now have 
direct assertions in `ExplicitlyUnsupportedResolverFeatureSuite`.
   > 
   > @mihailotim-db you're right that throwing 
`ExplicitlyUnsupportedResolverFeature` disables these cases rather than fixing 
them, and resolving them is the end state we want. But properly resolving ORDER 
BY / HAVING / LCA over grouping analytics is a fair amount of new machinery, 
and your e2e fix is the right vehicle for it. Given that, I'd prefer to **land 
this PR now as an incremental step** and do the real resolution as a follow-up:
   > 
   > * Today these queries either crash (`UNSUPPORTED_CALL` on 
`BaseGroupingSets.dataType`) or, for multi-column ROLLUP + HAVING, silently 
return wrong results 
([SPARK-57346](https://issues.apache.org/jira/browse/SPARK-57346)). This PR 
turns both into an explicit `ExplicitlyUnsupportedResolverFeature`, so in 
tentative/dual-run mode they fall back to the fixed-point analyzer and produce 
correct results. That's a strict improvement over the status quo, and it's 
off-by-default + internal, so the interim throw carries no user-facing risk.
   > * The throws are self-contained and easy to remove: when the follow-up 
wires up real resolution, it just deletes the three guards.
   > 
   > For the follow-up, the shape should be to **resolve the grouping-analytics 
expressions instead of throwing**, mirroring how the non-analytics ORDER 
BY/HAVING paths already work:
   > 
   > * **ORDER BY**: `SortResolver` should detect when the order expressions 
carry grouping-analytics expressions and route them through 
`GroupingAnalyticsResolver.handleSortOrderExpressionsWithGroupingAnalytics` -- 
which already exists but currently has no caller -- rather than bailing out. 
That needs `resolveOrderExpressions` to report back whether a 
grouping-analytics expression was seen.
   > * **HAVING**: `HavingResolver` needs the analogous handler for the filter 
condition, plus scope marking so missing grouping/aggregate expressions can be 
pulled from the base `Aggregate` (the same "resolve against the aggregate 
below" machinery `handleAggregateBelowHaving` already does for the 
non-analytics case).
   > * **LCA**: the harder one -- the `Expand`-minted attribute IDs need to 
stay in sync with the `Project` the LCA rewrite builds, which is the desync the 
current comment calls out.
   > 
   > @mihailotim-db does landing this as an interim step conflict with your e2e 
fix, or is it fine to merge now and let your PR remove the guards when it 
lands? If it's close, we can also just wait -- but I lean toward taking the 
incremental correctness win now.
   
   @cloud-fan I would like to understand how we encounter this error in the 
first place, given that single-pass analyzer is not yet enabled (until 
https://github.com/apache/spark/pull/56468 is merged)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to