jchen5 opened a new pull request, #45673:
URL: https://github.com/apache/spark/pull/45673

   ### What changes were proposed in this pull request?
   Currently, when a subquery is correlated on a condition like `outer_map[1] = inner_map[1]`, DecorrelateInnerQuery may generate a join on the map column itself. That join is unsupported, so the query fails. For example:
   
   ```
   select * from x where (select sum(y2) from y where xm[1] = ym[1]) > 2;
   org.apache.spark.sql.AnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE] Unsupported subquery expression: Correlated column reference 'x.xm' cannot be map type.
   ```
   
   However, if the query is rewritten to pull the map access `outer_map[1]` out into the outer plan, so that the subquery is correlated on the extracted value rather than on the map, the query succeeds.
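
   As a rough sketch of that manual rewrite (not the code added by this PR), the query can compute `xm[1]` in a derived table over `x`, so the subquery correlates on an integer column instead of the map column `xm`. The sample data, the extra column `x1`, and the alias `xm_1` are hypothetical and assume `map<int,int>` columns; the temporary views exist only to make the example self-contained.

   ```sql
   -- Hypothetical sample data: x has a map column xm, y has a map column ym.
   create or replace temporary view x as
     select * from values (map(1, 1), 1), (map(1, 2), 2) as t(xm, x1);
   create or replace temporary view y as
     select * from values (map(1, 1), 5), (map(1, 2), 1) as t(ym, y2);

   -- Manual rewrite: evaluate xm[1] in a derived table over x, so the
   -- correlated reference x2.xm_1 is an integer column, not a map.
   select x2.xm, x2.x1
   from (select xm, x1, xm[1] as xm_1 from x) x2
   where (select sum(y2) from y where x2.xm_1 = ym[1]) > 2;
   ```

   With this sample data, only the row with `x1 = 1` passes the filter: its correlated sum is 5, while the other row's is 1.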
   
   See the code comments in PullOutNestedDataOuterRefExpressions for more details and an example of the rewrite.
   
   ### Why are the changes needed?
   To allow queries like the one above, which currently fail during analysis, to run successfully.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Queries like the one above, which previously failed with an AnalysisException, now run successfully.
   
   ### How was this patch tested?
   Added tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.