ilicmarkodb opened a new pull request, #54284:
URL: https://github.com/apache/spark/pull/54284

   
   ### What changes were proposed in this pull request?
   Call `CollationTypeCasts` immediately to avoid a cycle between `ApplyDefaultCollation`, `ExtractWindowExpressions` and `CollationTypeCasts`.

   Example:

   ```sql
   CREATE TABLE t (c1 STRING, c2 STRING);  -- c1 is UTF8_BINARY

   CREATE TABLE t2 DEFAULT COLLATION UTF8_LCASE AS
     SELECT c1 = 'HELLO', ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY c2)
     FROM t;
   ```
   
   The analyzer runs the rules in each batch sequentially, and the rule order here is:
   `ApplyDefaultCollation` -> `ExtractWindowExpressions` -> `CollationTypeCasts`.
   
   Iteration 1:
     - `ApplyDefaultCollation` applies the `UTF8_LCASE` collation to the literal `'HELLO'`. After this, the `EqualTo` expression (`c1 = 'HELLO'`) is unresolved because `c1` and the literal have different types.
     - `ExtractWindowExpressions` cannot extract the window expressions because it expects the whole `projectList` to be resolved (both `EqualTo` and `ROW_NUMBER()`), and `EqualTo` is not.
     - `CollationTypeCasts` applies coercion rules, changing the type of the literal back to the default `StringType` (because that is the type of the column). Now `EqualTo` is resolved.
   
   Iteration 2:
     - `ApplyDefaultCollation` applies the `UTF8_LCASE` collation to the literal again, because its type is the default `StringType` once more.
     - `ExtractWindowExpressions` still cannot extract the window expressions, for the same reason as before.
     - `CollationTypeCasts` applies the same coercion rules again, changing the type of the literal back to the default `StringType`.
   
   This cycle repeats on every iteration, so the plan never gets resolved. By calling `CollationTypeCasts` right after `ApplyDefaultCollation`, we ensure that other rules like `ExtractWindowExpressions`, which expect resolved expressions, can be applied.
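
The ordering problem described above can be sketched with a small toy model in Python. This is not Spark code: the `Plan` class, the type strings, and the rule functions are hypothetical stand-ins that track only the literal's type and whether the window expressions were extracted. It is a rough illustration of why the original order stalls and why casting immediately helps, under the assumption that each rule behaves as described in the iterations above.

```python
from dataclasses import dataclass, replace

# Hypothetical type labels for the toy model (not Spark's real type objects).
DEFAULT_STRING = "StringType"            # type of column c1 in this sketch
LCASE_STRING = "StringType(UTF8_LCASE)"  # literal type after the default collation


@dataclass(frozen=True)
class Plan:
    literal_type: str = DEFAULT_STRING
    windows_extracted: bool = False

    @property
    def equal_to_resolved(self) -> bool:
        # In this sketch, EqualTo resolves only when the literal's type
        # matches the column's type.
        return self.literal_type == DEFAULT_STRING


def apply_default_collation(plan: Plan) -> Plan:
    # Applies UTF8_LCASE to a literal that still has the default type.
    if plan.literal_type == DEFAULT_STRING:
        return replace(plan, literal_type=LCASE_STRING)
    return plan


def extract_window_expressions(plan: Plan) -> Plan:
    # Only fires when the whole project list (here: EqualTo) is resolved.
    if plan.equal_to_resolved:
        return replace(plan, windows_extracted=True)
    return plan


def collation_type_casts(plan: Plan) -> Plan:
    # Coerces the literal back to the column's type.
    if not plan.equal_to_resolved:
        return replace(plan, literal_type=DEFAULT_STRING)
    return plan


def apply_default_collation_then_cast(plan: Plan) -> Plan:
    # Toy version of the fix: run the type casts right after the collation rule.
    return collation_type_casts(apply_default_collation(plan))


def run(rules, plan: Plan, iterations: int) -> Plan:
    # Applies the rules in order, repeated for a fixed number of iterations.
    for _ in range(iterations):
        for rule in rules:
            plan = rule(plan)
    return plan


original_order = [apply_default_collation, extract_window_expressions, collation_type_casts]
fixed_order = [apply_default_collation_then_cast, extract_window_expressions, collation_type_casts]

stuck = run(original_order, Plan(), iterations=10)
fixed = run(fixed_order, Plan(), iterations=1)
print(stuck.windows_extracted, fixed.windows_extracted)  # prints: False True
```

In the toy model, the original order never lets `extract_window_expressions` see a resolved `EqualTo`, no matter how many iterations run, while the combined rule lets it fire in the first iteration. The sketch models only the ordering effect; how the actual change is implemented in the analyzer is not shown here.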
   
   
   ### Why are the changes needed?
   Bug fix: without this change, the plan for a query like the example above never gets resolved.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   New tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

