[PR] [SPARK-47776][SS] Disallow binary inequality collation be used in key schema of stateful operator [spark]

via GitHub Tue, 09 Apr 2024 00:37:36 -0700


HeartSaVioR opened a new pull request, #45951:
URL: https://github.com/apache/spark/pull/45951


   ### What changes were proposed in this pull request?
   
   This PR proposes to disallow columns with a binary inequality collation in the 
key schema of a stateful operator. Note that changing the collation of the same 
string column across a query restart was already disallowed when collation was 
introduced.
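
As a rough illustration of the kind of check described above, the following is a minimal sketch in plain Python, not the actual Spark change. The `Field` class, its `binary_equality` flag, and `validate_key_schema` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one column of a key schema."""
    name: str
    data_type: str  # e.g. "string", "long"
    # Hypothetical flag: True when equal values always have identical bytes.
    binary_equality: bool = True


def validate_key_schema(fields):
    """Raise ValueError if a string key column uses a binary inequality collation."""
    offending = [
        f.name for f in fields
        if f.data_type == "string" and not f.binary_equality
    ]
    if offending:
        raise ValueError(
            "Binary inequality collation is not allowed in the key schema "
            "of a stateful operator: " + ", ".join(offending)
        )


ok_schema = [Field("user_id", "long"), Field("country", "string")]
bad_schema = [Field("word", "string", binary_equality=False)]

validate_key_schema(ok_schema)  # passes without raising

try:
    validate_key_schema(bad_schema)
    rejected = False
except ValueError:
    rejected = True
```

The sketch rejects the schema before any state is written, so the query fails fast rather than producing wrong results later.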
   
   ### Why are the changes needed?
   
   The state store API relies heavily on the provider implementation performing 
O(1)-like get and put operations. While the actual implementation depends on 
the state store provider, it is reasonable to assume that these providers look 
up keys only by their binary format (implying binary equality).
   
   As a result, even if the column is declared case insensitive, the state store 
API does not take this into consideration and produces wrong results, e.g. 
counting 'a' and 'A' separately even though the column is case insensitive.
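
The miscount can be reproduced with a small simulation. This is a sketch in plain Python, not Spark code: a dict keyed by UTF-8 bytes stands in for a state store that looks keys up by binary format, and `casefold()` is an assumed approximation of a case-insensitive collation. All function names are hypothetical.

```python
def binary_key(value: str) -> bytes:
    # Key as a byte-oriented state store would see it: the raw encoding.
    return value.encode("utf-8")


def collation_key(value: str) -> bytes:
    # Assumed stand-in for a case-insensitive collation: fold case first.
    return value.casefold().encode("utf-8")


def count_by_key(values, key_fn):
    """Simulate a stateful count that does get/put on a key-value store."""
    state = {}
    for v in values:
        k = key_fn(v)
        state[k] = state.get(k, 0) + 1
    return state


events = ["a", "A", "a"]

# Lookup by raw bytes keeps 'a' and 'A' in separate state entries.
binary_counts = count_by_key(events, binary_key)

# What the collation semantics call for: a single group for 'a' and 'A'.
collated_counts = count_by_key(events, collation_key)
```

With binary keys the store ends up with two entries for what the collation treats as one value, which is the wrong result described above.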
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, as collation support has not been released yet.
   
   ### How was this patch tested?
   
   New UTs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


