uros-db commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1994394143

   @cloud-fan That makes a lot of sense. To address this, the new case classes
below should handle it. Essentially:
   - `StringType` no longer accepts all collation IDs, only the default
collation ID (0), i.e. UTF8_BINARY
   - `StringTypeBinary` is added to allow binary collations (for now:
UTF8_BINARY and UNICODE). We need this for full lockdown at this time because
casting is not ready. Casting is a separate effort; once it's done, `StringType`
can accept all binary collations directly, but for now that would be
incorrect.
   - `StringTypeBinaryLcase` is added to allow binary and lowercase collations
(UTF8_BINARY_LCASE, UTF8_BINARY, UNICODE). This class is important because,
at a given time, some expressions will support binary and lowercase collations
but not the others.
   - `StringTypeAllCollations` is added to allow all collations (for now, this
is supported only in the StringPredicate expressions Contains, StartsWith, and
EndsWith). Note that these expressions handle all collations but can't
guarantee that all string arguments have exactly the same collation, so we
still need `checkCollationCompatibility` in CollationTypeConstraints; once
casting is ready, we will delete this check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.