uros-db commented on PR #45422:
URL: https://github.com/apache/spark/pull/45422#issuecomment-1994394143

@cloud-fan that makes a lot of sense. To combat this, new case classes should now handle it. Essentially:
- `StringType` no longer accepts all collationIds, only the default collationId (0), i.e. UTF8_BINARY
- `StringTypeBinary` is added to allow binary collations (for now: UTF8_BINARY and UNICODE). At this time we need it for full lockdown because casting is not ready (casting is a separate effort; when it's done, we can have `StringType` accept all binary collations directly, but for now that would be incorrect)
- `StringTypeBinaryLcase` is added to allow binary & lowercase (UTF8_BINARY_LCASE, UTF8_BINARY, UNICODE). This class is important because some expressions will support binary & lowercase, but not other collations, at a given time
- `StringTypeAllCollations` is added to allow all collations (for now this is supported only in StringPredicate expressions: Contains, StartsWith, EndsWith). Note that these expressions handle all collations, but can't guarantee that all string arguments have exactly the same collation type, so we still need `checkCollationCompatibility` in CollationTypeConstraints; once casting is ready, we will delete this
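
To make the acceptance rules above easier to compare, here is a rough sketch of them as a lookup table. It is a plain Python model, not the Spark implementation: the class and collation names come from the comment, while `OTHER`, `accepts`, `same_collation` and `validate` are hypothetical names used only for illustration. `same_collation` stands in for the separate check that all string arguments share one collation.

```python
# Collation names mentioned in the comment. OTHER is a hypothetical stand-in
# for any collation that is neither binary nor lowercase.
UTF8_BINARY = "UTF8_BINARY"            # the default collation (collationId 0)
UTF8_BINARY_LCASE = "UTF8_BINARY_LCASE"
UNICODE = "UNICODE"
OTHER = "OTHER"

ALL_COLLATIONS = frozenset({UTF8_BINARY, UTF8_BINARY_LCASE, UNICODE, OTHER})

# Collations each type class accepts, as described in the comment.
ACCEPTED = {
    "StringType": frozenset({UTF8_BINARY}),
    "StringTypeBinary": frozenset({UTF8_BINARY, UNICODE}),
    "StringTypeBinaryLcase": frozenset({UTF8_BINARY_LCASE, UTF8_BINARY, UNICODE}),
    "StringTypeAllCollations": ALL_COLLATIONS,
}


def accepts(type_class, collation):
    """Return True if the given type class admits the given collation."""
    return collation in ACCEPTED[type_class]


def same_collation(arg_collations):
    """Hypothetical model of the extra check: all arguments share one collation."""
    return len(set(arg_collations)) <= 1


def validate(type_class, arg_collations):
    """Each argument must be accepted, and all arguments must agree."""
    return (all(accepts(type_class, c) for c in arg_collations)
            and same_collation(arg_collations))


if __name__ == "__main__":
    for name in ACCEPTED:
        print(name, sorted(ACCEPTED[name]))
```

In this model, `validate("StringTypeAllCollations", [OTHER, UNICODE])` is rejected even though each argument is individually accepted, which is the gap the compatibility check is meant to cover until casting is ready.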
