cloud-fan commented on PR #48737: URL: https://github.com/apache/spark/pull/48737#issuecomment-2467520611
Since we can't reach an agreement, maybe we should pick a different approach.

For a StringType with the default (undetermined) collation, we want it to behave the same as the utf8 collation so that we don't break anything, but we also want it to carry a special annotation so that we can determine the actual default collation later on.

With these requirements in mind, I think we should keep `object StringType` unchanged, which guarantees that nothing breaks for users who do not use string collation. We should mark a StringType with an explicit utf8 collation with a special annotation so that we don't change it afterward.

My new proposal is: in the parser, we return `object StringType` if the collation is not explicitly given, and return `new StringType(...)` when the collation is explicitly given. Later on, when we need to assign the actual default collation, we should look for string types where `stringType.eq(StringType) == true`, instead of `stringType.collationId == 0`.
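
To make the difference between the two checks concrete, here is a minimal sketch of the proposal. It uses a simplified stand-in for `StringType`, not Spark's actual class, and assumes (as the comment implies) that collation id 0 is the utf8 collation. `CollationSketch`, `parseStringType`, and `resolveDefault` are hypothetical names for illustration only.

```scala
// Simplified stand-in for Spark's StringType (assumption: not the real class).
class StringType(val collationId: Int) {
  // Value equality by collation id, so the singleton and an explicit
  // id-0 instance compare equal with ==.
  override def equals(other: Any): Boolean = other match {
    case s: StringType => s.collationId == collationId
    case _             => false
  }
  override def hashCode(): Int = collationId
}

// Singleton meaning "collation not given"; behaves like utf8 (assumed id 0).
object StringType extends StringType(0)

object CollationSketch {
  // Hypothetical parser hook: an explicit collation yields a fresh instance,
  // otherwise the untouched singleton is returned.
  def parseStringType(explicitCollationId: Option[Int]): StringType =
    explicitCollationId match {
      case Some(id) => new StringType(id)
      case None     => StringType
    }

  // Hypothetical later pass: only the singleton, matched by reference,
  // receives the session's default collation. Explicit types are kept.
  def resolveDefault(t: StringType, defaultCollationId: Int): StringType =
    if (t eq StringType) new StringType(defaultCollationId) else t
}
```

In this sketch an explicitly requested utf8 type and the implicit one are equal under `==` and share `collationId == 0`, yet only the implicit one satisfies `eq`, which is why the reference check can tell them apart where the id check cannot.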
