srielau commented on code in PR #48737:
URL: https://github.com/apache/spark/pull/48737#discussion_r1829784021
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -547,23 +587,41 @@ private static CollationSpecUTF8 fromCollationId(int collationId) {
// Extract case sensitivity from collation ID.
int caseConversionOrdinal = SpecifierUtils.getSpecValue(collationId,
CASE_SENSITIVITY_OFFSET, CASE_SENSITIVITY_MASK);
+ // Extract utf8 binary collation type from collation ID.
+ int utf8BinaryCollationType = SpecifierUtils.getSpecValue(collationId,
+ UTF8BINARY_COLLATION_TYPE_OFFSET, UTF8BINARY_COLLATION_TYPE_MASK);
// Extract space trimming from collation ID.
int spaceTrimmingOrdinal = getSpaceTrimming(collationId).ordinal();
assert(isValidCollationId(collationId));
return new CollationSpecUTF8(
CaseSensitivity.values()[caseConversionOrdinal],
+ Utf8BinaryCollationType.values()[utf8BinaryCollationType],
SpaceTrimming.values()[spaceTrimmingOrdinal]);
}
private static boolean isValidCollationId(int collationId) {
- collationId = SpecifierUtils.removeSpec(
- collationId,
- SPACE_TRIMMING_OFFSET,
- SPACE_TRIMMING_MASK);
- collationId = SpecifierUtils.removeSpec(
- collationId,
- CASE_SENSITIVITY_OFFSET,
- CASE_SENSITIVITY_MASK);
+ if (SpecifierUtils.getSpecValue(collationId, UTF8BINARY_COLLATION_TYPE_OFFSET,
+ UTF8BINARY_COLLATION_TYPE_MASK) != 0) {
Review Comment:
Any time we attribute anything special to UTF8_BINARY, we're likely
making a mistake, especially when it relates to strength or the default
collation.
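
For context, here is a minimal sketch of the offset/mask pattern the hunk above relies on. It assumes that `getSpecValue` shifts and masks and that `removeSpec` clears the field's bits. The class name `SpecBits`, the `EXTRA_TYPE_*` field, the `setSpecValue` helper, and every offset and mask value are hypothetical and are not Spark's actual layout. The body of the PR's new branch is cut off in the quoted hunk, so this does not reproduce it. It only illustrates that every extra bit field packed into the ID becomes another thing a validity check has to account for.

```java
// Hypothetical sketch: names, offsets and masks are illustrative assumptions,
// not the real layout used by CollationFactory.
final class SpecBits {
    static final int CASE_SENSITIVITY_OFFSET = 0;
    static final int CASE_SENSITIVITY_MASK = 0b1;
    // Stand-in for a newly added specifier such as the one in this PR.
    static final int EXTRA_TYPE_OFFSET = 1;
    static final int EXTRA_TYPE_MASK = 0b1;
    static final int SPACE_TRIMMING_OFFSET = 18;
    static final int SPACE_TRIMMING_MASK = 0b1;

    private SpecBits() {}

    // Read the field stored at [offset, offset + width of mask).
    static int getSpecValue(int id, int offset, int mask) {
        return (id >> offset) & mask;
    }

    // Clear the field so the remaining bits can be inspected.
    static int removeSpec(int id, int offset, int mask) {
        return id & ~(mask << offset);
    }

    // Illustration-only helper: write a value into a field.
    static int setSpecValue(int id, int offset, int value) {
        return id | (value << offset);
    }

    // An ID is accepted only if nothing is left once every known field is cleared.
    static boolean isValid(int id) {
        id = removeSpec(id, CASE_SENSITIVITY_OFFSET, CASE_SENSITIVITY_MASK);
        id = removeSpec(id, EXTRA_TYPE_OFFSET, EXTRA_TYPE_MASK);
        id = removeSpec(id, SPACE_TRIMMING_OFFSET, SPACE_TRIMMING_MASK);
        return id == 0;
    }
}
```

In this sketch, a bit set outside the three known fields makes `isValid` return false, while any combination of the known fields passes. Adding a field therefore widens the set of IDs that count as valid UTF8_BINARY variants.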
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]