nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1606716285
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -117,76 +119,445 @@ public Collation(
}
/**
- * Constructor with comparators that are inherited from the given collator.
+ * collation id (32-bit integer) layout:
+ * bit 31: 0 = predefined collation, 1 = user-defined collation
+ * bit 30-29: 00 = utf8-binary, 01 = ICU, 10 = indeterminate (without spec implementation)
+ * bit 28: 0 for utf8-binary / 0 = case-sensitive, 1 = case-insensitive for ICU
+ * bit 27: 0 for utf8-binary / 0 = accent-sensitive, 1 = accent-insensitive for ICU
+ * bit 26-25: zeroes, reserved for punctuation sensitivity
+ * bit 24-23: zeroes, reserved for first letter preference
+ * bit 22-21: 00 = unspecified, 01 = to-lower, 10 = to-upper
+ * bit 20-19: zeroes, reserved for space trimming
+ * bit 18-17: zeroes, reserved for version
Review Comment:
That's right, we reserve space for a maximum of 4 versions (2 bits), which is
enough to phase ICU version bumps in and out.
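
To make the arithmetic concrete, here is a minimal sketch of how the fields in the quoted layout could be read out of a 32-bit collation id. It assumes only the bit positions listed in the diff above; the class name `CollationIdLayoutSketch` and all method names are hypothetical and are not taken from `CollationFactory`.

```java
// Hypothetical helper, for illustration only; not the PR's implementation.
final class CollationIdLayoutSketch {
    // Returns `width` bits of `id` starting at bit `low` (bit 0 = least significant).
    static int bits(int id, int low, int width) {
        return (id >>> low) & ((1 << width) - 1);
    }

    // bit 31: 0 = predefined collation, 1 = user-defined collation
    static boolean isUserDefined(int id) { return bits(id, 31, 1) == 1; }

    // bits 30-29: 00 = utf8-binary, 01 = ICU, 10 = indeterminate
    static int provider(int id) { return bits(id, 29, 2); }

    // bit 28: 1 = case-insensitive (ICU only)
    static boolean isCaseInsensitive(int id) { return bits(id, 28, 1) == 1; }

    // bit 27: 1 = accent-insensitive (ICU only)
    static boolean isAccentInsensitive(int id) { return bits(id, 27, 1) == 1; }

    // bits 22-21: 00 = unspecified, 01 = to-lower, 10 = to-upper
    static int caseConversion(int id) { return bits(id, 21, 2); }

    // bits 18-17: reserved for version; two bits give 2^2 = 4 distinct values
    static int version(int id) { return bits(id, 17, 2); }

    public static void main(String[] args) {
        // An ICU, case-insensitive id with the version field set to its maximum value.
        int id = (1 << 29) | (1 << 28) | (3 << 17);
        System.out.println("provider=" + provider(id)
            + " caseInsensitive=" + isCaseInsensitive(id)
            + " version=" + version(id));
    }
}
```

Because the version field is two bits wide, it can hold the values 0 through 3, which is where the limit of 4 versions comes from.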
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]