nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1593910153
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -117,76 +119,445 @@ public Collation(
}
/**
- * Constructor with comparators that are inherited from the given collator.
+ * collation id (32-bit integer) layout:
+ * bit 31: 0 = predefined collation, 1 = user-defined collation
+ * bit 30-29: 00 = utf8-binary, 01 = ICU, 10 = indeterminate (without spec implementation)
Review Comment:
I think it would be better to stick with this naming, for the following
reason.
If we used bit 31 for the indeterminate collation, we would shrink the user
collation space, because we would need an additional bit to distinguish between
predefined and user-defined collations. It is more convenient to make that
distinction at the top bit (bit 31), since the indeterminate collation falls
into the predefined space.
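
To make the layout concrete, here is a minimal sketch of how an id could be decoded under the scheme quoted above. The class, constant, and method names are hypothetical and are not taken from `CollationFactory`. The sketch also assumes that the bit pattern `11` in bits 30-29 is unused, since the quoted comment does not define it.

```java
public final class CollationIdLayoutSketch {
    // Hypothetical constants derived from the layout comment in the diff.
    static final int USER_DEFINED_BIT = 31;    // bit 31: 0 = predefined, 1 = user-defined
    static final int PROVIDER_SHIFT = 29;      // bits 30-29 hold the implementation kind
    static final int PROVIDER_MASK = 0b11;

    // Hypothetical enum mirroring the three values listed in the quoted comment.
    enum Provider { UTF8_BINARY, ICU, INDETERMINATE }

    // True when bit 31 is set, i.e. the id lies in the user-defined space.
    static boolean isUserDefined(int collationId) {
        return (collationId >>> USER_DEFINED_BIT) == 1;
    }

    // Decodes bits 30-29; the pattern 11 is treated as unused (assumption).
    static Provider provider(int collationId) {
        int bits = (collationId >>> PROVIDER_SHIFT) & PROVIDER_MASK;
        switch (bits) {
            case 0b00: return Provider.UTF8_BINARY;
            case 0b01: return Provider.ICU;
            case 0b10: return Provider.INDETERMINATE;
            default:
                throw new IllegalArgumentException("Undefined provider bits: " + bits);
        }
    }

    public static void main(String[] args) {
        // Indeterminate collation: bits 30-29 = 10, bit 31 left at 0 (predefined space).
        int indeterminateId = 0b10 << PROVIDER_SHIFT;
        System.out.println(isUserDefined(indeterminateId));
        System.out.println(provider(indeterminateId));
    }
}
```

With this arrangement the indeterminate id keeps bit 31 clear, so a single check of the top bit is enough to separate predefined from user-defined ids, and the whole half of the id space with bit 31 set stays available for user collations.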