Re: [PR] [SPARK-50206][SQL] Added separate collation id for UTF8_BINARY and non-collated strings [spark]

via GitHub Tue, 05 Nov 2024 08:11:19 -0800


stefankandic commented on code in PR #48737:
URL: https://github.com/apache/spark/pull/48737#discussion_r1829631229



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -231,9 +231,10 @@ public Collation(
      * UTF8_BINARY collation ID binary layout:
      * bit 31-24: Zeroes.
      * bit 23-22: Zeroes, reserved for version.
-     * bit 21-19 Zeros, reserved for future trimmings.
-     * bit 18 0 = none, 1 = right trim.
-     * bit 17-3:  Zeroes.
+     * bit 21-19: Zeros, reserved for future trimmings.
+     * bit 18:    0 = none, 1 = right trim.
+     * bit 17:    0 = none, 1 = utf8_binary.
+     * bit 16-3:  Zeroes.
      * bit 2:     0, reserved for accent sensitivity.
      * bit 1:     0, reserved for uppercase and case-insensitive.
      * bit 0:     0 = case-sensitive, 1 = lowercase.

Review Comment:
   The whole point of the change is for the `StringType` object to carry the 
id of the implicit UTF8 collation, so with the current implementation we would 
have to change that.
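
   To make the layout in the hunk above concrete, here is a minimal sketch of 
reading and composing those bits with plain masks. The class, constant, and 
method names are hypothetical and are not the ones used in `CollationFactory`; 
only the bit positions (18, 17, 0) come from the comment in the diff.

```java
// Illustrative only: names are assumptions, bit offsets come from the diff comment.
final class Utf8BinaryCollationIdSketch {
    static final int RIGHT_TRIM_BIT = 18;   // bit 18: 0 = none, 1 = right trim
    static final int UTF8_BINARY_BIT = 17;  // bit 17: 0 = none, 1 = utf8_binary
    static final int LOWERCASE_BIT = 0;     // bit 0:  0 = case-sensitive, 1 = lowercase

    // Returns true when the given bit of the collation id is set.
    static boolean isSet(int collationId, int bit) {
        return ((collationId >>> bit) & 1) == 1;
    }

    // Builds an id from the three flags; all other bits stay zero.
    static int compose(boolean rightTrim, boolean utf8BinaryFlag, boolean lowercase) {
        int id = 0;
        if (rightTrim) id |= 1 << RIGHT_TRIM_BIT;
        if (utf8BinaryFlag) id |= 1 << UTF8_BINARY_BIT;
        if (lowercase) id |= 1 << LOWERCASE_BIT;
        return id;
    }
}
```

   Under this sketch, an id with only bit 17 set is nonzero, while an id with 
no flags set is 0, which is what makes the two cases distinguishable by id.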



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


