Re: [PR] [SPARK-50206][SQL] Added separate collation id for UTF8_BINARY and non-collated strings [spark]

via GitHub Tue, 05 Nov 2024 06:03:55 -0800


vladanvasi-db commented on code in PR #48737:
URL: https://github.com/apache/spark/pull/48737#discussion_r1829393798



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -231,9 +231,10 @@ public Collation(
      * UTF8_BINARY collation ID binary layout:
      * bit 31-24: Zeroes.
      * bit 23-22: Zeroes, reserved for version.
-     * bit 21-19 Zeros, reserved for future trimmings.
-     * bit 18 0 = none, 1 = right trim.
-     * bit 17-3:  Zeroes.
+     * bit 21-19: Zeros, reserved for future trimmings.
+     * bit 18:    0 = none, 1 = right trim.
+     * bit 17:    0 = none, 1 = utf8_binary.
+     * bit 16-3:  Zeroes.
      * bit 2:     0, reserved for accent sensitivity.
      * bit 1:     0, reserved for uppercase and case-insensitive.
      * bit 0:     0 = case-sensitive, 1 = lowercase.

Review Comment:
   That would require a lot more refactoring, and in our case we are not 
changing the line of code you mentioned anyway. I think the layout is clear 
without moving the other collation IDs, but if you recommend it, I can refactor 
all of the IDs in the UTF8 collations.
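For readers following the bit-layout discussion, here is a minimal sketch that packs and reads the three non-reserved flags from the updated UTF8_BINARY layout in the diff: bit 18 (right trim), bit 17 (the new utf8_binary flag) and bit 0 (lowercase). The class `Utf8BinaryIdLayoutSketch` and all of its methods are hypothetical and are not part of Spark's `CollationFactory` API. The sketch also assumes that a non-collated string keeps bit 17 at 0 while an explicit UTF8_BINARY sets it, which is a reading of the PR title rather than something the diff states.

```java
// Hypothetical sketch, NOT Spark's CollationFactory API: encodes/decodes the
// UTF8_BINARY collation ID bit layout described in the diff comment.
final class Utf8BinaryIdLayoutSketch {
    // bit 18: 0 = none, 1 = right trim
    static final int RIGHT_TRIM_BIT = 18;
    // bit 17: 0 = none, 1 = utf8_binary (the bit added in this PR)
    static final int UTF8_BINARY_BIT = 17;
    // bit 0: 0 = case-sensitive, 1 = lowercase
    static final int LOWERCASE_BIT = 0;

    private Utf8BinaryIdLayoutSketch() {}

    // Returns true if the given bit of the ID is 1.
    static boolean isSet(int collationId, int bit) {
        return ((collationId >>> bit) & 1) == 1;
    }

    static boolean hasRightTrim(int id) { return isSet(id, RIGHT_TRIM_BIT); }
    static boolean hasUtf8BinaryFlag(int id) { return isSet(id, UTF8_BINARY_BIT); }
    static boolean isLowercase(int id) { return isSet(id, LOWERCASE_BIT); }

    // Builds an ID from the three flags; every other bit stays zero, as the
    // layout requires for the version, reserved and unused ranges.
    static int compose(boolean utf8Binary, boolean rightTrim, boolean lowercase) {
        int id = 0;
        if (utf8Binary) id |= 1 << UTF8_BINARY_BIT;
        if (rightTrim) id |= 1 << RIGHT_TRIM_BIT;
        if (lowercase) id |= 1 << LOWERCASE_BIT;
        return id;
    }

    // True if only bits 18, 17 and 0 may be set: bits 31-24, 23-22 (version),
    // 21-19 (future trimmings), 16-3 and the reserved bits 2-1 must be zero.
    static boolean reservedBitsClear(int id) {
        int allowed = (1 << RIGHT_TRIM_BIT) | (1 << UTF8_BINARY_BIT) | (1 << LOWERCASE_BIT);
        return (id & ~allowed) == 0;
    }

    public static void main(String[] args) {
        int withoutFlag = compose(false, false, false);
        int withFlag = compose(true, false, false);
        int withFlagAndTrim = compose(true, true, false);
        // prints "0 131072 393216"
        System.out.println(withoutFlag + " " + withFlag + " " + withFlagAndTrim);
    }
}
```

Because the new flag sits in a bit that was previously zero, an ID with bit 17 cleared keeps its old numeric value. That is consistent with the point above that the other collation IDs do not need to move.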



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
