Re: [PR] [SPARK-50206][SQL] Added separate collation id for UTF8_BINARY and non-collated strings [spark]

via GitHub Tue, 05 Nov 2024 08:20:52 -0800


vladanvasi-db commented on code in PR #48737:
URL: https://github.com/apache/spark/pull/48737#discussion_r1829645097



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -231,9 +231,10 @@ public Collation(
      * UTF8_BINARY collation ID binary layout:
      * bit 31-24: Zeroes.
      * bit 23-22: Zeroes, reserved for version.
-     * bit 21-19 Zeros, reserved for future trimmings.
-     * bit 18 0 = none, 1 = right trim.
-     * bit 17-3:  Zeroes.
+     * bit 21-19: Zeros, reserved for future trimmings.
+     * bit 18:    0 = none, 1 = right trim.
+     * bit 17:    0 = none, 1 = utf8_binary.
+     * bit 16-3:  Zeroes.
      * bit 2:     0, reserved for accent sensitivity.
      * bit 1:     0, reserved for uppercase and case-insensitive.
      * bit 0:     0 = case-sensitive, 1 = lowercase.

Review Comment:
   But 0 is the id of the implicit UTF8_BINARY collation and 1<<17 is the id of 
the explicit UTF8_BINARY collation, so I do not see why we have to change the 
`StringType` object here. I will refactor the code and reword this comment to 
make this case easier to understand.
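
To make the distinction concrete, here is a minimal sketch of how the two ids differ under the bit layout in the hunk above. The class, constant, and method names are hypothetical and are not taken from `CollationFactory`; only the bit positions (18, 17, 0) and the two id values (0 and 1<<17) come from the diff and this comment.

```java
// Hypothetical illustration only: these names do not exist in CollationFactory.
// Bit positions follow the UTF8_BINARY layout comment quoted in the diff.
final class CollationIdSketch {
    static final int RIGHT_TRIM_BIT = 18;           // bit 18: 0 = none, 1 = right trim
    static final int EXPLICIT_UTF8_BINARY_BIT = 17; // bit 17: 0 = none, 1 = utf8_binary
    static final int LOWERCASE_BIT = 0;             // bit 0: 0 = case-sensitive, 1 = lowercase

    // Id of the implicit (non-collated) UTF8_BINARY string: all bits zero.
    static final int IMPLICIT_UTF8_BINARY_ID = 0;
    // Id of the explicitly specified UTF8_BINARY collation: only bit 17 set.
    static final int EXPLICIT_UTF8_BINARY_ID = 1 << EXPLICIT_UTF8_BINARY_BIT;

    private CollationIdSketch() {}

    // True when the given bit is set in the collation id.
    static boolean hasBit(int collationId, int bit) {
        return (collationId & (1 << bit)) != 0;
    }

    // True when bit 17 marks the collation as explicitly UTF8_BINARY.
    static boolean isExplicitUtf8Binary(int collationId) {
        return hasBit(collationId, EXPLICIT_UTF8_BINARY_BIT);
    }

    // Clears bit 17 so that implicit and explicit ids compare equal.
    static int withoutExplicitFlag(int collationId) {
        return collationId & ~(1 << EXPLICIT_UTF8_BINARY_BIT);
    }
}
```

Under these assumptions, `isExplicitUtf8Binary(0)` is false and `isExplicitUtf8Binary(1 << 17)` is true, while `withoutExplicitFlag` maps both ids to 0. Whether the real code should ever normalize the flag away like this is a design question for the PR, not something this sketch settles.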



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

