nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1593910153


##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -117,76 +119,445 @@ public Collation(
     }
 
     /**
-     * Constructor with comparators that are inherited from the given collator.
+     * collation id (32-bit integer) layout:
+     * bit 31:    0 = predefined collation, 1 = user-defined collation
+     * bit 30-29: 00 = utf8-binary, 01 = ICU, 10 = indeterminate (without spec implementation)
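A minimal Java sketch of how an id following this layout could be decoded. The class, enum and method names are hypothetical and are not Spark's `CollationFactory` API. The sketch assumes the provider bits are meaningful only for predefined ids, because the excerpt does not show the rest of the user-defined layout. It also treats the unlisted `11` pattern as unassigned.

```java
/**
 * Hypothetical decoder for the collation id layout in the diff comment:
 * bit 31 = predefined (0) / user-defined (1); bits 30-29 = implementation provider.
 * Names are illustrative only, not Spark's actual API.
 */
public final class CollationIdLayoutSketch {

    /** Implementation provider encoded in bits 30-29 (assumed: predefined ids only). */
    public enum Provider { UTF8_BINARY, ICU, INDETERMINATE, RESERVED }

    public static boolean isUserDefined(int collationId) {
        // Bit 31 is the sign bit of a Java int; the unsigned shift isolates it as 0 or 1.
        return (collationId >>> 31) == 1;
    }

    public static Provider provider(int collationId) {
        // Move bits 30-29 down to positions 1-0, then mask off everything else.
        int bits = (collationId >>> 29) & 0b11;
        switch (bits) {
            case 0b00: return Provider.UTF8_BINARY;
            case 0b01: return Provider.ICU;
            case 0b10: return Provider.INDETERMINATE;
            default:   return Provider.RESERVED; // 11 is not assigned in the excerpt
        }
    }

    public static void main(String[] args) {
        System.out.println(provider(0));           // UTF8_BINARY (all bits clear)
        System.out.println(provider(1 << 29));     // ICU (bits 30-29 = 01)
        System.out.println(provider(1 << 30));     // INDETERMINATE (bits 30-29 = 10)
        System.out.println(isUserDefined(1 << 31)); // true (bit 31 set)
    }
}
```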

Review Comment:
   In my opinion, it would be better to stick with this naming, for the following reason.
   
   If we used bit 31 for the indeterminate collation, we would shrink the user-defined collation space, because we would need an additional bit to distinguish between predefined and user-defined collations. It's more convenient to make that distinction at the first bit, since the indeterminate collation falls into the predefined space.
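The space argument is plain bit arithmetic. The sketch below uses hypothetical names and counts the ids left for user-defined collations under the layout in the diff and under the alternative. For the alternative, it assumes bit 31 flags the indeterminate collation and bit 30 becomes the predefined/user-defined flag.

```java
/**
 * Compares the size of the user-defined collation id space under two layouts.
 * Layout A (as in the diff): bit 31 flags user-defined ids, leaving bits 30-0 free.
 * Layout B (hypothetical alternative): bit 31 flags indeterminate, so a second bit
 * (assumed to be bit 30) must flag user-defined ids, leaving bits 29-0 free.
 */
public final class UserCollationSpaceSketch {

    /** Number of distinct ids that fit in the given count of free low-order bits. */
    public static long idsForFreeBits(int freeBits) {
        return 1L << freeBits; // use long so 2^31 does not overflow
    }

    public static void main(String[] args) {
        long layoutA = idsForFreeBits(31); // bits 30-0 available to user-defined collations
        long layoutB = idsForFreeBits(30); // bits 29-0 available to user-defined collations
        System.out.println("Layout A: " + layoutA);          // 2147483648
        System.out.println("Layout B: " + layoutB);          // 1073741824
        System.out.println("Ratio: " + layoutA / layoutB);   // 2
    }
}
```

Spending a second flag bit halves the user-defined space. Keeping the indeterminate marker inside the predefined half (bits 30-29) avoids that cost.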



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

