nikolamand-db commented on code in PR #46180:
URL: https://github.com/apache/spark/pull/46180#discussion_r1609933009


##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/DataTypeProtoConverter.scala:
##########
@@ -177,7 +178,11 @@ object DataTypeProtoConverter {
       case s: StringType =>
         proto.DataType
           .newBuilder()
-          .setString(proto.DataType.String.newBuilder().setCollationId(s.collationId).build())
+          .setString(
+            proto.DataType.String
+              .newBuilder()
+              .setCollation(CollationFactory.fetchCollation(s.collationId).collationName)

Review Comment:
   It's true that we use the collation name in protobuf messages; this is also part of the changes (previously we sent the collation id).
   
   We decided to go with this approach in order to remove collation-id logic from the PySpark side: if the collation id were part of the protobuf message, we would also need collation-name parsing logic in Python, in addition to Scala's `CollationFactory`, which would be unnecessary overhead.
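   
   For illustration, here is a minimal sketch of the round trip this enables. Only the `setCollation` / `CollationFactory.fetchCollation(...).collationName` calls are taken from the diff above; the generated `getCollation` getter, the `CollationFactory.collationNameToId` helper, and the `StringType(collationId)` constructor on the server side are assumptions made for the sake of the example:
   
   ```scala
   import org.apache.spark.connect.proto
   import org.apache.spark.sql.catalyst.util.CollationFactory
   import org.apache.spark.sql.types.StringType

   // Client side: serialize StringType using the human-readable collation *name*,
   // so non-JVM clients (e.g. PySpark) never have to understand collation ids.
   def toProtoString(s: StringType): proto.DataType =
     proto.DataType
       .newBuilder()
       .setString(
         proto.DataType.String
           .newBuilder()
           .setCollation(CollationFactory.fetchCollation(s.collationId).collationName))
       .build()

   // Server side: only the JVM resolves the name back to an id via CollationFactory.
   def fromProtoString(t: proto.DataType): StringType =
     StringType(CollationFactory.collationNameToId(t.getString.getCollation))
   ```
   
   With this shape, a Python client can simply place the collation name string it already has into the message, without duplicating the name-to-id resolution that lives in `CollationFactory` on the JVM side.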



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

