yaooqinn commented on code in PR #47320:
URL: https://github.com/apache/spark/pull/47320#discussion_r1677265633


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -3840,8 +3840,8 @@ object functions {
 
   /**
   * Computes the first argument into a string from a binary using the provided character set (one
-   * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
-   * is null, the result will also be null.
+   * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32', 'GB2312',
+   * 'GBK', 'GB18030', 'BIG5'). If either argument is null, the result will also be null.

Review Comment:
   Hi @dongjoon-hyun,
   
   I understand your concern. The existing policies originate from the documentation of the encode/decode functions, and if we look back further in time, we can see that they were copied from Apache Hive. However, that documentation is inconsistent with the actual implementation in both Hive and Spark when the legacy config is turned on, so I don't think it can really be considered our charset policy.
   
   >  if we allow these Chinese extensions, we need to end up to support all European and Japan and Korean
   
   I expect it will be necessary to add further extensions in the future so that our data sources can read characters encoded with these charsets.
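   
   For illustration only, here is a minimal sketch of how the documented `encode`/`decode` pair round-trips a string through one of the newly listed charsets. It assumes a local SparkSession and a build where 'GBK' is accepted by these functions; the object and column names are made up for the example.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{decode, encode}
   
   object GbkRoundTrip {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[*]")
         .appName("gbk-roundtrip")
         .getOrCreate()
       import spark.implicits._
   
       val df = Seq("你好").toDF("s")
         // encode the string column into GBK bytes ...
         .withColumn("gbk_bytes", encode($"s", "GBK"))
         // ... and decode those bytes back into a string
         .withColumn("roundtrip", decode($"gbk_bytes", "GBK"))
   
       // `roundtrip` should equal the original column `s`
       df.show(truncate = false)
       spark.stop()
     }
   }
   ```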
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


