yaooqinn commented on code in PR #47320:
URL: https://github.com/apache/spark/pull/47320#discussion_r1677265633
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -3840,8 +3840,8 @@ object functions {
   /**
    * Computes the first argument into a string from a binary using the provided character set (one
-   * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
-   * is null, the result will also be null.
+   * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32', 'GB2312',
+   * 'GBK', 'GB18030', 'BIG5'). If either argument is null, the result will also be null.
Review Comment:
Hi @dongjoon-hyun,
I understand your concern. The existing policy originates from the description of the encode/decode functions, and if we look further back, that description was copied from Apache Hive. However, the documentation is inconsistent with the actual implementation in both Hive and Spark once the legacy config is turned on, so I don't think it can really be considered our charset policy.
> if we allow these Chinese extensions, we need to end up to support all European and Japan and Korean
I expect we will need to add other extensions in the future so that our data sources can read characters encoded in those charsets.
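
For illustration, here is a minimal sketch of the round-trip behaviour the updated doc describes, assuming a build that includes this change; the object name and session setup are only for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, decode, encode}

object GbkRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("charset-demo")
      .getOrCreate()
    import spark.implicits._

    // One row with Chinese characters and one null row; per the doc,
    // a null argument yields a null result through encode/decode.
    val df = Seq(Some("你好, Spark"), None).toDF("s")

    // Encode the string to binary with GBK, then decode it back to a string.
    df.select(
        col("s"),
        decode(encode(col("s"), "GBK"), "GBK").alias("roundtrip"))
      .show(false)

    spark.stop()
  }
}
```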