Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/6843#discussion_r32837622
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1347,11 +1348,103 @@ object functions {
/**
* Computes the length of a given string column
+ *
* @group string_funcs
* @since 1.5.0
*/
def strlen(columnName: String): Column = strlen(Column(columnName))
+ /**
+ * Computes the numeric value of the first character of the specified string value.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def ascii(e: Column): Column = Ascii(e.expr)
+
+ /**
+ * Computes the numeric value of the first character of the specified string column.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def ascii(columnName: String): Column = ascii(Column(columnName))
+
+ /**
+ * Converts the specified value from binary to a base 64 string.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def base64(e: Column): Column = Base64(e.expr)
+
+ /**
+ * Converts the specified column from binary to a base 64 string.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def base64(columnName: String): Column = base64(Column(columnName))
+
+ /**
+ * Converts the specified value from a base 64 string to binary.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def unbase64(e: Column): Column = UnBase64(e.expr)
+
+ /**
+ * Converts the specified column from a base 64 string to binary.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def unbase64(columnName: String): Column = unbase64(Column(columnName))
+
+ /**
+ * Encodes the first argument from a string into a binary using the provided character set
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * If either argument is null, the result will also be null.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def encode(value: Column, charset: Column): Column = Encode(value.expr, charset.expr)
+
+ /**
+ * Encodes the first argument from a string into a binary using the provided character set
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * If either argument is null, the result will also be null.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def encode(columnName: String, charsetColumnName: String): Column =
+ encode(Column(columnName), Column(charsetColumnName))
+
+ /**
+ * Decodes the first argument from a binary into a string using the provided character set
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * If either argument is null, the result will also be null.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def decode(value: Column, charset: Column): Column = Decode(value.expr, charset.expr)
+
+ /**
+ * Decodes the first argument from a binary into a string using the provided character set
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * If either argument is null, the result will also be null.
+ *
+ * @group string_funcs
+ * @since 1.5.0
+ */
+ def decode(columnName: String, charsetColumnName: String): Column =
--- End diff --
Yes, most of the existing DataFrame APIs take a string as the column name; should we break this pattern? Actually, it seems redundant for most DataFrame functions to accept string column names as parameters in addition to the `Column` type. Of course, removing them would be a big change to existing user code, so we probably don't want to do that cleanup right now, but we could stop adding the `string` (column-name) versions of DataFrame functions during the Hive UDF rewrite. What do you think?
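For illustration, here is a minimal standalone sketch of the two-overload pattern being discussed; the `Column` and `functions` definitions below are hypothetical stand-ins for this example only, not the real Spark classes. The point is that every `String` overload is pure boilerplate that wraps the name in a `Column` and delegates:

```scala
// Hypothetical stand-ins for illustration only, not the real Spark API.
case class Column(expr: String)

object functions {
  // Primary definition, operating on a Column.
  def ascii(e: Column): Column = Column(s"ascii(${e.expr})")

  // Redundant String overload: wraps the name in a Column and delegates.
  def ascii(columnName: String): Column = ascii(Column(columnName))
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Both calls build the same expression, which is why the String
    // versions feel like boilerplate worth dropping for new functions.
    println(functions.ascii(Column("name")).expr) // ascii(name)
    println(functions.ascii("name").expr)         // ascii(name)
  }
}
```

Every new function added this way doubles the number of overloads without adding expressiveness, since users can already write `Column("name")` (or `$"name"`) themselves.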