[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21313

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192925340

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the elements of column.
+#' @param nullReplacement a character string that is used to replace the Null values.
```
--- End diff --

could you change this to `nullReplacement an optional character string`
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192814246

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the elements of column.
+#' @param nullReplacement a character string that is used to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column", delimiter = "character"),
+          function(x, delimiter, nullReplacement = NULL) {
+            jc <- if (is.null(nullReplacement)) {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter)
+            } else {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter,
+                          nullReplacement)
```
--- End diff --

It's `Hello#FooBeautiful#FooWorld!`
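The reply above answers the reviewer's question about a multi-character delimiter: with `array_join(df$a, "#Foo", "Beautiful")`, the whole string `#Foo` separates the elements and `Beautiful` stands in for nulls. A minimal Python model of that semantics (the `array_join` helper here is a hypothetical sketch, not SparkR or Spark code):

```python
def array_join(elements, delimiter, null_replacement=None):
    """Model of Spark's array_join: join elements with delimiter;
    nulls are replaced when a replacement is given, dropped otherwise."""
    if null_replacement is None:
        kept = [e for e in elements if e is not None]
    else:
        kept = [null_replacement if e is None else e for e in elements]
    return delimiter.join(kept)

# Multi-character delimiter with a null replacement, as in the review thread:
print(array_join(["Hello", None, "World!"], "#Foo", "Beautiful"))
# -> Hello#FooBeautiful#FooWorld!
```

This reproduces the exact output quoted in the reply: the delimiter is inserted verbatim between elements, with the replacement treated as an ordinary element.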
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192627641

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the elements of column.
+#' @param nullReplacement a character string that is used to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column", delimiter = "character"),
+          function(x, delimiter, nullReplacement = NULL) {
+            jc <- if (is.null(nullReplacement)) {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter)
+            } else {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter,
+                          nullReplacement)
```
--- End diff --

`as.character(nullReplacement)`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192627948

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the elements of column.
+#' @param nullReplacement a character string that is used to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column", delimiter = "character"),
+          function(x, delimiter, nullReplacement = NULL) {
+            jc <- if (is.null(nullReplacement)) {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter)
+            } else {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter,
+                          nullReplacement)
```
--- End diff --

re https://github.com/apache/spark/pull/21313#discussion_r192578750 — so what's the behavior if the delimiter is more than one character? like `array_join(df$a, "#Foo", "Beautiful")`?
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578796

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
```
--- End diff --

@felixcheung I will add the type, so it will be
```
signature(x = "Column", delimiter = "character"),
```
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578774

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
+          function(x, delimiter, nullReplacement = NA) {
```
--- End diff --

@felixcheung I will change the default to NULL.
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192578750

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
```
--- End diff --

@felixcheung Scala doesn't have a doc for the delimiter param; I added this myself. What I am trying to say is "one or more characters". I will change it to "a character string", so it will be
```
@param delimiter a character string to use to concatenate the elements of column.
```
Does this look ok to you?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192577048

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
```
--- End diff --

I didn't check Scala — what does the "(s)" in "character(s)" mean? I ask because "character" refers to the type in R.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192576992

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
+          function(x, delimiter, nullReplacement = NA) {
```
--- End diff --

wait.. why does `nullReplacement` default to NA? that's a bit unusual
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192577008

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
```
--- End diff --

is `delimiter` supposed to be a string? it's better to have its type in the signature
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192574515

Diff: R/pkg/tests/fulltests/test_sparkSQL.R
```diff
@@ -1518,6 +1518,16 @@ test_that("column functions", {
   result <- collect(select(df, arrays_overlap(df[[1]], df[[2]])))[[1]]
   expect_equal(result, c(TRUE, FALSE, NA))

+  # Test array_join()
+  df <- createDataFrame(list(list(list("Hello", "World!"))))
+  result <- collect(select(df, array_join(df[[1]], "#")))[[1]]
+  expect_equal(result, "Hello#World!")
+  df2 <- createDataFrame(list(list(list("Hello", NA, "World!"))))
```
--- End diff --

@HyukjinKwon Thank you very much for your review. I will add a test for NULL and also change the `}`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192564730

Diff: R/pkg/tests/fulltests/test_sparkSQL.R
```diff
@@ -1518,6 +1518,16 @@ test_that("column functions", {
   result <- collect(select(df, arrays_overlap(df[[1]], df[[2]])))[[1]]
   expect_equal(result, c(TRUE, FALSE, NA))

+  # Test array_join()
+  df <- createDataFrame(list(list(list("Hello", "World!"))))
+  result <- collect(select(df, array_join(df[[1]], "#")))[[1]]
+  expect_equal(result, "Hello#World!")
+  df2 <- createDataFrame(list(list(list("Hello", NA, "World!"))))
```
--- End diff --

How does it work with NULL?
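Per the documented behavior quoted in the diffs above ("Null values are replaced with nullReplacement if set, otherwise they are ignored"), a null element simply disappears when no replacement is supplied — no stray delimiter is left behind. A short Python sketch of that no-replacement path (the helper name is hypothetical, not part of any real API):

```python
def join_ignoring_nulls(elements, delimiter):
    # Without a nullReplacement, array_join drops null elements
    # before joining, so the delimiter only appears between kept values.
    return delimiter.join(e for e in elements if e is not None)

print(join_ignoring_nulls(["Hello", None, "World!"], "#"))
# -> Hello#World!
```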
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192564618

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
+          function(x, delimiter, nullReplacement = NA) {
+            jc <- if (is.na(nullReplacement)) {
+              callJStatic("org.apache.spark.sql.functions", "array_join", x@jc, delimiter)
+            }
+            else {
```
--- End diff --

nit: `} else {`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r192564574

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they are ignored.
```
--- End diff --

Null -> NA?
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r187814160

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with null_replacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param null_replacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
+          function(x, delimiter, null_replacement = NA) {
```
--- End diff --

@felixcheung Python uses null_replacement but Scala uses nullReplacement. I will change it to nullReplacement.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21313#discussion_r187810848

Diff: R/pkg/R/functions.R
```diff
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
           column(jc)
         })

+#' @details
+#' \code{array_join}: Concatenates the elements of column using the delimiter.
+#' Null values are replaced with null_replacement if set, otherwise they are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of column.
+#' @param null_replacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+          signature(x = "Column"),
+          function(x, delimiter, null_replacement = NA) {
```
--- End diff --

generally we don't use `_` in variable names in R. What's the variable name in Scala?
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21313

[SPARK-24187][R][SQL] Add array_join function to SparkR

## What changes were proposed in this pull request?

This PR adds the array_join function to SparkR.

## How was this patch tested?

Added a unit test in test_sparkSQL.R.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-24187

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21313.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21313

commit 0c21160751c82c4c453efea72e0553f6802f24a2
Author: Huaxin Gao
Date: 2018-05-13T16:07:14Z

    [SPARK-24187][R][SQL] Adding array_join function to SparkR
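Reading the whole thread bottom-up, the SparkR function this PR adds ends up with two modes: nulls are dropped when no replacement is given, and substituted otherwise. The following Python model summarizes both (a sketch of the semantics only — the real SparkR method delegates to the JVM-side `org.apache.spark.sql.functions.array_join`):

```python
def array_join(elements, delimiter, null_replacement=None):
    # Sketch of the semantics agreed on in the review thread:
    # with no replacement, nulls are ignored; otherwise each null
    # is replaced before the elements are joined with the delimiter.
    if null_replacement is None:
        elements = [e for e in elements if e is not None]
    else:
        elements = [null_replacement if e is None else e for e in elements]
    return delimiter.join(elements)

print(array_join(["Hello", "World!"], "#"))
# -> Hello#World!
print(array_join(["Hello", None, "World!"], "#"))
# -> Hello#World!
print(array_join(["Hello", None, "World!"], "#", "Beautiful"))
# -> Hello#Beautiful#World!
```

The first two outputs match the expectations in the quoted test diff; the third shows the replacement path exercised by the reviewers' multi-argument examples.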