[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21313


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192925340
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
--- End diff --

could you change this to
`nullReplacement an optional character string `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192814246
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column", delimiter = "character"),
+ function(x, delimiter, nullReplacement = NULL) {
+   jc <- if (is.null(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   } else {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter,
+ nullReplacement)
--- End diff --

It's ```Hello#FooBeautiful#FooWorld!```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192627641
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column", delimiter = "character"),
+ function(x, delimiter, nullReplacement = NULL) {
+   jc <- if (is.null(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   } else {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter,
+ nullReplacement)
--- End diff --

`as.character(nullReplacement)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192627948
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column", delimiter = "character"),
+ function(x, delimiter, nullReplacement = NULL) {
+   jc <- if (is.null(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   } else {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter,
+ nullReplacement)
--- End diff --

re https://github.com/apache/spark/pull/21313#discussion_r192578750
 so what's the behavior if delimiter is more than one character?
like `array_join(df$a, "#Foo", "Beautiful")`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578796
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
--- End diff --

@felixcheung 
I will add the type so it will be
```
signature(x = "Column", delimiter = "character"),
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578774
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, nullReplacement = NA) {
--- End diff --

@felixcheung 
I will change the default to NULL.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578750
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
--- End diff --

@felixcheung scala doesn't have a doc for param delimiter. I added this 
myself. What I am trying to say is "one or more characters". I will change to 
"a character string" so it will be
```
@param delimiter a character string to use to concatenate the elements of 
column.
```
Does this look ok to you?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192577048
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
--- End diff --

I didn't check scala - what's "(s)" here in "character(s)" mean? I ask 
because "character" refers to the type in R


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192576992
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, nullReplacement = NA) {
--- End diff --

wait.. why is `nullReplacement` default to NA? that's a bit unusual


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192577008
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
--- End diff --

is `delimiter` supposed to be a string? it's better to have its type in the 
signature


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192574515
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1518,6 +1518,16 @@ test_that("column functions", {
   result <- collect(select(df, arrays_overlap(df[[1]], df[[2]])))[[1]]
   expect_equal(result, c(TRUE, FALSE, NA))
 
+  # Test array_join()
+  df <- createDataFrame(list(list(list("Hello", "World!"
+  result <- collect(select(df, array_join(df[[1]], "#")))[[1]]
+  expect_equal(result, "Hello#World!")
+  df2 <- createDataFrame(list(list(list("Hello", NA, "World!"
--- End diff --

@HyukjinKwon Thank you very much for your review. I will add a test for 
NULL and also change the }. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192564730
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1518,6 +1518,16 @@ test_that("column functions", {
   result <- collect(select(df, arrays_overlap(df[[1]], df[[2]])))[[1]]
   expect_equal(result, c(TRUE, FALSE, NA))
 
+  # Test array_join()
+  df <- createDataFrame(list(list(list("Hello", "World!"
+  result <- collect(select(df, array_join(df[[1]], "#")))[[1]]
+  expect_equal(result, "Hello#World!")
+  df2 <- createDataFrame(list(list(list("Hello", NA, "World!"
--- End diff --

How does it work with NULL?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192564618
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, nullReplacement = NA) {
+   jc <- if (is.na(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   }
+   else {
--- End diff --

nit: `} else {`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192564574
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
--- End diff --

Null -> NA?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-05-13 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r187814160
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with null_replacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param null_replacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, null_replacement = NA) {
--- End diff --

@felixcheung Python uses null_replacement but Scala uses nullReplacement. I 
will change to nullReplacement. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-05-13 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r187810848
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,28 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with null_replacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param null_replacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, null_replacement = NA) {
--- End diff --

generally we don't use `_` in variable name in R. what's the variable name 
in Scala?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-05-13 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/21313

[SPARK-24187][R][SQL]Add array_join function to SparkR

## What changes were proposed in this pull request?

This PR adds array_join function to SparkR

## How was this patch tested?

Add unit test in test_sparkSQL.R


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-24187

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21313


commit 0c21160751c82c4c453efea72e0553f6802f24a2
Author: Huaxin Gao 
Date:   2018-05-13T16:07:14Z

[SPARK-24187][R][SQL]Adding array_join function to SparkR




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org