[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r189244494

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,10 +1496,16 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)))
--- End diff --

It took me a while to work out what the error message actually says, because `target` refers to `result` and `current` refers to the expected lists. From the [R documentation](https://www.rdocumentation.org/packages/testthat/versions/0.11.0/topics/equivalence):

```
expect_equal(object, expected, ..., info = NULL, label = NULL, expected.label = NULL)
```

but:

```
> expect_equal(list(NA, 1, 2, 3), list(NA_integer_, 1, 2, 3))
Error: list(NA, 1, 2, 3) not equal to list(NA_integer_, 1, 2, 3).
Component 1: Modes: logical, numeric
Component 1: target is logical, current is numeric
```

I still don't understand why you get a result with `NA_integer_` while I get `NA` both on my Linux laptop and on the build server. I created a [PR](https://github.com/apache/spark/pull/21362) to work around the problem.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
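The `target`/`current` wording in the message above matches the argument names of base R's `all.equal(target, current)`. The following is a minimal base R sketch of the type mismatch, assuming (this is an assumption about the testthat version in use, not something stated in the thread) that the comparison behaves like `all.equal`. The variables `result` and `expected` are illustrative stand-ins, not values collected from Spark.

```r
# Base R only; no SparkR session is needed for this illustration.
na_types <- c(typeof(NA), typeof(NA_integer_))
print(na_types)  # the bare NA literal is logical, NA_integer_ is integer

# Illustrative stand-ins (assumption: the collected column carries integer NAs).
result   <- list(NA_integer_, 1L, 2L, 3L)
expected <- list(NA, 1L, 2L, 3L)

# all.equal(target, current): the first argument is reported as "target".
msg <- all.equal(result, expected)
print(msg)  # a character vector describing the mismatch, not TRUE

# Spelling the expectation with a typed NA removes the mismatch.
typed_ok <- isTRUE(all.equal(result, list(NA_integer_, 1L, 2L, 3L)))
print(typed_ok)
```

Whether a given machine yields `NA` or `NA_integer_` in the collected result is exactly the open question in the comment; the sketch only shows why the two do not compare as equal.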
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r189063233

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,10 +1496,16 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)))
--- End diff --

```
Failed -
1. Failure: column functions (@test_sparkSQL.R#1502) ---
`result` not equal to list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)).
Component 1: Component 4: Modes: numeric, logical
Component 1: Component 4: target is numeric, current is logical
Component 2: Component 4: Modes: numeric, logical
Component 2: Component 4: target is numeric, current is logical
Component 2: Component 5: Modes: numeric, logical
Component 2: Component 5: target is numeric, current is logical

2. Failure: column functions (@test_sparkSQL.R#1505) ---
`result` not equal to list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)).
Component 1: Component 4: Modes: numeric, logical
Component 1: Component 4: target is numeric, current is logical
Component 2: Component 4: Modes: numeric, logical
Component 2: Component 4: target is numeric, current is logical
Component 2: Component 5: Modes: numeric, logical
Component 2: Component 5: target is numeric, current is logical

3. Failure: column functions (@test_sparkSQL.R#1507) ---
`result` not equal to list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)).
Component 1: Component 1: Modes: numeric, logical
Component 1: Component 1: target is numeric, current is logical
Component 2: Component 1: Modes: numeric, logical
Component 2: Component 1: target is numeric, current is logical
Component 2: Component 2: Modes: numeric, logical
Component 2: Component 2: target is numeric, current is logical
```

I hit this issue on my laptop. How can we make the types compatible? cc @HyukjinKwon @felixcheung
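One possible way to make the two sides type-compatible is to coerce both to integer vectors before comparing. The sketch below is only an illustration in base R; `as_int_vectors` is a hypothetical helper (not part of SparkR or testthat), the `result` value is a hand-written stand-in for what the failing machine appears to return, and nothing here is claimed to be the approach taken in the follow-up PR.

```r
# Hypothetical helper: turn every inner list into an integer vector so that
# a logical NA and NA_integer_ end up with the same type.
as_int_vectors <- function(x) {
  lapply(x, function(el) as.integer(unlist(el)))
}

# Stand-in for the collected result with integer NAs (assumption).
result <- list(list(1L, 2L, 3L, NA_integer_),
               list(4L, 5L, 6L, NA_integer_, NA_integer_))
# Expectation as written in the test, with bare (logical) NA literals.
expected <- list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA))

raw_equal  <- isTRUE(all.equal(result, expected))
norm_equal <- isTRUE(all.equal(as_int_vectors(result), as_int_vectors(expected)))

print(raw_equal)   # FALSE: element types differ
print(norm_equal)  # TRUE: both sides are integer vectors after coercion
```

The trade-off is that coercion hides the element type, so a test written this way no longer checks whether the collected values are lists or vectors.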
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21294
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187470081

--- Diff: R/pkg/R/functions.R ---
@@ -208,6 +208,7 @@ NULL
 #' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1)))
 #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1)))
 #' head(select(tmp, array_position(tmp$v1, 21)))
+#' head(select(tmp, array_sort(tmp$v1)))
--- End diff --

nit: we don't need a separate line for each example; let's merge this with the `array_position` one?
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187404332

--- Diff: R/pkg/R/functions.R ---
@@ -3118,8 +3133,9 @@ setMethod("size",
 })
 #' @details
-#' \code{sort_array}: Sorts the input array in ascending or descending order according
-#' to the natural ordering of the array elements.
+#' \code{sort_array}: Sorts the input array in ascending or descending order according to
+#' the natural ordering of the array elements. Null elements will be placed at the beginning of
--- End diff --

null -> NA
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187381390

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,12 +1496,18 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NULL)), list(list(NULL, 6L, 5L, NULL, 4L
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NULL), list(4L, 5L, 6L, NULL, NULL)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NULL), list(6L, 5L, 4L, NULL, NULL)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NULL, 1L, 2L, 3L), list(NULL, NULL, 4L, 5L, 6L)))
-  # Test flattern
+  # Test flatten
--- End diff --

Oh, OK. I didn't know about that. Thanks.
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187380861

--- Diff: R/pkg/R/functions.R ---
@@ -3118,8 +3133,9 @@ setMethod("size",
 })
 #' @details
-#' \code{sort_array}: Sorts the input array in ascending or descending order according
-#' to the natural ordering of the array elements.
+#' \code{sort_array}: Sorts the input array in ascending or descending order according to
+#' the natural ordering of the array elements. Null elements will be placed at the beginning of
+#' the returned array in ascending order or at the end of the returned array in descending order.
--- End diff --

nice!
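The null placement described in the doc comment above, together with the `array_sort` expectations from the test diff in this thread, can be mimicked in base R with `sort()` and its `na.last` argument. This is only an emulation of the ordering on plain integer vectors, using the element values from the test; it does not call SparkR.

```r
# Plain integer vectors standing in for the two array rows from the test.
row1 <- c(2L, 1L, 3L, NA)
row2 <- c(NA, 6L, 5L, NA, 4L)

# sort_array, ascending: missing values are placed at the beginning.
sort_array_asc <- sort(row1, na.last = FALSE)

# sort_array, descending: missing values are placed at the end.
sort_array_desc <- sort(row2, decreasing = TRUE, na.last = TRUE)

# array_sort (per the test expectations): ascending, missing values at the end.
array_sort_like <- sort(row2, na.last = TRUE)

print(sort_array_asc)   # NA 1 2 3
print(sort_array_desc)  # 6 5 4 NA NA
print(array_sort_like)  # 4 5 6 NA NA
```

In other words, for ascending order the two functions differ only in where the missing values land.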
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187380230

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,12 +1496,18 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NULL)), list(list(NULL, 6L, 5L, NULL, 4L
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NULL), list(4L, 5L, 6L, NULL, NULL)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NULL), list(6L, 5L, 4L, NULL, NULL)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NULL, 1L, 2L, 3L), list(NULL, NULL, 4L, 5L, 6L)))
-  # Test flattern
+  # Test flatten
--- End diff --

I would leave this out to prevent a conflict. It's being fixed in #21255.
[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...
GitHub user mn-mikke opened a pull request:

https://github.com/apache/spark/pull/21294

[SPARK-24197][SparkR][SQL] Adding array_sort function to SparkR

## What changes were proposed in this pull request?

The PR adds the `array_sort` function to SparkR.

## How was this patch tested?

Tests were added to R/pkg/tests/fulltests/test_sparkSQL.R.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mn-mikke/spark SPARK-24197

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21294.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21294

commit 7e7c69f29bbf4f1c535c69e6f2e2b36891020e0c
Author: Marek Novotny
Date: 2018-05-10T16:07:33Z

    [SPARK-24197][SparkR][SQL] Adding array_sort function to SparkR