[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-18 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r189244494
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,10 +1496,16 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
 
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L))))
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)))
--- End diff --

It took me a while to work out what the error message actually says, since 
`target` refers to the result and `current` to the expected lists. From the [R 
documentation](https://www.rdocumentation.org/packages/testthat/versions/0.11.0/topics/equivalence):
```
expect_equal(object, expected, ..., info = NULL, label = NULL, expected.label = NULL)
```
but:
```
> expect_equal(list(NA, 1, 2, 3), list(NA_integer_, 1, 2, 3))
Error: list(NA, 1, 2, 3) not equal to list(NA_integer_, 1, 2, 3).
Component 1: Modes: logical, numeric
Component 1: target is logical, current is numeric
``` 

I still don't understand why you get a result containing `NA_integer_` while 
I get `NA` on my Linux laptop, as does the build server. I created a 
[PR](https://github.com/apache/spark/pull/21362) to work around the problem. 
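
For readers following along, here is a minimal base R sketch of the mismatch described above. It assumes, as the error text suggests, that `expect_equal()` in the testthat versions of that time reports the differences found by `all.equal()`. Writing the expected value as `NA_integer_` is shown only as one possible way to line the types up, not necessarily what the linked PR does.

```r
# Base R only; no SparkR or testthat needed for this illustration.

# A bare NA is a logical value, NA_integer_ is an integer value.
na_type     <- typeof(NA)           # "logical"
na_int_type <- typeof(NA_integer_)  # "integer"

# Comparing lists component by component exposes the difference:
# all.equal() returns a character vector describing the mismatch.
mismatch <- all.equal(list(NA, 1, 2, 3), list(NA_integer_, 1, 2, 3))
print(mismatch)

# When both sides use the same NA type, the comparison succeeds.
same_type <- isTRUE(all.equal(list(NA_integer_, 1L), list(NA_integer_, 1L)))
print(same_type)
```

Because `list()` keeps each element's own type, the `NA` literals in the expected lists stay logical even though the neighbouring elements are integers.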


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r189063233
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,10 +1496,16 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
 
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NA)), list(list(NA, 6L, 5L, NA, 4L))))
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)))
--- End diff --

```
Failed 
-
1. Failure: column functions (@test_sparkSQL.R#1502) 
---
`result` not equal to list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA)).
Component 1: Component 4: Modes: numeric, logical
Component 1: Component 4: target is numeric, current is logical
Component 2: Component 4: Modes: numeric, logical
Component 2: Component 4: target is numeric, current is logical
Component 2: Component 5: Modes: numeric, logical
Component 2: Component 5: target is numeric, current is logical


2. Failure: column functions (@test_sparkSQL.R#1505) 
---
`result` not equal to list(list(3L, 2L, 1L, NA), list(6L, 5L, 4L, NA, NA)).
Component 1: Component 4: Modes: numeric, logical
Component 1: Component 4: target is numeric, current is logical
Component 2: Component 4: Modes: numeric, logical
Component 2: Component 4: target is numeric, current is logical
Component 2: Component 5: Modes: numeric, logical
Component 2: Component 5: target is numeric, current is logical


3. Failure: column functions (@test_sparkSQL.R#1507) 
---
`result` not equal to list(list(NA, 1L, 2L, 3L), list(NA, NA, 4L, 5L, 6L)).
Component 1: Component 1: Modes: numeric, logical
Component 1: Component 1: target is numeric, current is logical
Component 2: Component 1: Modes: numeric, logical
Component 2: Component 1: target is numeric, current is logical
Component 2: Component 2: Modes: numeric, logical
Component 2: Component 2: target is numeric, current is logical
```

I hit this issue on my laptop. How can we make the types compatible? cc 
@HyukjinKwon @felixcheung 
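
One way the types could be made compatible, sketched in base R under stated assumptions: `as_int_vectors` is a hypothetical helper (not part of SparkR or testthat), and the `result` value below is hand-written to mimic the failing case rather than collected from Spark.

```r
# Hypothetical helper: flatten each inner list to an integer vector so that
# a logical NA and NA_integer_ end up with the same type.
as_int_vectors <- function(x) lapply(x, function(el) as.integer(unlist(el)))

# Hand-written stand-in for the collected result (integer NAs) ...
result <- list(list(1L, 2L, 3L, NA_integer_),
               list(4L, 5L, 6L, NA_integer_, NA_integer_))
# ... and the expected value as written in the test (logical NAs).
expected <- list(list(1L, 2L, 3L, NA), list(4L, 5L, 6L, NA, NA))

# The direct comparison reports differences on the NA components.
direct <- all.equal(result, expected)
print(direct)

# After coercion both sides are lists of integer vectors and compare equal.
coerced_ok <- isTRUE(all.equal(as_int_vectors(result), as_int_vectors(expected)))
print(coerced_ok)
```

Coercing both sides keeps the test independent of which NA type a given environment returns, at the cost of no longer checking the element types themselves.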





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21294





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187470081
  
--- Diff: R/pkg/R/functions.R ---
@@ -208,6 +208,7 @@ NULL
 #' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1)))
 #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1)))
 #' head(select(tmp, array_position(tmp$v1, 21)))
+#' head(select(tmp, array_sort(tmp$v1)))
--- End diff --

nit: we don't need a separate line for each example; shall we merge this with 
the array_position example?





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187404332
  
--- Diff: R/pkg/R/functions.R ---
@@ -3118,8 +3133,9 @@ setMethod("size",
   })
 
 #' @details
-#' \code{sort_array}: Sorts the input array in ascending or descending order according
-#' to the natural ordering of the array elements.
+#' \code{sort_array}: Sorts the input array in ascending or descending order according to
+#' the natural ordering of the array elements. Null elements will be placed at the beginning of
--- End diff --

null -> NA
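
A short base R illustration of why the wording matters on the R side. The motivation is an assumption, since the comment does not spell it out: in R a missing element is written `NA`, while `NULL` is an empty object.

```r
# NULL has length zero; NA is a length-one missing value.
null_len <- length(NULL)   # 0
na_flag  <- is.na(NA)      # TRUE

# Flattening a list drops NULL entries but keeps NA in place.
with_null <- unlist(list(2L, 1L, NULL))
with_na   <- unlist(list(2L, 1L, NA))

print(length(with_null))  # 2
print(length(with_na))    # 3
```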





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187381390
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,12 +1496,18 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
 
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NULL)), list(list(NULL, 6L, 5L, NULL, 4L))))
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NULL), list(4L, 5L, 6L, NULL, NULL)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NULL), list(6L, 5L, 4L, NULL, NULL)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NULL, 1L, 2L, 3L), list(NULL, NULL, 4L, 5L, 6L)))
 
-  # Test flattern
+  # Test flatten
--- End diff --

Oh, OK. I didn't know about that. Thanks.





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187380861
  
--- Diff: R/pkg/R/functions.R ---
@@ -3118,8 +3133,9 @@ setMethod("size",
   })
 
 #' @details
-#' \code{sort_array}: Sorts the input array in ascending or descending order according
-#' to the natural ordering of the array elements.
+#' \code{sort_array}: Sorts the input array in ascending or descending order according to
+#' the natural ordering of the array elements. Null elements will be placed at the beginning of
+#' the returned array in ascending order or at the end of the returned array in descending order.
--- End diff --

nice!
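
The ordering described in the doc text can be mimicked locally with base R's `sort()`. This is only an emulation on a plain integer vector, not a call into SparkR; the `array_sort`-like line follows the expectation in the PR's tests, where missing elements come last.

```r
x <- c(2L, 1L, 3L, NA)

# sort_array, ascending: missing elements first.
sort_array_asc <- sort(x, na.last = FALSE)

# sort_array, descending: missing elements last.
sort_array_desc <- sort(x, decreasing = TRUE, na.last = TRUE)

# array_sort (per the tests in this PR): ascending, missing elements last.
array_sort_like <- sort(x, na.last = TRUE)

print(sort_array_asc)
print(sort_array_desc)
print(array_sort_like)
```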





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21294#discussion_r187380230
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1497,12 +1496,18 @@ test_that("column functions", {
   result <- collect(select(df, element_at(df[[1]], 1L)))[[1]]
   expect_equal(result, c(1, 6))
 
+  # Test array_sort() and sort_array()
+  df <- createDataFrame(list(list(list(2L, 1L, 3L, NULL)), list(list(NULL, 6L, 5L, NULL, 4L))))
+
+  result <- collect(select(df, array_sort(df[[1]])))[[1]]
+  expect_equal(result, list(list(1L, 2L, 3L, NULL), list(4L, 5L, 6L, NULL, NULL)))
+
   result <- collect(select(df, sort_array(df[[1]], FALSE)))[[1]]
-  expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
+  expect_equal(result, list(list(3L, 2L, 1L, NULL), list(6L, 5L, 4L, NULL, NULL)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
-  expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+  expect_equal(result, list(list(NULL, 1L, 2L, 3L), list(NULL, NULL, 4L, 5L, 6L)))
 
-  # Test flattern
+  # Test flatten
--- End diff --

I would leave this out to prevent a conflict. It's being fixed in #21255.





[GitHub] spark pull request #21294: [SPARK-24197][SparkR][SQL] Adding array_sort func...

2018-05-10 Thread mn-mikke
GitHub user mn-mikke opened a pull request:

https://github.com/apache/spark/pull/21294

[SPARK-24197][SparkR][SQL] Adding array_sort function to SparkR

## What changes were proposed in this pull request?

This PR adds the array_sort function to SparkR.

## How was this patch tested?

Tests were added to R/pkg/tests/fulltests/test_sparkSQL.R


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mn-mikke/spark SPARK-24197

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21294


commit 7e7c69f29bbf4f1c535c69e6f2e2b36891020e0c
Author: Marek Novotny 
Date:   2018-05-10T16:07:33Z

[SPARK-24197][SparkR][SQL] Adding array_sort function to SparkR



