[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22455


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
GitHub user adrian555 reopened a pull request:

https://github.com/apache/spark/pull/22455

[SPARK-24572][SPARKR] "eager execution" for R shell, IDE

## What changes were proposed in this pull request?

Check the `spark.sql.repl.eagerEval.enabled` configuration property in the 
SparkDataFrame `show()` method. If the `SparkSession` has eager execution 
enabled, the data will be returned to the R client as soon as the data frame is 
created. So instead of seeing this
```
> df <- createDataFrame(faithful)
> df
SparkDataFrame[eruptions:double, waiting:double]
```
you will see
```
> df <- createDataFrame(faithful)
> df
+---------+-------+
|eruptions|waiting|
+---------+-------+
|      3.6|   79.0|
|      1.8|   54.0|
|    3.333|   74.0|
|    2.283|   62.0|
|    4.533|   85.0|
|    2.883|   55.0|
|      4.7|   88.0|
|      3.6|   85.0|
|     1.95|   51.0|
|     4.35|   85.0|
|    1.833|   54.0|
|    3.917|   84.0|
|      4.2|   78.0|
|     1.75|   47.0|
|      4.7|   83.0|
|    2.167|   52.0|
|     1.75|   62.0|
|      4.8|   84.0|
|      1.6|   52.0|
|     4.25|   79.0|
+---------+-------+
only showing top 20 rows
```
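The eager-execution check described above can be sketched as follows. This is a simplified illustration based on the description, not the exact patch; `eagerShow` is a hypothetical helper name, while `sparkR.conf`, `showDF`, and `dtypes` are existing SparkR APIs.

```
# Sketch: print data eagerly when the REPL config enables it,
# otherwise fall back to printing only the schema string.
eagerShow <- function(df) {
  conf <- sparkR.conf()
  if (identical(conf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    maxRows <- conf[["spark.sql.repl.eagerEval.maxNumRows"]]
    if (!is.null(maxRows)) {
      showDF(df, numRows = as.integer(maxRows))
    } else {
      showDF(df)
    }
  } else {
    # Default lazy behavior: print "SparkDataFrame[col:type, ...]"
    cols <- lapply(dtypes(df), function(l) paste(l, collapse = ":"))
    cat(paste(class(df), "[", paste(cols, collapse = ", "), "]\n", sep = ""))
  }
}
```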

## How was this patch tested?
Manual tests as well as unit tests (one new test case is added).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian555/spark eager_execution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22455


commit b3e014c4e8050e8a0b3da190bb327347f9136b7e
Author: adrian555 
Date:   2018-09-18T20:48:56Z

support eager execution

commit d0be3a8582de548862a78cfa5ffb58df933efa8d
Author: adrian555 
Date:   2018-09-18T20:51:43Z

Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a
Author: adrian555 
Date:   2018-09-18T21:09:48Z

add newline

commit a89bf37e7fbfa089a39e0fb91cdd4a1bdd409b9f
Author: adrian555 
Date:   2018-09-20T20:58:02Z

address review comment

commit a083d64ea3210c367cc4d63ad51a1c6beab129e7
Author: adrian555 
Date:   2018-09-20T22:03:30Z

address review comment

commit 7b121e65b99e177d8870b4e098797f1f1e86ce65
Author: adrian555 
Date:   2018-09-21T00:02:31Z

address review comment

commit 4492b278ba5a4721d6a5dc836436191ad155dfc6
Author: adrian555 
Date:   2018-09-21T17:36:33Z

address review comment

commit 100bac715ee3fc6baee960dfa7c9466baa032bcb
Author: adrian555 
Date:   2018-09-24T18:16:41Z

address review comment

commit 57eb00824b072b2a326810ff42edef2ca2626eb6
Author: adrian555 
Date:   2018-09-25T17:51:34Z

address review comment

commit df582fc8d31c4e993a6e215fa0901a4728affc0e
Author: adrian555 
Date:   2018-09-26T23:06:03Z

add sparkr.SparkDataFrame.base_show_func option

commit d1562777924fd84f14655df9c7418575aeae7fbe
Author: adrian555 
Date:   2018-09-26T23:11:41Z

Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit 44df922fb347211c9a962bf1771e2e0005634305
Author: adrian555 
Date:   2018-10-03T20:27:05Z

address review comment

commit 7ce8f9454d222b42dc3969894147461eaa850b46
Author: adrian555 
Date:   2018-10-18T17:13:32Z

Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit 76767f9e5288d593efedda000c7fc01799d43085
Author: adrian555 
Date:   2018-10-18T18:05:47Z

address review comment

commit b5e463f8babe75463054fe1224348bb707afc195
Author: adrian555 
Date:   2018-10-23T17:52:17Z

address review comment

commit 5b39d737aea4b4a680e63e25f5df81993ced1c70
Author: adrian555 
Date:   2018-10-23T17:53:17Z

Merge remote-tracking branch 'remotes/upstream/master' into eager_execution




---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 closed the pull request at:

https://github.com/apache/spark/pull/22455


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r227500368
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r227500418
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, 
listFunctions", {
   dropTempView("cars")
 })
 
+
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r227500321
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r227500277
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
--- End diff --

Done.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-23 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r227500208
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,33 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+allConf <- sparkR.conf()
+prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
+if (!is.null(prop) && identical(prop, "true")) {
+  argsList <- list()
+  argsList$x <- object
+  prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
+  if (!is.null(prop)) {
+    numRows <- as.numeric(prop)
+    if (numRows > 0) {
+      argsList$numRows <- numRows
--- End diff --

A Spark session will not start if the maxNumRows config is not an integer, 
so replacing `as.numeric` with `as.integer` here makes no real difference. But I 
made the code change anyway so the intent is clearer when reading the code.
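The coercion difference under discussion can be checked directly in plain R (illustrative values only):

```
# Both functions accept the config value, which arrives as a string
as.numeric("10")   # 10
as.integer("10")   # 10L
# For a non-integral value they diverge:
as.numeric("1.4")  # 1.4
as.integer("1.4")  # 1L (truncated toward zero, no error raised)
```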


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874107
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,33 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+allConf <- sparkR.conf()
+prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
+if (!is.null(prop) && identical(prop, "true")) {
+  argsList <- list()
+  argsList$x <- object
+  prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
+  if (!is.null(prop)) {
+    numRows <- as.numeric(prop)
+    if (numRows > 0) {
+      argsList$numRows <- numRows
--- End diff --

Could you manually check whether this works correctly when numRows or truncate 
(L265) is numeric but not an integer, e.g. 1.4?

If not, we should replace `as.numeric` with `as.integer` in both places.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874469
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
--- End diff --

change to `test show SparkDataFrame when eager execution is enabled.`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874142
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
--- End diff --

add a test with `spark.sql.repl.eagerEval.maxNumRows` not set.
change this test to add `spark.sql.repl.eagerEval.truncate`
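A test along the suggested lines might start the session with both properties set. This is a sketch only, following the existing test style; the concrete expected output depends on the truncation width chosen.

```
# Start a session with eager execution, a row cap, and column truncation
sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
                    spark.sql.repl.eagerEval.maxNumRows = as.integer(5),
                    spark.sql.repl.eagerEval.truncate = as.integer(3))
sparkR.session(master = sparkRTestMaster, sparkConfig = sparkConfig,
               enableHiveSupport = FALSE)

df <- createDataFrame(faithful)
# With maxNumRows = 5 at most 5 rows are printed, and with truncate = 3
# each cell is cut to 3 characters
expect_output(show(df), "only showing top 5 rows")

sparkR.session.stop()
```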


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874309
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
--- End diff --

One thing to note: since tests run alphabetically, this test will run 
before sparkSQL. I think we should rename this file to test_sparkSQL_eager.R to 
ensure it runs after.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874251
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, 
listFunctions", {
   dropTempView("cars")
 })
 
+
--- End diff --

remove empty line


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-18 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226472697
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

I have backed out the display-function option change. I also opened JIRA 
https://issues.apache.org/jira/browse/SPARK-25770 to track that issue.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-08 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r223257553
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

It seems a good idea to separate out the print/show issue, since I can see it 
is not 100% related to this JIRA. A new JIRA would help track the requirement 
and the discussion for that issue. Please confirm, and I will revert the 
related changes. Thanks.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r223197863
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

IMO pretty print should plug into something more R-standard, like 
[printr](https://yihui.name/printr/), 
[lemon](https://cran.r-project.org/web/packages/lemon/vignettes/lemon_print.html), 
or 
[print.x](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.dataframe.html).


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r223197875
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

Could we consider leaving the print/show option out? I'd like to get eager 
compute to work even in the basic sparkR / R shell.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-03 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r222451616
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

I don't have any naming preference. One thing to note is that the 
`spark.sparkr.r.command` you mentioned above is a config on the Spark side, while 
the one I tried to define here is a config on the R side.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-03 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r222450798
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is 
enabled.")
 
 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-sparkR.session(master = sparkRTestMaster)
-  } else {
-sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

Sure. Guess I misinterpreted your other comment.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221416198
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is 
enabled.")
 
 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-sparkR.session(master = sparkRTestMaster)
-  } else {
-sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

as mentioned here 
https://github.com/apache/spark/pull/22455#discussion_r220030686



---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221416164
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is 
enabled.")
 
 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-sparkR.session(master = sparkRTestMaster)
-  } else {
-sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

you should definitely set `enableHiveSupport = FALSE` - historically this 
hasn't worked well in other R tests when the Hive catalog is enabled


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221415826
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-    identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-  argsList <- list()
-  argsList$x <- object
-  if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-    numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-    if (numRows > 0) {
-      argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

Hmm, about this naming convention: typically we don't mix `.` and `_`, and I 
don't think we have anything else named with `SparkDataFrame`.

It seems we have had `spark.sparkr.something` before: 
https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L343



---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-25 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220292594
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to the R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled; it can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
+
+The maximum number of rows and the maximum number of characters per column to display can be controlled with the `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If they are not set explicitly, then by default up to 20 rows and up to 20 characters per column are shown.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]",
+               sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+                                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
+
+# Create a grouped and sorted SparkDataFrame
+df <- createDataFrame(faithful)
+df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
+
+# Similar to R data.frame, displays the data returned, instead of the SparkDataFrame class string
+df2
+
+##+-------+-----+
+##|waiting|count|
+##+-------+-----+
+##|   43.0|    1|
+##|   45.0|    3|
+##|   46.0|    5|
+##|   47.0|    4|
+##|   48.0|    3|
+##|   49.0|    5|
+##|   50.0|    5|
+##|   51.0|    6|
+##|   52.0|    5|
+##|   53.0|    7|
+##+-------+-----+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
--- End diff --

In the same doc, "From Data Sources" has `either be added by specifying 
--packages with spark-submit or sparkR commands`; that is why I used `command` 
instead of `shell`. I would think `script`, `shell`, and `command` are 
interchangeable here. But viewed from the angle that `sparkR` ends in an R 
execution environment, maybe `shell` makes more sense. :)

So I made the change.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-25 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220292560
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1491,9 +1491,10 @@ object SQLConf {
   val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled")
 .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " +
   "displayed if and only if the REPL supports the eager evaluation. Currently, the " +
-  "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " +
-  "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " +
-  "the returned outputs are formatted like dataframe.show().")
+  "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " +
+  "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " +
+  "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " +
+  "outputs are shown as an R data.frame.")
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-25 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220292513
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,31 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+allConf <- sparkR.conf()
+if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-25 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220292390
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220034035
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
+  argsList <- list()
+  argsList$x <- object
+  numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+  if (numRows > 0) {
+argsList$numRows <- numRows
+  }
+  truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

just asking since this variation was not in unit test
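For reference, the argument-assembly logic in question can be exercised in plain R without a Spark session. This is a minimal sketch of the variation being asked about (only `truncate` set); the hard-coded strings stand in for what `sparkR.conf` would return and are not part of the PR:

```r
# Sketch: show() argument assembly when only spark.sql.repl.eagerEval.truncate
# is set and spark.sql.repl.eagerEval.maxNumRows is left unset.
# Unset confs fall back to "0", which disables the corresponding argument.
numRows <- as.numeric("0")    # stand-in for sparkR.conf("...maxNumRows", "0")[[1]]
truncate <- as.numeric("30")  # stand-in for sparkR.conf("...truncate", "0")[[1]]

argsList <- list()
if (numRows > 0) {
  argsList$numRows <- numRows
}
if (truncate > 0) {
  argsList$truncate <- truncate
}
# Only `truncate` survives, so showDF() would fall back to its own
# numRows default (20) while honoring the configured truncation.
print(names(argsList))
```

So the unset `maxNumRows` conf simply drops out of the argument list rather than overriding the `showDF()` default.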


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033577
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,31 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+allConf <- sparkR.conf()
+if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
--- End diff --

you can also save off `foo <- allConf[["spark.sql.repl.eagerEval.enabled"]]` and reuse it, ditto for other cases


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033775
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to the R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled and can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
+
+The maximum number of rows and the maximum number of characters per column of data to display can be controlled by the `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If these properties are not set explicitly, by default, up to 20 rows and up to 20 characters per column will be shown.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]",
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
+
+# Create a grouped and sorted SparkDataFrame
+df <- createDataFrame(faithful)
+df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
+
+# Similar to R data.frame, displays the data returned, instead of SparkDataFrame class string
+df2
+
+##+---+-+
+##|waiting|count|
+##+---+-+
+##|   43.0|1|
+##|   45.0|3|
+##|   46.0|5|
+##|   47.0|4|
+##|   48.0|3|
+##|   49.0|5|
+##|   50.0|5|
+##|   51.0|6|
+##|   52.0|5|
+##|   53.0|7|
+##+---+-+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that to enable eager execution through `sparkR` command, add the `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
--- End diff --

`sparkR command` - I think should be `sparkR shell`?
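For context, the invocation the docs paragraph describes would look roughly like the following. This is a sketch assuming a local Spark installation that provides the `sparkR` launcher:

```shell
# Hypothetical launch of the sparkR shell with eager evaluation enabled
sparkR --conf spark.sql.repl.eagerEval.enabled=true \
       --conf spark.sql.repl.eagerEval.maxNumRows=10
```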


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033684
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1491,9 +1491,10 @@ object SQLConf {
   val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled")
 .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " +
   "displayed if and only if the REPL supports the eager evaluation. Currently, the " +
-  "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " +
-  "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " +
-  "the returned outputs are formatted like dataframe.show().")
+  "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " +
+  "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " +
+  "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " +
+  "outputs are showed as R data.frame.")
--- End diff --

` are showed as R data.frame` - it's not actually the exact format, so how about we say ` are showed similar to R data.frame would`


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220030686
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
--- End diff --

actually you can use `, enableHiveSupport = FALSE` for this file in all cases - we only need the opposite for SQL tests


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219971196
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
+  argsList <- list()
+  argsList$x <- object
+  numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+  if (numRows > 0) {
+argsList$numRows <- numRows
+  }
+  truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

I did test and it worked. Any specific concern? Thanks.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219938610
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
+
+Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively.
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219938566
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

Retrieving all conf through `sparkR.conf()` does not provide a default value for each conf, so an extra condition check is needed. Ideally, I believe we should have cached all configuration for the session inside the sparkr env to avoid backend calls at all.

Anyway, I made the suggested change.
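The caching idea floated above could be sketched in plain R along these lines. The helper and environment names are hypothetical illustrations, not SparkR API; the `fetch` stub stands in for the real backend call:

```r
# Hypothetical sketch: cache session confs in a local environment so that
# repeated lookups avoid round trips to the backend.
.confCache <- new.env(parent = emptyenv())

cachedConf <- function(key, default, fetch = function(k, d) d) {
  if (!exists(key, envir = .confCache, inherits = FALSE)) {
    # First lookup goes to the backend (stubbed out by `fetch` here);
    # subsequent lookups are served from the cache.
    assign(key, fetch(key, default), envir = .confCache)
  }
  get(key, envir = .confCache, inherits = FALSE)
}

cachedConf("spark.sql.repl.eagerEval.enabled", "false")  # fetched once
cachedConf("spark.sql.repl.eagerEval.enabled", "false")  # served from cache
```

A design like this trades staleness (confs changed after the first read would not be seen) for fewer JVM round trips, which is the tension the comment is pointing at.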


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219937585
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219897812
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
+
+Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively.
--- End diff --

Let's note the default values.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689581
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
--- End diff --

I'd set sparkConfig to a list before this and simply reuse it for both cases instead of having them duplicated
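The suggestion amounts to hoisting the shared config out of the two branches, roughly like this (a sketch of the test setup, not the merged code; the session calls are left commented since they need a running Spark):

```r
# Build the eager-execution config list once and reuse it for both branches
eagerConf <- list(spark.sql.repl.eagerEval.enabled = "true",
                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))

# sparkSession <- if (windows_with_hadoop()) {
#   sparkR.session(master = sparkRTestMaster, sparkConfig = eagerConf)
# } else {
#   sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
#                  sparkConfig = eagerConf)
# }
```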


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689697
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
+  argsList <- list()
+  argsList$x <- object
+  numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+  if (numRows > 0) {
+argsList$numRows <- numRows
+  }
+  truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

have you tested with the case when only `spark.sql.repl.eagerEval.truncate` is set, and `spark.sql.repl.eagerEval.maxNumRows` is not?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689694
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

instead of calling sparkR.conf three times, I'm wondering if it is cleaner/safer to get a copy like in here:

https://github.com/apache/spark/blob/95b177c8f0862c6965a7c3cd76b3935c975adee9/R/pkg/tests/fulltests/test_sparkSQL.R#L3538



---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219576011
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -226,7 +226,8 @@ setMethod("showDF",
 
 #' show
 #'
-#' Print class and type information of a Spark object.
+#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219575988
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219410274
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -226,7 +226,8 @@ setMethod("showDF",
 
 #' show
 #'
-#' Print class and type information of a Spark object.
+#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of
--- End diff --

return the data of the SparkDataFrame object -> evaluate the SparkDataFrame and print top rows of the SparkDataFrame


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219404319
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

I see. The document generated looks like https://spark.apache.org/docs/latest/api/R/index.html.

Then rewriting the description is good and clear.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219402870
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

respecting `spark.sql.repl.eagerEval.maxNumRows` somewhat makes sense, but instead of changing `showDF`, which has other cases beyond eagerEval, we could change where `showDF` is called by `show`, and pass the max rows.

here: https://github.com/apache/spark/pull/22455/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R249
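Mechanically, "pass the max rows" amounts to building the argument list in `show()` and dispatching with `do.call`. A pure-R sketch, with a hypothetical stub standing in for SparkR's `showDF()` so it runs without a Spark session:

```r
# Stub standing in for SparkR's showDF(), just to demonstrate the dispatch;
# the real method forwards to the JVM-backed DataFrame.
showDF_stub <- function(x, numRows = 20, truncate = TRUE) {
  sprintf("showing %d rows (truncate = %s)", numRows, truncate)
}

argsList <- list(x = "df")  # `object` in the real show() method
maxNumRows <- 10            # from spark.sql.repl.eagerEval.maxNumRows
if (maxNumRows > 0) {
  argsList$numRows <- maxNumRows  # only override the default when configured
}
do.call(showDF_stub, argsList)    # -> "showing 10 rows (truncate = TRUE)"
```

This keeps `showDF()` itself conf-agnostic: only the `show()` call site reads the eager-evaluation properties.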


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219402281
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

^ that - it's two different "overloads" for two different "class"


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219388335
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

@adrian555 Thanks for the explanation.
> However, my second point is that I don't think these two configs matter much or that they are important/necessary. Since the eager execution is just to show a snippet of data of the SparkDataFrame, our default numRows = 20 and truncate = TRUE are good enough IMO. If users want to see more or less number of rows, they should call showDF().

So I just wanted to make sure it's possible to have parity with how it works for Python. It seems to me that in Python, we just get the two configs and call the showString method.

> And if we think that showDF() can ignore the eager execution setting and still want the show() to observe eager execution config, we can certainly just grab the maxNumRows and truncate setting and pass to showDF() call.

What will happen if we grab these configs in show() when eager execution is enabled and then call showDF() by passing these parameters?



---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219384936
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

Had thought about this. First, I consider it is not in the scope of this jira, because I think they are conflicting with the current `showDF()` behavior.

Some details: the `showDF()` already takes `numRows` and `truncate` arguments. So if we are going to respect those two as well, we have to decide what behavior is best suitable for `showDF()`. For example, whether `showDF()` should just ignore the eager execution, or whether it picks up the `maxNumRows` and `truncate` set through eager execution like the following:

```
setMethod("showDF",
  signature(x = "SparkDataFrame"),
  function(x, numRows = 20, truncate = TRUE, vertical = FALSE) {
    eagerNumRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    numRows <- ifelse(eagerNumRows == 0, numRows, eagerNumRows)
    eagerTruncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    truncate <- ifelse(eagerTruncate == 0, truncate, eagerTruncate)
    if (is.logical(truncate) && truncate) {
      s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(20), vertical)
    } else {
      truncate2 <- as.numeric(truncate)
      s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(truncate2), vertical)
    }
    cat(s)
  })
```

And if we think that `showDF()` can ignore the eager execution setting and still want the `show()` to observe the eager execution config, we can certainly just grab the `maxNumRows` and `truncate` settings and pass them to the `showDF()` call.

However, my second point is that I don't think these two configs matter much or that they are important/necessary. Since the eager execution is just to show a snippet of data of the SparkDataFrame, our defaults `numRows = 20` and `truncate = TRUE` are good enough IMO. If users want to see more or fewer rows, they should call `showDF()`.

@felixcheung, your thought?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219356224
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

@adrian555 Please correct me on this one. Should we also respect `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate`?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219355827
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

I might have misunderstood your previous question. The `@param` does not 
need to change, as it is common to `SparkDataFrame`, `Column`, etc. for the 
`show()` method. The same applies to the description, so I rewrote it. 
Please have a look.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219351707
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

Can you also correct the param doc?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219351194
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

I think the `@param` is not correct... the signature of `show()` here takes a 
`SparkDataFrame`, so it won't work for `Column`. The column.R file 
has its own `show()` implementation.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219348795
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

Or the param doc is not correct?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219347786
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

`show` also works on other Spark objects like `Column`; I think we only need 
to do `showDF` for `SparkDataFrame`?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219331649
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session without eager execution enabled
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219330082
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session without eager execution enabled
--- End diff --

Here we are testing with eager execution enabled, right ? Do we need to fix 
the comment here ?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315429
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +244,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315410
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
+  expect_is(df, "SparkDataFrame")
+  expected <- "Sepal_Length:double, Sepal_Width:double, 
Petal_Length:double, Petal_Width:double, Species:string"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315391
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315320
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
+df
+
+##+---------+-------+
+##|eruptions|waiting|
+##+---------+-------+
+##|      3.6|   79.0|
+##|      1.8|   54.0|
+##|    3.333|   74.0|
+##|    2.283|   62.0|
+##|    4.533|   85.0|
+##|    2.883|   55.0|
+##|      4.7|   88.0|
+##|      3.6|   85.0|
+##|     1.95|   51.0|
+##|     4.35|   85.0|
+##+---------+-------+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that the `SparkSession` created by `sparkR` shell does not have eager 
execution enabled. You can stop the current session and start up a new session 
like above to enable.
--- End diff --

Changed to this: 
```
Note that to enable eager execution through `sparkR` command, add 
`spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` 
option.
```
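A hedged illustration of what that documentation note describes on the command line (assumes a standard Spark distribution with the `sparkR` launcher on `PATH`):

```
# Start the sparkR shell with eager execution enabled
sparkR --conf spark.sql.repl.eagerEval.enabled=true
```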


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315374
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
--- End diff --

Since this test will terminate the Spark session and restart it, I think it is 
better for it to be self-contained.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315297
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315251
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
--- End diff --

This will be the same level as "Applying User-Defined Function" where its 
parent topic is "SparkDataFrame Operations". So the heading is with three `#`.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-20 Thread adrian555
Github user adrian555 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219315277
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
--- End diff --

Done


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030775
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +244,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

also not sure if it's done for Python; consider adding to the doc above 
(L229) how it behaves with eagerEval


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030350
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
+df
+
+##+---------+-------+
+##|eruptions|waiting|
+##+---------+-------+
+##|      3.6|   79.0|
+##|      1.8|   54.0|
+##|    3.333|   74.0|
+##|    2.283|   62.0|
+##|    4.533|   85.0|
+##|    2.883|   55.0|
+##|      4.7|   88.0|
+##|      3.6|   85.0|
+##|     1.95|   51.0|
+##|     4.35|   85.0|
+##+---------+-------+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that the `SparkSession` created by `sparkR` shell does not have eager 
execution enabled. You can stop the current session and start up a new session 
like above to enable.
--- End diff --

actually I think the suggestion should be to set that on the `sparkR` 
command line as a Spark conf?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030512
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

use a different dataset that does not require `suppressWarnings`
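For instance, the `faithful` dataset needs no `suppressWarnings`; the warning for `iris` comes from SparkR replacing the dots in its column names (e.g. `Sepal.Length`) with underscores:

```r
# faithful's column names (eruptions, waiting) contain no dots,
# so createDataFrame() emits no column-name warning.
df <- createDataFrame(faithful)
```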


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030211
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
--- End diff --

we could also start here by saying "similar to R `data.frame`"...


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030277
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
+
+# Instead of displaying the SparkDataFrame class, displays the data 
returned
+df
+
+##+---------+-------+
+##|eruptions|waiting|
+##+---------+-------+
+##|      3.6|   79.0|
+##|      1.8|   54.0|
+##|    3.333|   74.0|
+##|    2.283|   62.0|
+##|    4.533|   85.0|
+##|    2.883|   55.0|
+##|      4.7|   88.0|
+##|      3.6|   85.0|
+##|     1.95|   51.0|
+##|     4.35|   85.0|
+##+---------+-------+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that the `SparkSession` created by `sparkR` shell does not have eager 
execution enabled. You can stop the current session and start up a new session 
like above to enable.
--- End diff --

change to `Note that the `SparkSession` created by `sparkR` shell by 
default does not `


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219029847
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
--- End diff --

should be `` I think?


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030474
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
--- End diff --

I'm neutral; should these tests be in test_sparkSQL.R? It takes longer to 
run with many test files.


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030085
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,42 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If the eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. Eager execution can be 
enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]", sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true"))
+
+df <- createDataFrame(faithful)
--- End diff --

perhaps a more complete example - like `summarize(groupBy(df, df$waiting), 
count = n(df$waiting))`
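With eager execution enabled, the suggested snippet would print its aggregated result immediately (a sketch only; assumes a SparkR session started with `spark.sql.repl.eagerEval.enabled` set to `true`):

```r
df <- createDataFrame(faithful)

# Eager execution prints the aggregated rows as soon as the expression
# is evaluated at the prompt, instead of the schema summary.
summarize(groupBy(df, df$waiting), count = n(df$waiting))
```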


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030537
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
+  expect_is(df, "SparkDataFrame")
+  expected <- "Sepal_Length:double, Sepal_Width:double, 
Petal_Length:double, Petal_Width:double, Species:string"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true"))
+  }
+  
+  df <- suppressWarnings(createDataFrame(iris))
--- End diff --

ditto


---




[GitHub] spark pull request #22455: [SPARK-24572][SparkR] "eager execution" for R she...

2018-09-18 Thread adrian555
GitHub user adrian555 opened a pull request:

https://github.com/apache/spark/pull/22455

[SPARK-24572][SparkR] "eager execution" for R shell, IDE

## What changes were proposed in this pull request?

Check the `spark.sql.repl.eagerEval.enabled` configuration property in 
SparkDataFrame `show()` method. If the `SparkSession` has eager execution 
enabled, the data will be returned to the R client when the data frame is 
created. So instead of seeing this
```
> df <- createDataFrame(faithful)
> df
SparkDataFrame[eruptions:double, waiting:double]
```
you will see
```
> df <- createDataFrame(faithful)
> df
+---------+-------+
|eruptions|waiting|
+---------+-------+
|      3.6|   79.0|
|      1.8|   54.0|
|    3.333|   74.0|
|    2.283|   62.0|
|    4.533|   85.0|
|    2.883|   55.0|
|      4.7|   88.0|
|      3.6|   85.0|
|     1.95|   51.0|
|     4.35|   85.0|
|    1.833|   54.0|
|    3.917|   84.0|
|      4.2|   78.0|
|     1.75|   47.0|
|      4.7|   83.0|
|    2.167|   52.0|
|     1.75|   62.0|
|      4.8|   84.0|
|      1.6|   52.0|
|     4.25|   79.0|
+---------+-------+
only showing top 20 rows
```
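Roughly, the behavior described above can be sketched as follows. This is an illustrative sketch, not the exact patch: the conf lookup and eager branch are assumptions about how the check could be wired in, while the fallback branch mirrors SparkR's existing class-and-schema printout.

```r
library(SparkR)

# Illustrative sketch of an eager-aware show() method (not the exact patch).
setMethod("show", "SparkDataFrame", function(object) {
  # sparkR.conf(key, default) reads a runtime conf from the active session.
  eager <- unlist(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false"))
  if (identical(tolower(eager), "true")) {
    # Eager path: print the first rows, like DataFrame.show() on the JVM side.
    showDF(object)
  } else {
    # Previous behavior: print only the class name and the column schema.
    cols <- lapply(dtypes(object), function(l) paste(l, collapse = ":"))
    cat(class(object), "[", paste(cols, collapse = ", "), "]\n", sep = "")
  }
})
```

PySpark's eager-evaluation support also consults `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate`; whether the R path honors those is not shown in this excerpt.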

## How was this patch tested?
Manual tests as well as unit tests (one new test case is added).
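Mirroring the new test case, enabling the flag at session creation is enough to change what printing a SparkDataFrame does (assuming a local Spark install; the value is the string "true" because SQL confs are strings):

```r
library(SparkR)

# Enable eager evaluation for the whole session.
sparkR.session(sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))

df <- createDataFrame(faithful)
df   # prints the first 20 rows plus "only showing top 20 rows"

sparkR.session.stop()
```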


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian555/spark eager_execution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22455


commit b3e014c4e8050e8a0b3da190bb327347f9136b7e
Author: adrian555 
Date:   2018-09-18T20:48:56Z

support eager execution

commit d0be3a8582de548862a78cfa5ffb58df933efa8d
Author: adrian555 
Date:   2018-09-18T20:51:43Z

Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a
Author: adrian555 
Date:   2018-09-18T21:09:48Z

add newline




---
