[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22455
GitHub user adrian555 reopened a pull request: https://github.com/apache/spark/pull/22455

[SPARK-24572][SPARKR] "eager execution" for R shell, IDE

## What changes were proposed in this pull request?

Check the `spark.sql.repl.eagerEval.enabled` configuration property in the SparkDataFrame `show()` method. If the `SparkSession` has eager execution enabled, the data is returned to the R client as soon as the data frame is created. So instead of seeing this

```
> df <- createDataFrame(faithful)
> df
SparkDataFrame[eruptions:double, waiting:double]
```

you will see

```
> df <- createDataFrame(faithful)
> df
+---------+-------+
|eruptions|waiting|
+---------+-------+
|      3.6|   79.0|
|      1.8|   54.0|
|    3.333|   74.0|
|    2.283|   62.0|
|    4.533|   85.0|
|    2.883|   55.0|
|      4.7|   88.0|
|      3.6|   85.0|
|     1.95|   51.0|
|     4.35|   85.0|
|    1.833|   54.0|
|    3.917|   84.0|
|      4.2|   78.0|
|     1.75|   47.0|
|      4.7|   83.0|
|    2.167|   52.0|
|     1.75|   62.0|
|      4.8|   84.0|
|      1.6|   52.0|
|     4.25|   79.0|
+---------+-------+
only showing top 20 rows
```

## How was this patch tested?

Manual tests as well as unit tests (one new test case is added).
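For readers who want to try the feature out, here is a minimal R session sketch. It assumes a local Spark installation with SparkR on the library path; the configuration property names come from the PR description above.

```r
library(SparkR)

# Start a session with eager execution enabled; maxNumRows is optional and
# caps how many rows are printed (20 by default).
sparkR.session(master = "local[*]",
               sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
                                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))

df <- createDataFrame(faithful)
df  # with eager execution on, this prints the top rows instead of the schema string

sparkR.session.stop()
```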
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian555/spark eager_execution

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22455.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22455

commit b3e014c4e8050e8a0b3da190bb327347f9136b7e  Author: adrian555  Date: 2018-09-18T20:48:56Z  support eager execution
commit d0be3a8582de548862a78cfa5ffb58df933efa8d  Author: adrian555  Date: 2018-09-18T20:51:43Z  Merge remote-tracking branch 'remotes/upstream/master' into eager_execution
commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a  Author: adrian555  Date: 2018-09-18T21:09:48Z  add newline
commit a89bf37e7fbfa089a39e0fb91cdd4a1bdd409b9f  Author: adrian555  Date: 2018-09-20T20:58:02Z  address review comment
commit a083d64ea3210c367cc4d63ad51a1c6beab129e7  Author: adrian555  Date: 2018-09-20T22:03:30Z  address review comment
commit 7b121e65b99e177d8870b4e098797f1f1e86ce65  Author: adrian555  Date: 2018-09-21T00:02:31Z  address review comment
commit 4492b278ba5a4721d6a5dc836436191ad155dfc6  Author: adrian555  Date: 2018-09-21T17:36:33Z  address review comment
commit 100bac715ee3fc6baee960dfa7c9466baa032bcb  Author: adrian555  Date: 2018-09-24T18:16:41Z  address review comment
commit 57eb00824b072b2a326810ff42edef2ca2626eb6  Author: adrian555  Date: 2018-09-25T17:51:34Z  address review comment
commit df582fc8d31c4e993a6e215fa0901a4728affc0e  Author: adrian555  Date: 2018-09-26T23:06:03Z  add sparkr.SparkDataFrame.base_show_func option
commit d1562777924fd84f14655df9c7418575aeae7fbe  Author: adrian555  Date: 2018-09-26T23:11:41Z  Merge remote-tracking branch 'remotes/upstream/master' into eager_execution
commit 44df922fb347211c9a962bf1771e2e0005634305  Author: adrian555  Date: 2018-10-03T20:27:05Z  address review comment
commit 7ce8f9454d222b42dc3969894147461eaa850b46  Author: adrian555  Date: 2018-10-18T17:13:32Z  Merge remote-tracking branch 'remotes/upstream/master' into eager_execution
commit 76767f9e5288d593efedda000c7fc01799d43085  Author: adrian555  Date: 2018-10-18T18:05:47Z  address review comment
commit b5e463f8babe75463054fe1224348bb707afc195  Author: adrian555  Date: 2018-10-23T17:52:17Z  address review comment
commit 5b39d737aea4b4a680e63e25f5df81993ced1c70  Author: adrian555  Date: 2018-10-23T17:53:17Z  Merge remote-tracking branch 'remotes/upstream/master' into eager_execution
Github user adrian555 closed the pull request at: https://github.com/apache/spark/pull/22455
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r227500368

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
+                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
--- End diff --

Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r227500418

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
   dropTempView("cars")
 })

+
--- End diff --

Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r227500321

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
--- End diff --

Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r227500277

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
--- End diff --

Done.
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r227500208

--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,33 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            allConf <- sparkR.conf()
+            prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
+            if (!is.null(prop) && identical(prop, "true")) {
+              argsList <- list()
+              argsList$x <- object
+              prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
+              if (!is.null(prop)) {
+                numRows <- as.numeric(prop)
+                if (numRows > 0) {
+                  argsList$numRows <- numRows
--- End diff --

A Spark session would not start if the maxNumRows config is not an integer, so replacing `as.numeric` with `as.integer` here won't make a real difference. But I made the change anyway to make the code clearer.
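For context on the `as.numeric` vs. `as.integer` question discussed here, the difference in plain R (illustrative only, independent of Spark):

```r
as.numeric("1.4")  # 1.4 -- a fractional value would be passed through as-is
as.integer("1.4")  # 1   -- coerces via numeric, then truncates toward zero
as.integer("10")   # 10
```

Since the session refuses to start when `maxNumRows` is not an integer, both coercions only ever see integer-valued strings in practice, which is the point made above.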
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226874107

--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,33 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            allConf <- sparkR.conf()
+            prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
+            if (!is.null(prop) && identical(prop, "true")) {
+              argsList <- list()
+              argsList$x <- object
+              prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
+              if (!is.null(prop)) {
+                numRows <- as.numeric(prop)
+                if (numRows > 0) {
+                  argsList$numRows <- numRows
--- End diff --

could you manually check if this works correctly when numRows or truncate (L265) is a numeric but not an integer - e.g. 1.4? if not, we should replace `as.numeric` with `as.integer` in both places
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226874469

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
--- End diff --

change to `test show SparkDataFrame when eager execution is enabled.`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226874142

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
+                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
--- End diff --

add a test with `spark.sql.repl.eagerEval.maxNumRows` not set. change this test to add `spark.sql.repl.eagerEval.truncate`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226874309

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
--- End diff --

one thing to note, since tests run alphabetically, this test will run before sparkSQL - I think we should rename this to test_sparkSQL_eager.R to ensure it runs after
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226874251

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
   dropTempView("cars")
 })

+
--- End diff --

remove empty line
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r226472697

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

I have backed out the change of the display function option. Also opened JIRA https://issues.apache.org/jira/browse/SPARK-25770 to track that issue.
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r223257553

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

It seems a good idea to separate out the print/show issue, as I can see it is not 100% related to this JIRA. Maybe it would help to open a new JIRA to track the requirement and discussion for that issue. Please confirm so that I can revert the related changes. Thanks.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r223197863

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

IMO pretty print should plug in to something more R-standard like [printr](https://yihui.name/printr/) or [lemon](https://cran.r-project.org/web/packages/lemon/vignettes/lemon_print.html) or [print.x](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.dataframe.html)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r223197875

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

could we consider leaving the print/show option out? I'd like to get eager compute to work even in the basic sparkR / R shell
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r222451616

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

I don't have any naming preference. One thing to note is that the `spark.sparkr.r.command` you mentioned above is a config on the Spark side, while the one I tried to define here is a config on the R side.
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r222450798

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")

 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-    sparkR.session(master = sparkRTestMaster)
-  } else {
-    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

Sure. I guess I misinterpreted your other comment.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r221416198

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")

 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-    sparkR.session(master = sparkRTestMaster)
-  } else {
-    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

as mentioned here https://github.com/apache/spark/pull/22455#discussion_r220030686
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r221416164

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")

 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-    sparkR.session(master = sparkRTestMaster)
-  } else {
-    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

you should definitely set `enableHiveSupport = FALSE` - historically this hasn't worked well in other R tests when the hive catalog is enabled
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r221415826

--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            allConf <- sparkR.conf()
-            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
-              argsList <- list()
-              argsList$x <- object
-              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-                if (numRows > 0) {
-                  argsList$numRows <- numRows
+            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

hmm, this naming convention? typically we don't mix `.` and `_`, and I don't think we have anything with `SparkDataFrame`. seems like we have had `spark.sparkr.something` before: https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L343
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220292594

--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)

 {% endhighlight %}

+### Eager execution
+
+If eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled and can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
+
+Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If these properties are not set explicitly, by default, data up to 20 rows and up to 20 characters per column will be showed.
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]",
+               sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
+                                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
+
+# Create a grouped and sorted SparkDataFrame
+df <- createDataFrame(faithful)
+df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
+
+# Similar to R data.frame, displays the data returned, instead of SparkDataFrame class string
+df2
+
+##+-------+-----+
+##|waiting|count|
+##+-------+-----+
+##|   43.0|    1|
+##|   45.0|    3|
+##|   46.0|    5|
+##|   47.0|    4|
+##|   48.0|    3|
+##|   49.0|    5|
+##|   50.0|    5|
+##|   51.0|    6|
+##|   52.0|    5|
+##|   53.0|    7|
+##+-------+-----+
+##only showing top 10 rows
+
+{% endhighlight %}
+
+Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
--- End diff --

In the same doc, the "From Data Sources" section has `either be added by specifying --packages with spark-submit or sparkR commands`, which is why I used `command` instead of `shell`. I would think that `script`, `shell`, and `command` are interchangeable here. But viewed from the angle that `sparkR` ends in an R execution environment, maybe `shell` makes more sense. :) So I made the change.
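For reference, the `--conf` form discussed above would look like this when launching the shell. This is a sketch, not part of the patch; it assumes the `sparkR` launcher script is on `PATH`:

```
# Enable eager execution (and optionally cap the rows shown) for the sparkR shell
sparkR --conf spark.sql.repl.eagerEval.enabled=true \
       --conf spark.sql.repl.eagerEval.maxNumRows=10
```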
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220292560

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1491,9 +1491,10 @@ object SQLConf {
   val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled")
     .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " +
       "displayed if and only if the REPL supports the eager evaluation. Currently, the " +
-      "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " +
-      "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " +
-      "the returned outputs are formatted like dataframe.show().")
+      "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " +
+      "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " +
+      "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " +
+      "outputs are showed as R data.frame.")
--- End diff --

Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220292513

--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,31 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            allConf <- sparkR.conf()
+            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
--- End diff --

Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220292390

--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+    sparkR.session(master = sparkRTestMaster)
+  } else {
+    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
--- End diff --

Done
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220034035

--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
+              argsList <- list()
+              argsList$x <- object
+              numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+              if (numRows > 0) {
+                argsList$numRows <- numRows
+              }
+              truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

just asking since this variation was not covered by a unit test
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220033577 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +246,31 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +allConf <- sparkR.conf() +if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) && --- End diff -- you can also save off `foo <- allConf[["spark.sql.repl.eagerEval.enabled"]]` and reuse it, ditto for other cases --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
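The pattern suggested above — one `sparkR.conf()` call, with each looked-up value saved off and reused — could be sketched like this (a sketch only, assuming a running SparkR session; unset properties come back absent from the list, so defaults have to be applied locally):

```r
# Fetch the whole configuration map once instead of one backend round trip per property.
allConf <- sparkR.conf()

# Save off each value and reuse it, as suggested in the review.
eagerEnabled <- allConf[["spark.sql.repl.eagerEval.enabled"]]
if (!is.null(eagerEnabled) && identical(eagerEnabled, "true")) {
  maxNumRows <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
  # Fall back to showDF()'s own default of 20 rows when the property is unset
  numRows <- if (is.null(maxNumRows)) 20 else as.numeric(maxNumRows)
}
```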
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220033775 --- Diff: docs/sparkr.md --- @@ -450,6 +450,48 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled and can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If these properties are not set explicitly, by default, data up to 20 rows and up to 20 characters per column will be showed. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true", + spark.sql.repl.eagerEval.maxNumRows = as.integer(10))) + +# Create a grouped and sorted SparkDataFrame +df <- createDataFrame(faithful) +df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting") + +# Similar to R data.frame, displays the data returned, instead of SparkDataFrame class string +df2 + +##+---+-+ +##|waiting|count| +##+---+-+ +##| 43.0|1| +##| 45.0|3| +##| 46.0|5| +##| 47.0|4| +##| 48.0|3| +##| 49.0|5| +##| 50.0|5| +##| 51.0|6| +##| 52.0|5| +##| 53.0|7| +##+---+-+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option. --- End diff -- `sparkR command` - I think should be `sparkR shell`? 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220033684 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1491,9 +1491,10 @@ object SQLConf { val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled") .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " + "displayed if and only if the REPL supports the eager evaluation. Currently, the " + - "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " + - "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " + - "the returned outputs are formatted like dataframe.show().") + "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " + + "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " + + "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " + + "outputs are showed as R data.frame.") --- End diff -- ` are showed as R data.frame` - it's not actually the exact format, so how about we say ` are showed similar to R data.frame would` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r220030686 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,60 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) --- End diff -- actually you can use `, enableHiveSupport = FALSE` for this file in all cases - we only need the opposite for SQL tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219971196 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +246,25 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { + argsList <- list() + argsList$x <- object + numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]]) + if (numRows > 0) { +argsList$numRows <- numRows + } + truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]]) --- End diff -- I did test and it worked. Any specific concern? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219938610 --- Diff: docs/sparkr.md --- @@ -450,6 +450,48 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219938566 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +246,25 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Retrieving all conf through `sparkR.conf()` does not provide a default value for each conf, so an extra condition check is needed. Ideally, I believe we should cache all session configuration inside the SparkR environment to avoid backend calls altogether. Anyway, I made the suggested change. ---
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219937585 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,61 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- createDataFrame(faithful) + expect_is(df, "SparkDataFrame") + expected <- "eruptions:double, waiting:double" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session with eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true", + spark.sql.repl.eagerEval.maxNumRows = as.integer(10))) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, + sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true", + spark.sql.repl.eagerEval.maxNumRows = as.integer(10))) --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219897812 --- Diff: docs/sparkr.md --- @@ -450,6 +450,48 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. --- End diff -- Let's note the default values. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219689581 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,61 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- createDataFrame(faithful) + expect_is(df, "SparkDataFrame") + expected <- "eruptions:double, waiting:double" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session with eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true", + spark.sql.repl.eagerEval.maxNumRows = as.integer(10))) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, + sparkConfig = 
list(spark.sql.repl.eagerEval.enabled = "true", + spark.sql.repl.eagerEval.maxNumRows = as.integer(10))) --- End diff -- I'd set sparkConfig to a list before this and simply reuse it for both cases instead of having them duplicated --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
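The deduplication suggested here might look like the following (a sketch; `windows_with_hadoop()` and `sparkRTestMaster` are the helpers already used by SparkR's test suite):

```r
# Build the eager-execution settings once, then reuse the list in both branches
eagerConf <- list(spark.sql.repl.eagerEval.enabled = "true",
                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))

sparkSession <- if (windows_with_hadoop()) {
  sparkR.session(master = sparkRTestMaster, sparkConfig = eagerConf)
} else {
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = eagerConf)
}
```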
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219689697 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +246,25 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { + argsList <- list() + argsList$x <- object + numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]]) + if (numRows > 0) { +argsList$numRows <- numRows + } + truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]]) --- End diff -- have you tested with the case when only `spark.sql.repl.eagerEval.truncate` is set, and `spark.sql.repl.eagerEval.maxNumRows` is not? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
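A test for the variation asked about here (only `spark.sql.repl.eagerEval.truncate` set, `maxNumRows` left unset) could be sketched as follows — a hypothetical test case, not part of the PR, reusing the file's existing helpers:

```r
test_that("eager execution with only truncate set", {
  # maxNumRows deliberately unset: show() should fall back to showDF()'s default row cap
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
                                    spark.sql.repl.eagerEval.truncate = as.integer(3)))

  df <- createDataFrame(faithful)
  # faithful has 272 rows, so the default 20-row cap should still apply
  expect_output(show(df), "only showing top 20 rows")

  sparkR.session.stop()
})
```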
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219689694 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +246,25 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- instead of calling sparkR.conf three times, I'm wondering if it cleaner/safer to get a copy like in here: https://github.com/apache/spark/blob/95b177c8f0862c6965a7c3cd76b3935c975adee9/R/pkg/tests/fulltests/test_sparkSQL.R#L3538 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219576011 --- Diff: R/pkg/R/DataFrame.R --- @@ -226,7 +226,8 @@ setMethod("showDF", #' show #' -#' Print class and type information of a Spark object. +#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219575988 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219410274 --- Diff: R/pkg/R/DataFrame.R --- @@ -226,7 +226,8 @@ setMethod("showDF", #' show #' -#' Print class and type information of a Spark object. +#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of --- End diff -- return the data of the SparkDataFrame object -> evaluate the SparkDataFrame and print top rows of the SparkDataFrame --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219404319 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- I see. The document generated looks like https://spark.apache.org/docs/latest/api/R/index.html. Then rewriting the description is good and clear. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219402870 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- respecting `spark.sql.repl.eagerEval.maxNumRows` somewhat makes sense. but instead of changing `showDF` which has other cases beyond eagerEval, we could change where `showDF` is called by `show`, and pass the max rows. here https://github.com/apache/spark/pull/22455/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R249 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
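The shape described above — leaving `showDF()` untouched and passing the eager-eval settings down from `show()` — can be sketched as follows (a sketch of the pattern under review, not the final merged code):

```r
setMethod("show", "SparkDataFrame", function(object) {
  if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    # Collect only the settings that were explicitly configured, then let
    # showDF() supply its own defaults for anything unset.
    argsList <- list(x = object)
    numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    if (numRows > 0) argsList$numRows <- numRows
    truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    if (truncate > 0) argsList$truncate <- truncate
    do.call(showDF, argsList)
  } else {
    # Lazy path: print only the schema summary, as before
    cols <- lapply(dtypes(object), function(l) paste(l, collapse = ":"))
    cat(paste(class(object), "[", paste(cols, collapse = ", "), "]\n", sep = ""))
  }
})
```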
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219402281 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- ^ that - it's two different "overloads" for two different "class" --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219388335 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- @adrian555 Thanks for the explanation. > However, my second point is that I don't think these two configs matter much or that important/necessary. Since the eager execution is just to show a snippet data of the SparkDataFrame, our default numRows = 20 and truncate = TRUE are good enough iMO. If users want to see more or less number of rows, they should call showDF(). So I just wanted to make sure it's possible to have parity with how it works in Python. It seems to me that in Python, we just get the two configs and call the `showString` method. > And if we think that showDF() can ignore the eager execution setting and still want the show() to observe eager execution config, we can certainly just grab the maxNumRows and truncate setting and pass to showDF() call. What will happen if we grab these configs in show() when eager execution is enabled and then call showDF() by passing these parameters? ---
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219384936 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Had thought about this. First, I consider it out of the scope of this JIRA, because I think it conflicts with the current `showDF()` behavior. Some details: `showDF()` already takes `numRows` and `truncate` arguments. So if we are going to respect those two as well, we have to decide what behavior is best suited for `showDF()`. For example, whether `showDF()` should just ignore eager execution, or pick up the `maxNumRows` and `truncate` set through eager execution, like the following:
```
setMethod("showDF",
          signature(x = "SparkDataFrame"),
          function(x, numRows = 20, truncate = TRUE, vertical = FALSE) {
            eagerNumRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
            numRows <- ifelse(eagerNumRows == 0, numRows, eagerNumRows)
            eagerTruncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
            truncate <- ifelse(eagerTruncate == 0, truncate, eagerTruncate)
            if (is.logical(truncate) && truncate) {
              s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(20), vertical)
            } else {
              truncate2 <- as.numeric(truncate)
              s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(truncate2), vertical)
            }
            cat(s)
          })
```
And if we think that `showDF()` can ignore the eager execution setting and still want `show()` to observe the eager execution config, we can certainly just grab the `maxNumRows` and `truncate` settings and pass them to the `showDF()` call.
However, my second point is that I don't think these two configs matter much or are that important/necessary. Since eager execution is just to show a snippet of the data in the SparkDataFrame, our defaults `numRows = 20` and `truncate = TRUE` are good enough IMO. If users want to see more or fewer rows, they should call `showDF()`. @felixcheung, your thought? ---
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219356224 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- @adrian555 Please correct me on this one. Should we also respect `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219355827 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- I might have misunderstood your previous question. The `@param` does not need to change as it is common to `SparkDataFrame` and `Column` etc for `show()` method. Same applies to description, so I rewrote the description. Please have a look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219351707 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Can you also correct the param doc? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219351194 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- I think the `param` is not correct... the signature of `show()` has `SparkDataFrame` so in this one it won't work for `Column`. In column.R file, it has its own `show()` implementation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219348795 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Or the param doc is not correct? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219347786 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- `show` also works on other Spark object like `Column`, I think we only need to do `showDF` for `SparkDataFrame`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219331649 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- createDataFrame(faithful) + expect_is(df, "SparkDataFrame") + expected <- "eruptions:double, waiting:double" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session without eager execution enabled --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219330082 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- createDataFrame(faithful) + expect_is(df, "SparkDataFrame") + expected <- "eruptions:double, waiting:double" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session without eager execution enabled --- End diff -- Here we are testing with eager execution enabled, right ? Do we need to fix the comment here ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315429 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +244,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315410 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) + expect_is(df, "SparkDataFrame") + expected <- "Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, + 
sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- Done
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315391 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315320 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned +df + +##+-+---+ +##|eruptions|waiting| +##+-+---+ +##| 3.6| 79.0| +##| 1.8| 54.0| +##|3.333| 74.0| +##|2.283| 62.0| +##|4.533| 85.0| +##|2.883| 55.0| +##| 4.7| 88.0| +##| 3.6| 85.0| +##| 1.95| 51.0| +##| 4.35| 85.0| +##+-+---+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable. --- End diff -- Changed to this: ``` Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option. ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
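For readers following along, the doc wording above amounts to passing the property at launch time rather than restarting the session from R. A sketch of the invocation, assuming a standard Spark distribution with `bin` on the `PATH` (the master URL is just an example):

```shell
# Launch the SparkR shell with eager evaluation enabled from the start
sparkR --master "local[*]" --conf spark.sql.repl.eagerEval.enabled=true
```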
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315374 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { --- End diff -- Since this test will terminate the spark session and restart, I think it is better to be self contained. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315297 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315251 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution --- End diff -- This will be the same level as "Applying User-Defined Function" where its parent topic is "SparkDataFrame Operations". So the heading is with three `#`.
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219315277 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030775 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +244,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- also not sure if it's done for python, consider adding to the doc above (L229) how it behaves with eagerEval
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030350 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned +df + +##+-+---+ +##|eruptions|waiting| +##+-+---+ +##| 3.6| 79.0| +##| 1.8| 54.0| +##|3.333| 74.0| +##|2.283| 62.0| +##|4.533| 85.0| +##|2.883| 55.0| +##| 4.7| 88.0| +##| 3.6| 85.0| +##| 1.95| 51.0| +##| 4.35| 85.0| +##+-+---+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable. --- End diff -- actually I think the suggestion should be to set that in the `sparkR` command line as spark conf? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030512 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- use a different dataset that does not require `suppressWarnings` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030211 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned --- End diff -- we could also start here by saying "similar to R data.frame`... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030277 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned +df + +##+-+---+ +##|eruptions|waiting| +##+-+---+ +##| 3.6| 79.0| +##| 1.8| 54.0| +##|3.333| 74.0| +##|2.283| 62.0| +##|4.533| 85.0| +##|2.883| 55.0| +##| 4.7| 88.0| +##| 3.6| 85.0| +##| 1.95| 51.0| +##| 4.35| 85.0| +##+-+---+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable. --- End diff -- change to `Note that the `SparkSession` created by `sparkR` shell by default does not ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219029847 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution --- End diff -- should be `` I think?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030474 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { --- End diff -- I'm neutral, should these tests be in test_sparkSQL.R? it takes longer to run with many test files --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
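The `expect_output(show(df), expected)` pattern used in these tests asserts on what printing produces under each configuration. The same idea in a self-contained Python sketch — all names here are hypothetical stand-ins for the R test, not SparkR code:

```python
# Hypothetical stand-in for the R test: capture what show() prints and
# assert on it, mirroring expect_output(show(df), expected).
import io
from contextlib import redirect_stdout

def show(df, eager):
    # Eager mode prints the data; lazy mode prints only the schema summary.
    if eager:
        print(df["rows"])
    else:
        print("SparkDataFrame[" + ", ".join(df["schema"]) + "]")

def captured(fn, *args):
    # Run fn and return everything it wrote to stdout.
    buf = io.StringIO()
    with redirect_stdout(buf):
        fn(*args)
    return buf.getvalue()

df = {"schema": ["eruptions:double", "waiting:double"], "rows": [[3.6, 79.0]]}
assert "eruptions:double" in captured(show, df, False)   # lazy: schema only
assert "3.6" in captured(show, df, True)                 # eager: actual data
```

Capturing output rather than return values is what makes the session restart necessary in the R test: the config is read at display time, so each mode needs its own session.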
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030085 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) --- End diff -- perhaps a more complete example - like `summarize(groupBy(df, df$waiting), count = n(df$waiting))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
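The fuller example suggested here — `summarize(groupBy(df, df$waiting), count = n(df$waiting))` — is a group-by count. A plain-Python analogue of what that aggregation computes, purely illustrative (the data values are made up):

```python
from collections import Counter

# Toy stand-in for summarize(groupBy(df, df$waiting), count = n(df$waiting)):
# count how many rows share each value of the waiting column.
waiting = [79.0, 54.0, 74.0, 54.0, 79.0]
counts = Counter(waiting)
print(counts[79.0])  # 2
```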
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030537 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) + expect_is(df, "SparkDataFrame") + expected <- "Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, + 
sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- ditto
GitHub user adrian555 opened a pull request: https://github.com/apache/spark/pull/22455 [SPARK-24572][SparkR] "eager execution" for R shell, IDE ## What changes were proposed in this pull request? Check the `spark.sql.repl.eagerEval.enabled` configuration property in SparkDataFrame `show()` method. If the `SparkSession` has eager execution enabled, the data will be returned to the R client when the data frame is created. So instead of seeing this ``` > df <- createDataFrame(faithful) > df SparkDataFrame[eruptions:double, waiting:double] ``` you will see ``` > df <- createDataFrame(faithful) > df +-+---+ |eruptions|waiting| +-+---+ | 3.6| 79.0| | 1.8| 54.0| |3.333| 74.0| |2.283| 62.0| |4.533| 85.0| |2.883| 55.0| | 4.7| 88.0| | 3.6| 85.0| | 1.95| 51.0| | 4.35| 85.0| |1.833| 54.0| |3.917| 84.0| | 4.2| 78.0| | 1.75| 47.0| | 4.7| 83.0| |2.167| 52.0| | 1.75| 62.0| | 4.8| 84.0| | 1.6| 52.0| | 4.25| 79.0| +-+---+ only showing top 20 rows ``` ## How was this patch tested? Manual tests as well as unit tests (one new test case is added). You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian555/spark eager_execution Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22455.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22455 commit b3e014c4e8050e8a0b3da190bb327347f9136b7e Author: adrian555 Date: 2018-09-18T20:48:56Z support eager execution commit d0be3a8582de548862a78cfa5ffb58df933efa8d Author: adrian555 Date: 2018-09-18T20:51:43Z Merge remote-tracking branch 'remotes/upstream/master' into eager_execution commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a Author: adrian555 Date: 2018-09-18T21:09:48Z add newline --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org