[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

2018-09-26 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/22455 @adrian555 yes, that looks good. Thank you! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

2018-09-25 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/22455 @adrian555 These are all great points. My high-level goal was enabling other platforms (e.g., Jupyter) to plug in more advanced (custom) functions for displaying SparkDataFrame. If the framework does not set

[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

2018-09-25 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/22455 @adrian555 thanks for submitting this. Can we have a config to set the default `print` function in eager mode? It can default to `show`, but I can imagine it is useful to make it configurable

[GitHub] spark issue #22225: [SPARK-25234][SPARKR] avoid integer overflow in parallel...

2018-08-24 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/22225 LGTM.

[GitHub] spark issue #20005: [DO-NOT-MERGE] Investigating SparkR test failure

2017-12-18 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/20005 Thank you guys.

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-16 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r157351383 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github

[GitHub] spark issue #19959: [SPARK-22766] Install R linter package in spark lib dire...

2017-12-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/19959 @shivaram are there other concerns with this patch?

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-13 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r156820420 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-13 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r156789375 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-12 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r156551105 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-12 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r156528729 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github

[GitHub] spark issue #19959: [SPARK-22766] Install R linter package in spark lib dire...

2017-12-12 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/19959 @JoshRosen the SparkR package is built from `spark/R/pkg` directory. `$SPARK_HOME/R/lib` is just a place where `SparkR` is installed (after it has been built) and at runtime loaded from

[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-12 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/19959 [SPARK-22766] Install R linter package in spark lib directory ## What changes were proposed in this pull request? `dev/lint-r.R` file uses `devtools` to install `jimhester/lintr

[GitHub] spark pull request #19551: [SPARK-17902][R] Revive stringsAsFactors option f...

2017-10-23 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19551#discussion_r146345486 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", { ex

[GitHub] spark issue #19551: [SPARK-17902][R] Revive stringsAsFactors option for coll...

2017-10-23 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/19551 LGTM

[GitHub] spark pull request #19551: [SPARK-17902][R] Revive stringsAsFactors option f...

2017-10-23 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19551#discussion_r146345075 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", { ex

[GitHub] spark pull request #19551: [WIP][SPARK-17902][R] Revive stringsAsFactors opt...

2017-10-22 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19551#discussion_r146160321 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", { ex

[GitHub] spark pull request #19551: [WIP][SPARK-17902][R] Revive stringsAsFactors opt...

2017-10-22 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/19551#discussion_r146160421 --- Diff: R/pkg/R/DataFrame.R --- @@ -1191,6 +1191,9 @@ setMethod("collect", vec <- d

[GitHub] spark issue #19342: [MINOR][SparkR] minor fixes for CRAN compliance

2017-09-25 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/19342 Thanks!

[GitHub] spark issue #19023: Add R interface of binarizer

2017-08-22 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/19023 I suggest we look at this problem holistically. Basically what is missing is MLLib pipelines. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #18532: [SPARK-21263][SQL] Do not allow partially parsing...

2017-07-05 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/18532#discussion_r125763800 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala --- @@ -130,17 +130,17 @@ class

[GitHub] spark pull request #18532: [SPARK-21263][SQL] Do not allow partially parsing...

2017-07-05 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/18532#discussion_r125763819 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala --- @@ -130,17 +130,17 @@ class

[GitHub] spark pull request #18532: [SPARK-21263][SQL] Do not allow partially parsing...

2017-07-05 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/18532#discussion_r125764216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1174,4 +1174,25 @@ class CSVSuite extends QueryTest

[GitHub] spark pull request #18532: [SPARK-21263][SQL] Do not allow partially parsing...

2017-07-04 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/18532#discussion_r125537110 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1174,4 +1174,25 @@ class CSVSuite extends QueryTest

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-30 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/14431 If we want to avoid yet another method, we could add this functionality as a non-default behavior. E.g., ``` gapply(df, "key", function(key, x) { x }, schema(df), appe

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2017-06-30 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/14431 @NarineK how about adding this as a new API, e.g., `gapplyWithKeys()`? I am extremely worried about the semantic change. It can break existing SparkR applications and will be confusing for users

[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...

2017-05-16 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17941 @felixcheung How about before `sparkR.session.stop()`? The use case I am trying to recover (which is broken now) is sharing a table in SparkR and working on it in a different language.

[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...

2017-05-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17941 @felixcheung we all know that SparkR (and in general R) API is not perfect when it comes to ETLing unstructured data. For example we don't have a great story for nested data, etc. To overcome

[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...

2017-05-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17941 @dongjoon-hyun and @felixcheung just to confirm, yes we do share SparkContext across languages in Databricks. I think this is a useful API in general. Note that when user creates spark

[GitHub] spark pull request #17941: [SPARK-20684][R] Expose createGlobalTempView and ...

2017-05-10 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/17941#discussion_r115872802 --- Diff: R/pkg/R/DataFrame.R --- @@ -501,6 +501,34 @@ setMethod("createOrReplaceTempView", invisible(callJMe

[GitHub] spark pull request #17941: [SPARK-20684][R] Expose createGlobalTempView and ...

2017-05-10 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/17941#discussion_r115873162 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -721,6 +721,16 @@ test_that( expect_true(dropTempView("d

[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...

2017-05-08 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17905 @felixcheung this approach is fine, but I think it is better if unit tests do not leave any side-effects to begin with. In this case every test should clean up state before and after (similar

[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...

2017-05-08 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17903 @shivaram it seems to have started today with this build: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2844/consoleFull The build just

[GitHub] spark pull request #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() t...

2017-05-08 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/17903 [SPARK-20661][SparkR][Test] SparkR tableNames() test fails ## What changes were proposed in this pull request? Cleaning existing temp tables before running tableNames tests ## How

[GitHub] spark pull request #17423: [SPARK-20088] Do not create new SparkContext in S...

2017-03-24 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/17423 [SPARK-20088] Do not create new SparkContext in SparkR createSparkContext ## What changes were proposed in this pull request? Instead of creating new `JavaSparkContext` we use

[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-01-17 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/16611 @HyukjinKwon as I laid out in the JIRA, a major problem with this approach for specifying multiple options is that it won't work in DDL. What is wrong with having a numbered list? E.g.: `nullValue1

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/16154#discussion_r91173424 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -272,18 +282,22 @@ private[spark] object SerDe

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/16154#discussion_r91169543 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala --- @@ -143,12 +142,8 @@ private[r] class RBackendHandler(server: RBackend

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/16154#discussion_r91171383 --- Diff: core/src/main/scala/org/apache/spark/api/r/JVMObjectTracker.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #16154: [SPARK-17822] [R] Make JVMObjectTracker a member ...

2016-12-06 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/16154#discussion_r91170861 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -272,18 +282,22 @@ private[spark] object SerDe

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] head() and show() for Columns

2016-11-29 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/11336 I did another pass. It looks good to me.

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-29 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram that is fine. We can merge it to 2.1 (or whatever the next major release is going to be).

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-28 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram is there a chance this makes it to the 2.0.2 release?

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-27 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 retest this please

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-26 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram sorry for the delay in getting back to this. Please take another look.

[GitHub] spark pull request #15471: [SPARK-17919] Make timeout to RBackend configurab...

2016-10-26 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15471#discussion_r85230441 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala --- @@ -83,7 +86,29 @@ private[r] class RBackendHandler(server: RBackend

[GitHub] spark pull request #15471: [SPARK-17919] Make timeout to RBackend configurab...

2016-10-26 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15471#discussion_r85229665 --- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala --- @@ -110,6 +115,11 @@ private[spark] object RBackend extends Logging { val

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-21 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 Ping.

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-20 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 Thanks @shivaram. I ran a real workload consisting of long running parallel simulations that took about 3.5 hours. I also tested it by calling `Sys.sleep()` inside workers with dapply

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-20 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 Thanks @felixcheung, I addressed your comments.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-19 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 could that behavior with `list` relate to the issue in this ticket? https://issues.apache.org/jira/browse/SPARK-17781

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-19 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @felixcheung you mean test for parallelizing NAs and getting them back? The patch includes that test.

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-19 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r84117167 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,34 @@ private[spark] object SerDe { } def readDate

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-19 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @shivaram I removed catching `NegativeArraySizeException` from this PR.

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-18 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram it was indeed my fault. I did not run local tests after I added the heartbeat. I am now using +1 for heartbeat.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-18 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 Thanks @wangmiao1981. I think it is best to file a separate JIRA for this issue. Thanks a lot for catching it.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-18 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 would you please also test the master branch?

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram I think they are unrelated. Can you trigger another test?

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-17 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @shivaram and @felixcheung do I need to do more on this?

[GitHub] spark issue #15471: [SPARK-17919] Make timeout to RBackend configurable in S...

2016-10-17 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram this worked on my stress tests. The question is how to unit test this?

[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...

2016-10-14 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r83341979 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala --- @@ -46,6 +46,7 @@ private[sql] abstract class CsvReader

[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...

2016-10-14 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r83342029 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -555,4 +558,37 @@ class CSVSuite extends QueryTest

[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...

2016-10-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @shivaram thanks for the clarification. I realized we were not setting a socket timeout on the Netty socket, so I added that as well. I also introduced the heartbeat mechanism and tested it locally

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wandjenkins thanks! It is interesting that it worked with R 3.3.1!

[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...

2016-10-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15471 @felixcheung that is a good suggestion. I will try to use a single constant. I changed the label to WIP because something is still timing out the connection in my tests. Maybe very long timeouts

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wandjenkins yes, I think there must be some non-standard issue on your system. I tested on Mac and Linux with different versions of R and it passed the test case.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-14 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @felixcheung and @shivaram the AppVeyor test that @HyukjinKwon kicked off passed, and Jenkins passed too. Do you think this is ready?

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-13 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 you are talking about Windows, right?

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-13 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 I just tried the patch on R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" on Linux and it passed tests. @HyukjinKwon how can I kick off another AppVeyor test?

[GitHub] spark pull request #15471: [SPARK-17919] Make timeout to RBackend configurab...

2016-10-13 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/15471 [SPARK-17919] Make timeout to RBackend configurable in SparkR ## What changes were proposed in this pull request? This patch makes RBackend connection timeout configurable by user

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-12 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @felixcheung and @wangmiao1981 thanks! This is a good point. I will try testing it on different versions of R.

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82940884 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,34 @@ private[spark] object SerDe { } def readDate

[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15446 @shivaram yes I just noticed it during my debugging and fixed it.

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15375 Seems like a flaky test in `DirectKafkaStreamSuite`: ``` DirectKafkaStreamSuite: - pattern based subscription *** FAILED *** (1 minute, 41 seconds) ``` If jenkins listens

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 I don't get the exception that you reported on Mac. Also note that the unit test is passing on Linux. I am not sure why returning null is an issue.

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15375 @felixcheung does it look OK now?

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 My guess is that on Windows R serialization behaves differently and serializes `NA` as `null`. Unfortunately, I don't have a Windows machine to verify. Would you please test that?

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/11336 I think some of the old tests still rely on `sparkRSQL.init()`. I believe the warning is OK.

[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82888215 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -2252,6 +2252,31 @@ test_that("Method str()", { expect_equal(capture.output(utils:

[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82887849 --- Diff: R/pkg/R/column.R --- @@ -32,35 +34,65 @@ setOldClass("jobj") #' @export #' @note Column since 1.4.0 setCla

[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82888061 --- Diff: R/pkg/R/column.R --- @@ -32,35 +34,65 @@ setOldClass("jobj") #' @export #' @note Column since 1.4.0 setCla

[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82887445 --- Diff: R/pkg/R/DataFrame.R --- @@ -1035,10 +1035,17 @@ setMethod("dim", c(count(x), ncol(x)) }) -#' Co

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 thanks for testing on Windows. I added a check for this. Would you please try again and let me know? Unfortunately, I don't have access to a Windows box for testing.

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82864117 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,24 @@ private[spark] object SerDe { } def readDate

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82814431 --- Diff: R/pkg/DESCRIPTION --- @@ -11,7 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cr

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-10 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @shivaram can I nominate this patch for 2.0 branch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-10 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/11336 @olarayej any update on this? If you are busy, I can start another PR from yours.

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-10 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/15421 [SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns ## What changes were proposed in this pull request? NA date values are serialized as "NA" and NA t

[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-07 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15375#discussion_r82462875 --- Diff: R/pkg/R/context.R --- @@ -126,13 +126,13 @@ parallelize <- function(sc, coll, numSlices = 1) { if (numSlices > lengt

[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-07 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15375#discussion_r82446808 --- Diff: R/pkg/R/context.R --- @@ -123,19 +126,48 @@ parallelize <- function(sc, coll, numSlices = 1) { if (numSlices > lengt

[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15375 @felixcheung added clean up for the temp file and unit test. PTAL.

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-06 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/11336 @olarayej are you interested in rebasing this PR and limiting its user impact to just `head` for now? We can continue the discussion on support for distributed vector (a.k.a. `Column`) in JIRA

[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-06 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/15375 [SPARK-17790] Support for parallelizing R data.frame larger than 2GB ## What changes were proposed in this pull request? If the R data structure that is being parallelized is larger than

[GitHub] spark pull request #15328: [SPARKR][SPARK-17762] invokeJava fails when seria...

2016-10-05 Thread falaki
Github user falaki closed the pull request at: https://github.com/apache/spark/pull/15328

[GitHub] spark issue #15328: [SPARKR][SPARK-17762] invokeJava fails when serialized a...

2016-10-05 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15328 Closing this, as it will not address the issue. I will work on it under SPARK-17790.

[GitHub] spark pull request #15328: [SPARKR][SPARK-17762] invokeJava fails when seria...

2016-10-05 Thread falaki
GitHub user falaki reopened a pull request: https://github.com/apache/spark/pull/15328 [SPARKR][SPARK-17762] invokeJava fails when serialized argument list is larger than INT_MAX bytes ## What changes were proposed in this pull request? * Updates implementation of `writeRaw

[GitHub] spark pull request #15328: [SPARKR][SPARK-17762] invokeJava fails when seria...

2016-10-05 Thread falaki
Github user falaki closed the pull request at: https://github.com/apache/spark/pull/15328

[GitHub] spark issue #15328: [SPARKR][SPARK-17762] invokeJava fails when serialized a...

2016-10-04 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15328 A user ran into this limit when trying to parallelize a fairly large R data.frame. The user has extensive logic implemented in R on that data.frame and migrating it over to SparkDataFrame API

[GitHub] spark issue #15328: [SPARKR][SPARK-17762] invokeJava fails when serialized a...

2016-10-04 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15328 @shivaram thanks for pointing it out. INT_MAX is indeed a limit in R as well. I will update `RBackendHandler` to use lists. Sounds good?

[GitHub] spark issue #15328: [SPARKR][SPARK-17762] invokeJava fails when serialized a...

2016-10-03 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15328 @shivaram added unit tests. On Java Array limitation, we deserialize all of the arguments as one Array[Object]. So if even one of the arguments is larger than `INT_MAX` we will fail on the R
