[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2017-06-13 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 This patch only handled the raw columns, not the vector / array value columns. So maybe that original JIRA should still be open, or create another one specific to this. --- If your project

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-09-07 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Thanks!

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-07 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77764184 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -183,4 +183,28 @@ test_that("overrideEnvs", { expect_equal(config[["conf

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77763275 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl("

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r77760776 --- Diff: R/pkg/R/utils.R --- @@ -697,3 +697,18 @@ is_master_local <- function(master) { is_sparkR_shell <- function() { grepl("

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 I'm presenting something related to this on Thursday- it would be nice to tell the audience this patch made it in. Can I do anything to help this along?

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-31 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Yes, this is only for a bug fix. @shivaram mentioned in a previous email exchange it would be good to see some performance benchmarks as well.

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-30 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 @shivaram what do you think?

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-29 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Tried some more benchmarks today. Didn't see any difference in speed before / after patch. Observing the processes as they run I see the vast majority of time spent in the local R process, while

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-25 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Not sure why these timings are so bad. Found out today that by using bytes and calling directly into Java's `org.apache.spark.api.r.RRDD` these can be improved by 2 orders of magnitude

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 This change doesn't appear to make any difference in speed. ``` # Wed Aug 24 14:12:12 KST 2016 # Benchmarking performance before and after dapplyCollect patch

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76007525 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # availa

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76007311 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # availa

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76004770 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # availa

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76004521 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # availa

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 My pleasure. Let me know if / when I should squash these commits or rebase. Working on some before and after benchmarks now.

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread clarkfitzg
GitHub user clarkfitzg opened a pull request: https://github.com/apache/spark/pull/14783 SPARK-16785 R dapply doesn't return array or raw columns ## What changes were proposed in this pull request? Fixed bug in `dapplyCollect` by changing the `compute` function