Github user falaki commented on the issue:
https://github.com/apache/spark/pull/22455
@adrian555 yes, that looks good. Thank you!
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/22455
@adrian555 These are all great points. My high-level goal was enabling other
platforms (e.g., Jupyter) to plug in more advanced (custom) functions for
displaying a SparkDataFrame. If the framework does not set
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/22455
@adrian555 thanks for submitting this. Can we have a config to set the
default `print` function in eager mode? It can default to `show`, but I can
imagine it is useful to make it configurable.
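For context, a minimal sketch of what the configurable eager-print setup looks like from the R side; the `spark.sql.repl.eagerEval.*` names below follow the existing eager-evaluation configs and are an assumption about what this PR ends up using:
```
# Hedged sketch: with eager evaluation enabled, printing a SparkDataFrame
# shows rows (like show()) instead of just the schema.
library(SparkR)
sparkR.session(sparkConfig = list(
  spark.sql.repl.eagerEval.enabled = "true",
  spark.sql.repl.eagerEval.maxNumRows = "20"
))
df <- createDataFrame(faithful)
df  # prints the first rows eagerly; a custom print function could hook in here
```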
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/5
LGTM.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/20005
Thank you guys.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19959#discussion_r157351383
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC,
logical.return = TRUE)) {
# Installs lintr from Github
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/19959
@shivaram are there other concerns with this patch?
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19959#discussion_r156820420
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC,
logical.return = TRUE)) {
# Installs lintr from Github
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19959#discussion_r156789375
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC,
logical.return = TRUE)) {
# Installs lintr from Github
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19959#discussion_r156551105
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC,
logical.return = TRUE)) {
# Installs lintr from Github
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19959#discussion_r156528729
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC,
logical.return = TRUE)) {
# Installs lintr from Github
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/19959
@JoshRosen the SparkR package is built from the `spark/R/pkg` directory.
`$SPARK_HOME/R/lib` is just the place where `SparkR` is installed (after it has
been built) and at runtime loaded from
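For readers following along, a short sketch of the loading step described above (standard SparkR usage, not part of this patch):
```
# SparkR is loaded at runtime from the lib directory it was installed into
# after being built from spark/R/pkg.
lib_loc <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib")
library(SparkR, lib.loc = lib_loc)
```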
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/19959
[SPARK-22766] Install R linter package in spark lib directory
## What changes were proposed in this pull request?
The `dev/lint-r.R` file uses `devtools` to install `jimhester/lintr
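A hedged sketch of the direction this PR describes (the actual diff is elided above; `LOCAL_LIB_LOC` is the variable `dev/lint-r.R` already defines):
```
# Prepend Spark's local R lib so the lintr install lands there instead of
# the default user library.
.libPaths(c(LOCAL_LIB_LOC, .libPaths()))
devtools::install_github("jimhester/lintr")
```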
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19551#discussion_r146345486
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data
types", {
ex
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/19551
LGTM
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19551#discussion_r146345075
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data
types", {
ex
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19551#discussion_r146160321
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data
types", {
ex
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/19551#discussion_r146160421
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1191,6 +1191,9 @@ setMethod("collect",
vec <- d
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/19342
Thanks!
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/19023
I suggest we look at this problem holistically. Basically, what is missing
is MLlib pipelines.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/18532#discussion_r125763800
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala
---
@@ -130,17 +130,17 @@ class
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/18532#discussion_r125763819
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala
---
@@ -130,17 +130,17 @@ class
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/18532#discussion_r125764216
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1174,4 +1174,25 @@ class CSVSuite extends QueryTest
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/18532#discussion_r125537110
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -1174,4 +1174,25 @@ class CSVSuite extends QueryTest
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/14431
If we want to avoid yet another method, we could add this functionality as
a non-default behavior. E.g.,
```
gapply(df, "key", function(key, x) { x }, schema(df), appe
```
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/14431
@NarineK how about adding this as a new API, e.g., `gapplyWithKeys()`? I am
extremely worried about the semantic change: it can break existing SparkR
applications and will be confusing for users.
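For reference, a minimal sketch of the existing `gapply()` contract that the comment argues should stay unchanged; `gapplyWithKeys()` above is only a suggested name, and this example uses only the current API:
```
df <- createDataFrame(mtcars)
# func receives the grouping key and that group's rows as a local data.frame;
# the returned data.frame must match the declared schema.
result <- gapply(
  df,
  "cyl",
  function(key, x) {
    data.frame(cyl = key[[1]], avg_mpg = mean(x$mpg))
  },
  structType(structField("cyl", "double"), structField("avg_mpg", "double"))
)
head(result)
```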
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/17941
@felixcheung How about before `sparkR.session.stop()`? The use case I am
trying to recover (which is broken now) is sharing a table created in SparkR
and working on it in a different language.
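A small sketch of that use case with existing SparkR APIs (the companion call in the other language is illustrative):
```
# Register a SparkR result so another language sharing the same SparkContext
# can pick it up before sparkR.session.stop() is called.
df <- createDataFrame(faithful)
createOrReplaceTempView(df, "faithful_r")
# e.g., a Scala or Python user on the same context could then run:
#   spark.table("faithful_r")
```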
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/17941
@felixcheung we all know that the SparkR (and, in general, R) API is not
perfect when it comes to ETLing unstructured data. For example, we don't have
a great story for nested data, etc. To overcome
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/17941
@dongjoon-hyun and @felixcheung just to confirm: yes, we do share a
SparkContext across languages in Databricks.
I think this is a useful API in general. Note that when a user creates spark
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/17941#discussion_r115872802
--- Diff: R/pkg/R/DataFrame.R ---
@@ -501,6 +501,34 @@ setMethod("createOrReplaceTempView",
invisible(callJMe
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/17941#discussion_r115873162
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -721,6 +721,16 @@ test_that(
expect_true(dropTempView("d
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/17905
@felixcheung this approach is fine, but I think it is better if unit tests
do not leave any side effects to begin with. In this case, every test should
clean up state before and after (similar
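A hedged sketch of the self-cleaning pattern suggested above, using standard SparkR and testthat calls:
```
test_that("tableNames() sees only this test's views", {
  # clean up any temp views left behind by earlier tests
  for (v in tableNames()) suppressWarnings(dropTempView(v))
  df <- createDataFrame(mtcars)
  createOrReplaceTempView(df, "cars")
  expect_true("cars" %in% tableNames())
  dropTempView("cars")  # leave no state behind for later tests
})
```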
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/17903
@shivaram it seems to have started today with this build:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2844/consoleFull
The build just
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/17903
[SPARK-20661][SparkR][Test] SparkR tableNames() test fails
## What changes were proposed in this pull request?
Cleaning existing temp tables before running tableNames tests
## How
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/17423
[SPARK-20088] Do not create new SparkContext in SparkR createSparkContext
## What changes were proposed in this pull request?
Instead of creating a new `JavaSparkContext`, we use
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/16611
@HyukjinKwon as I laid out in the JIRA, a major problem with this approach
for specifying multiple options is that it won't work in DDL. What is wrong
with having a numbered list? E.g.: `nullValue1
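To make the proposal concrete, a hypothetical sketch; these numbered option names come from the suggestion above and were never merged into Spark:
```
# Hypothetical only: numbered options remain plain key-value pairs, so they
# survive being written out as DDL, unlike a single list-valued option.
df <- read.df("data.csv", source = "csv",
              nullValue1 = "NA", nullValue2 = "n/a", nullValue3 = "-")
```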
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/16154#discussion_r91173424
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -272,18 +282,22 @@ private[spark] object SerDe
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/16154#discussion_r91169543
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala
---
@@ -143,12 +142,8 @@ private[r] class RBackendHandler(server: RBackend
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/16154#discussion_r91171383
--- Diff: core/src/main/scala/org/apache/spark/api/r/JVMObjectTracker.scala
---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/16154#discussion_r91170861
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -272,18 +282,22 @@ private[spark] object SerDe
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/11336
I did another pass. It looks good to me.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram that is fine. We can merge it to 2.1 (or whatever the next major
release is going to be).
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram is there a chance this makes it to the 2.0.2 release?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
retest this please
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram sorry for delay getting back to this. Please take another look.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15471#discussion_r85230441
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala
---
@@ -83,7 +86,29 @@ private[r] class RBackendHandler(server: RBackend
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15471#discussion_r85229665
--- Diff: core/src/main/scala/org/apache/spark/api/r/RBackend.scala ---
@@ -110,6 +115,11 @@ private[spark] object RBackend extends Logging {
val
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
Ping.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
Thanks @shivaram. I ran a real workload consisting of long-running parallel
simulations that took about 3.5 hours. I also tested it by calling
`Sys.sleep()` inside workers with dapply
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
Thanks @felixcheung, I addressed your comments.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wangmiao1981 could that behavior with `list` relate to the issue in this
ticket? https://issues.apache.org/jira/browse/SPARK-17781
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@felixcheung you mean a test for parallelizing NAs and getting them back? The
patch includes that test.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15421#discussion_r84117167
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -125,15 +125,34 @@ private[spark] object SerDe {
}
def readDate
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@shivaram I removed catching `NegativeArraySizeException` from this PR.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram it was indeed my fault. I did not run local tests after I added
the heartbeat. I am now using +1 for heartbeat.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
Thanks @wangmiao1981. I think it is best to file a separate JIRA for this
issue. Thanks a lot for catching it.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wangmiao1981 would you please also test the master branch?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram I think they are unrelated. Can you trigger another test?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@shivaram and @felixcheung do I need to do more on this?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram this worked on my stress tests. The question is how to unit test
this?
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/12904#discussion_r83341979
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala
---
@@ -46,6 +46,7 @@ private[sql] abstract class CsvReader
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/12904#discussion_r83342029
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -555,4 +558,37 @@ class CSVSuite extends QueryTest
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@shivaram thanks for the clarification.
I realized we were not setting a socket timeout on the Netty socket, so I
added that as well.
I also introduced the heartbeat mechanism and tested it locally
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wandjenkins thanks! It is interesting that it worked with R 3.3.1!
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15471
@felixcheung that is a good suggestion. I will try to use a single constant.
I changed the label to WIP because something is still timing out the
connection in my tests. Maybe very long timeouts
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wandjenkins yes, I think there must be some non-standard issue on your
system. I tested on Mac and Linux with different versions of R and the test
case passed.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@felixcheung and @shivaram the AppVeyor test that @HyukjinKwon kicked off
passed, and Jenkins passed too. Do you think this is ready?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wangmiao1981 you are talking about Windows, right?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
I just tried the patch on R version 3.3.1 (2016-06-21) -- "Bug in Your
Hair" on Linux and it passed the tests.
@HyukjinKwon how can I kick off another AppVeyor test?
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/15471
[SPARK-17919] Make timeout to RBackend configurable in SparkR
## What changes were proposed in this pull request?
This patch makes RBackend connection timeout configurable by user
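For reference, a hedged sketch of how the resulting settings are used from R; the names follow what SPARK-17919 introduced (`spark.r.backendConnectionTimeout`, `spark.r.heartBeatInterval`), though this snapshot of the PR may differ:
```
# Both values are in seconds; the heartbeat must fire more often than the
# connection timeout so long-running jobs keep the socket alive.
sparkR.session(sparkConfig = list(
  spark.r.backendConnectionTimeout = "6000",
  spark.r.heartBeatInterval = "100"
))
```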
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@felixcheung and @wangmiao1981 thanks! This is a good point. I will try
testing it on different versions of R.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15421#discussion_r82940884
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -125,15 +125,34 @@ private[spark] object SerDe {
}
def readDate
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15446
@shivaram yes I just noticed it during my debugging and fixed it.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15375
Seems like a flaky test in `DirectKafkaStreamSuite`:
```
DirectKafkaStreamSuite:
- pattern based subscription *** FAILED *** (1 minute, 41 seconds)
```
If jenkins listens
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wangmiao1981 I don't get the exception that you reported on Mac. Also note
that the unit test is passing on Linux. I am not sure why returning null is an
issue.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15375
@felixcheung does it look OK now?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
My guess is that on Windows, R serialization behaves differently and
serializes `NA` as `null`. Unfortunately, I don't have a Windows machine to
verify. Would you please test that?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/11336
I think some of the old tests still rely on `sparkRSQL.init()`. I believe
the warning is OK.
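For context, the deprecated entry points in question next to their replacement (both paths are real SparkR APIs; the warning comes from the old one):
```
# Old 1.x-style initialization, still exercised by some tests; emits a
# deprecation warning as of Spark 2.0:
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Preferred 2.x entry point:
sparkR.session()
```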
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/11336#discussion_r82888215
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2252,6 +2252,31 @@ test_that("Method str()", {
expect_equal(capture.output(utils:
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/11336#discussion_r82887849
--- Diff: R/pkg/R/column.R ---
@@ -32,35 +34,65 @@ setOldClass("jobj")
#' @export
#' @note Column since 1.4.0
setCla
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/11336#discussion_r82888061
--- Diff: R/pkg/R/column.R ---
@@ -32,35 +34,65 @@ setOldClass("jobj")
#' @export
#' @note Column since 1.4.0
setCla
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/11336#discussion_r82887445
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1035,10 +1035,17 @@ setMethod("dim",
c(count(x), ncol(x))
})
-#' Co
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@wangmiao1981 thanks for testing on Windows. I added a check for this.
Would you please try again and let me know? Unfortunately, I don't have access
to a Windows box for testing.
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15421#discussion_r82864117
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -125,15 +125,24 @@ private[spark] object SerDe {
}
def readDate
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15421#discussion_r82814431
--- Diff: R/pkg/DESCRIPTION ---
@@ -11,7 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role =
c("aut", "cr
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15421
@shivaram can I nominate this patch for the 2.0 branch?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/11336
@olarayej any update on this? If you are busy, I can start another PR from
yours.
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/15421
[SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date
columns
## What changes were proposed in this pull request?
NA date values are serialized as "NA" and NA t
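A minimal repro sketch of the bug this PR fixes, using only standard SparkR calls:
```
# Before this fix, NA in a Date column broke serialization when shipping the
# local data.frame to the JVM.
local_df <- data.frame(
  id = 1:3,
  d = as.Date(c("2016-10-01", NA, "2016-10-03"))
)
sdf <- createDataFrame(local_df)
collect(sdf)  # the NA should round-trip back as NA
```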
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15375#discussion_r82462875
--- Diff: R/pkg/R/context.R ---
@@ -126,13 +126,13 @@ parallelize <- function(sc, coll, numSlices = 1) {
if (numSlices > lengt
Github user falaki commented on a diff in the pull request:
https://github.com/apache/spark/pull/15375#discussion_r82446808
--- Diff: R/pkg/R/context.R ---
@@ -123,19 +126,48 @@ parallelize <- function(sc, coll, numSlices = 1) {
if (numSlices > lengt
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15375
@felixcheung added cleanup for the temp file and a unit test. PTAL.
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/11336
@olarayej are you interested in rebasing this PR and limiting its user
impact to just `head` for now? We can continue the discussion on support for a
distributed vector (a.k.a. `Column`) in JIRA
GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/15375
[SPARK-17790] Support for parallelizing R data.frame larger than 2GB
## What changes were proposed in this pull request?
If the R data structure that is being parallelized is larger than
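The description is truncated above; a rough, illustrative-only sketch of the spill-to-disk idea (the helper name and framing are hypothetical, not the actual patch):
```
# Hypothetical helper: when the serialized slices exceed what a single socket
# write can carry (~2GB), write them to a temp file for the JVM to read back.
write_slices_to_temp <- function(slices) {
  file_name <- tempfile(pattern = "sparkr-parallelize-")
  con <- file(file_name, open = "wb")
  on.exit(close(con))
  for (s in slices) {
    writeBin(length(s), con, endian = "big")  # length-prefix each raw slice
    writeBin(s, con)
  }
  file_name  # the JVM side would read the slices back from this path
}
```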
Github user falaki closed the pull request at:
https://github.com/apache/spark/pull/15328
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15328
Closing this, as it will not address the issue. I will work on it under
SPARK-17790
GitHub user falaki reopened a pull request:
https://github.com/apache/spark/pull/15328
[SPARKR][SPARK-17762] invokeJava fails when serialized argument list is
larger than INT_MAX bytes
## What changes were proposed in this pull request?
* Updates implementation of `writeRaw
Github user falaki closed the pull request at:
https://github.com/apache/spark/pull/15328
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15328
A user ran into this limit when trying to parallelize a fairly large R
data.frame. The user has extensive logic implemented in R on that data.frame
and migrating it over to SparkDataFrame API
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15328
@shivaram thanks for pointing it out. INT_MAX is indeed a limit in R as well.
I will update `RBackendHandler` to use lists. Sounds good?
Github user falaki commented on the issue:
https://github.com/apache/spark/pull/15328
@shivaram added unit tests.
On the Java array limitation: we deserialize all of the arguments as one
`Array[Object]`. So if even one of the arguments is larger than `INT_MAX` we
will fail on the R
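To make the ceiling concrete, a small plain-R illustration (nothing Spark-specific; the thread above notes INT_MAX binds on both the R and JVM sides):
```
# The 2GB ceiling discussed above: SparkR ships the invokeJava argument list
# as one serialized raw payload, bounded by INT_MAX bytes on both sides.
.Machine$integer.max                           # 2147483647
payload <- serialize(list(1:10), connection = NULL)
length(payload)                                # serialized size in bytes
```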