[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung resolved SPARK-17790.
----------------------------------
    Resolution: Fixed
      Assignee: Hossein Falaki
 Fix Version/s: 2.1.0
                2.0.2

> Support for parallelizing R data.frame larger than 2GB
> ------------------------------------------------------
>
>                 Key: SPARK-17790
>                 URL: https://issues.apache.org/jira/browse/SPARK-17790
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>            Assignee: Hossein Falaki
>             Fix For: 2.0.2, 2.1.0
>
>
> This issue is a more specific version of SPARK-17762.
> Supporting arguments larger than 2GB is more general and arguably harder to
> do, because the limit exists both in R and in the JVM (we receive data as a
> ByteArray). However, to support parallelizing R data.frames that are larger
> than 2GB, we can do what PySpark does.
> PySpark uses files to transfer bulk data between Python and the JVM. This
> has worked well for the large community of Spark Python users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
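The file-based transfer approach the description refers to can be sketched as follows. This is a minimal illustration of the general idea (write serialized partitions to a temporary file with length-prefixed framing, then stream them back), not SparkR's or PySpark's actual wire format; the 4-byte length prefix and the helper names are assumptions for the sketch. Because data moves through a file in bounded chunks, no single in-memory byte array needs to hold the whole dataset, which is how the 2GB ByteArray limit is avoided.

```python
# Sketch (not Spark code) of file-based bulk transfer: the sender writes
# serialized chunks to a temp file instead of building one giant byte array;
# the receiver streams the chunks back one at a time. The 4-byte big-endian
# length prefix is an illustrative framing choice, not SparkR's format.
import os
import struct
import tempfile


def write_chunks(path, chunks):
    """Write each chunk preceded by a 4-byte big-endian length prefix."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(struct.pack(">I", len(chunk)))
            f.write(chunk)


def read_chunks(path):
    """Stream chunks back without loading the whole file into memory."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (length,) = struct.unpack(">I", header)
            yield f.read(length)


# Round-trip three mock "partitions" through a temporary file.
partitions = [b"a" * 10, b"b" * 20, b"c" * 30]
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_chunks(path, partitions)
    result = list(read_chunks(path))
finally:
    os.remove(path)
```

Since each chunk is read independently, the receiver's peak memory is bounded by the largest chunk rather than the total payload size.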