Hossein Falaki created SPARK-17790:
--------------------------------------

             Summary: Support for parallelizing/creating DataFrame on data larger than 2GB
                 Key: SPARK-17790
                 URL: https://issues.apache.org/jira/browse/SPARK-17790
             Project: Spark
          Issue Type: Story
          Components: SparkR
    Affects Versions: 2.0.1
            Reporter: Hossein Falaki

This issue is a more specific version of SPARK-17762. Supporting arguments larger than 2GB is more general and arguably harder, because the limit exists in both R and the JVM (where we receive data as a ByteArray). However, to support parallelizing R data.frames that are larger than 2GB, we can do what PySpark does: it uses files to transfer bulk data between Python and the JVM. This approach has worked well for the large community of Spark Python users.
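
As a rough illustration of the file-based idea, the sketch below writes rows to a temporary file as length-prefixed batches and reads them back one batch at a time, so no single buffer has to hold the whole dataset. This is not PySpark's or SparkR's actual code. The function names (write_batches, read_batches, roundtrip), the pickle serialization, and the 4-byte length prefix are all assumptions chosen for illustration.

```python
import os
import pickle
import struct
import tempfile


def write_batches(rows, path, batch_size=2):
    """Write rows to `path` as a sequence of length-prefixed pickled batches.

    Hypothetical helper: each batch is serialized on its own, so only one
    batch (not the whole dataset) is ever held as a single byte string.
    """
    with open(path, "wb") as f:
        for start in range(0, len(rows), batch_size):
            payload = pickle.dumps(rows[start:start + batch_size])
            f.write(struct.pack(">i", len(payload)))  # 4-byte big-endian length
            f.write(payload)


def read_batches(path):
    """Yield batches back from `path`, one length-prefixed frame at a time."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:  # end of file
                break
            (length,) = struct.unpack(">i", header)
            yield pickle.loads(f.read(length))


def roundtrip(rows, batch_size=2):
    """Send rows through a temporary file and return them as a flat list."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_batches(rows, path, batch_size)
        return [row for batch in read_batches(path) for row in batch]
    finally:
        os.remove(path)
```

In this sketch the reader only needs enough memory for one batch, which is the property that lets a file-based transfer sidestep a per-array size limit. A real implementation would have the R side write the file and the JVM side read it.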