Re: Slow collecting of large Spark Data Frames into R

2016-06-11 Thread Sun Rui
Hi Jonathan, thanks for reporting. This is a known issue that the community would like to address later. Please refer to https://issues.apache.org/jira/browse/SPARK-14037. It would help if you could profile your use case using the method discussed in the JIRA issue and paste the metrics

Slow collecting of large Spark Data Frames into R

2016-06-10 Thread Jonathan Mortensen
Hey Everyone! I've been converting between Parquet <-> Spark Data Frames <-> R Data Frames for larger data sets. I have found the conversion quite slow on the Spark <-> R side and am looking for some insight into how to speed it up (or into what I have failed to do properly)! In R, "spar