[jira] [Comment Edited] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417301#comment-15417301 ] Roi Reshef edited comment on SPARK-17020 at 8/11/16 2:09 PM: - Nevertheless,

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417301#comment-15417301 ] Roi Reshef commented on SPARK-17020: Nevertheless, any attempt to repartition the resulting RDD also

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417288#comment-15417288 ] Roi Reshef commented on SPARK-17020: The problem occurs only when calling **.rdd** on an

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417254#comment-15417254 ] Roi Reshef commented on SPARK-17020: Also note that I have just called: *data.cache().count()* val

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417250#comment-15417250 ] Roi Reshef commented on SPARK-17020: val ab = SomeReader.read(...) //some reader function that uses

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417218#comment-15417218 ] Roi Reshef commented on SPARK-17020: [~srowen] Should there be any effect on this if I cached and

[jira] [Comment Edited] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417204#comment-15417204 ] Roi Reshef edited comment on SPARK-17020 at 8/11/16 1:13 PM: - [~srowen] I

[jira] [Commented] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417204#comment-15417204 ] Roi Reshef commented on SPARK-17020: [~srowen] I have 2 DataFrames that are generated from spark-csv

[jira] [Updated] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-17020: --- Affects Version/s: 2.0.0 > Materialization of RDD via DataFrame.rdd forces a poor re-distribution of

[jira] [Updated] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-17020: --- Attachment: rdd_cache.PNG dataframe_cache.PNG > Materialization of RDD via

[jira] [Created] (SPARK-17020) Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data

2016-08-11 Thread Roi Reshef (JIRA)
Roi Reshef created SPARK-17020: -- Summary: Materialization of RDD via DataFrame.rdd forces a poor re-distribution of data Key: SPARK-17020 URL: https://issues.apache.org/jira/browse/SPARK-17020 Project:

[jira] [Comment Edited] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073832#comment-15073832 ] Roi Reshef edited comment on SPARK-10789 at 12/29/15 11:56 AM: --- Thanks

[jira] [Commented] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073832#comment-15073832 ] Roi Reshef commented on SPARK-10789: Thanks [~jonathak]. That requires rebuilding spark and

[jira] [Comment Edited] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-29 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073832#comment-15073832 ] Roi Reshef edited comment on SPARK-10789 at 12/29/15 11:55 AM: --- Thanks

[jira] [Commented] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark assembly

2015-12-21 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066304#comment-15066304 ] Roi Reshef commented on SPARK-10789: Any resolution on that? Can you elaborate more on how were you

[jira] [Issue Comment Deleted] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roi Reshef updated SPARK-5081: -- Comment: was deleted (was: Hi Guys, Was this issue already solved by any chance? I'm using Spark 1.3.1

[jira] [Commented] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585641#comment-14585641 ] Roi Reshef commented on SPARK-5081: --- Hi Guys, Was this issue already solved by any

[jira] [Comment Edited] (SPARK-5081) Shuffle write increases

2015-06-15 Thread Roi Reshef (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585641#comment-14585641 ] Roi Reshef edited comment on SPARK-5081 at 6/15/15 8:41 AM: Hi