[jira] [Created] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
tianshuo created SPARK-4452:
Summary: Enhance Sort-based Shuffle to avoid spilling small files
Key: SPARK-4452
URL: https://issues.apache.org/jira/browse/SPARK-4452
Project: Spark
Issue Type: Bug

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, Spark will create many small files

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, Spark will create many small files

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214962#comment-14214962 ] tianshuo commented on SPARK-4452: Originally, we found this problem by seeing Too Many

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, Spark will create many small files

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215222#comment-14215222 ] tianshuo commented on SPARK-4452: Hi, [~sandyr]: Your concern about data structures

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215233#comment-14215233 ] tianshuo commented on SPARK-4452: Currently, the two instances of Spillable,

[jira] [Comment Edited] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215222#comment-14215222 ] tianshuo edited comment on SPARK-4452 at 11/17/14 10:00 PM: