tianshuo created SPARK-4452:
---
Summary: Enhance Sort-based Shuffle to avoid spilling small files
Key: SPARK-4452
URL: https://issues.apache.org/jira/browse/SPARK-4452
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tianshuo updated SPARK-4452:
Description:
When an Aggregator is used with ExternalSorter in a task, Spark will create
many small files
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214962#comment-14214962
]
tianshuo commented on SPARK-4452:
-
Originally, we found this problem by seeing Too Many
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215222#comment-14215222
]
tianshuo commented on SPARK-4452:
-
Hi, [~sandyr]:
Your concern about data structures
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215233#comment-14215233
]
tianshuo commented on SPARK-4452:
-
Currently, the two instances of Spillable,
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215222#comment-14215222
]
tianshuo edited comment on SPARK-4452 at 11/17/14 10:00 PM: