[jira] [Updated] (SPARK-5920) Use a BufferedInputStream to read local shuffle data
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-5920:
-----------------------------------
    Priority: Critical  (was: Major)

Use a BufferedInputStream to read local shuffle data
----------------------------------------------------

                Key: SPARK-5920
                URL: https://issues.apache.org/jira/browse/SPARK-5920
            Project: Spark
         Issue Type: Improvement
         Components: Shuffle
   Affects Versions: 1.3.0, 1.2.1
           Reporter: Kay Ousterhout
           Assignee: Kay Ousterhout
           Priority: Critical

When reading local shuffle data, Spark doesn't currently buffer the local reads into larger chunks, which can lead to terrible disk performance if many tasks are concurrently reading local data from the same disk. We should use a BufferedInputStream to mitigate this problem; we can lazily create the input stream to avoid allocating a bunch of in-memory buffers at the same time for tasks that read shuffle data from a large number of local blocks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-5920) Use a BufferedInputStream to read local shuffle data
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-5920:
-----------------------------------
    Priority: Blocker  (was: Critical)

Use a BufferedInputStream to read local shuffle data
----------------------------------------------------

                Key: SPARK-5920
                URL: https://issues.apache.org/jira/browse/SPARK-5920
            Project: Spark
         Issue Type: Improvement
         Components: Shuffle
   Affects Versions: 1.3.0, 1.2.1
           Reporter: Kay Ousterhout
           Assignee: Kay Ousterhout
           Priority: Blocker

When reading local shuffle data, Spark doesn't currently buffer the local reads into larger chunks, which can lead to terrible disk performance if many tasks are concurrently reading local data from the same disk. We should use a BufferedInputStream to mitigate this problem; we can lazily create the input stream to avoid allocating a bunch of in-memory buffers at the same time for tasks that read shuffle data from a large number of local blocks.
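
To make the proposal in the issue description concrete, here is a minimal Java sketch of a lazily created, buffered file stream. It is not Spark's implementation and is not taken from the eventual patch: the class names LazyBufferedFileStream and LazyShuffleReadDemo, the isOpened() helper, and the 64 KB buffer size are all assumptions made for illustration. The only library pieces relied on are the standard java.io.BufferedInputStream and java.io.FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Spark code): an InputStream over a local file
 * that opens the file and allocates its read buffer only on the first read.
 */
final class LazyBufferedFileStream extends InputStream {
    private final File file;
    private final int bufferSize;
    private InputStream delegate; // stays null until the first read
    private boolean closed;

    LazyBufferedFileStream(File file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    /** True while the underlying file is open and its buffer is allocated. */
    boolean isOpened() {
        return delegate != null;
    }

    private InputStream delegate() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (delegate == null) {
            // The buffer turns many small reads into fewer, larger disk reads.
            delegate = new BufferedInputStream(new FileInputStream(file), bufferSize);
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();
            delegate = null; // release the buffer as soon as the block is consumed
        }
    }
}

/** Hypothetical demo: many streams are created, but only one buffer is live at a time. */
final class LazyShuffleReadDemo {
    public static void main(String[] args) throws IOException {
        File block = File.createTempFile("shuffle-block", ".data");
        block.deleteOnExit();
        Files.write(block.toPath(), new byte[] {1, 2, 3, 4});

        // Creating the streams up front opens no files and allocates no buffers.
        List<LazyBufferedFileStream> streams = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            streams.add(new LazyBufferedFileStream(block, 64 * 1024));
        }

        // Consume the blocks one at a time, closing each before opening the next.
        long total = 0;
        for (LazyBufferedFileStream lazy : streams) {
            try (InputStream in = lazy) {
                int b;
                while ((b = in.read()) != -1) {
                    total += b;
                }
            }
        }
        System.out.println("sum of bytes read: " + total);
    }
}
```

The design point the sketch tries to capture is the one the description raises: wrapping every local block in a BufferedInputStream eagerly would allocate one buffer per block at once, whereas deferring construction until the first read keeps the number of live buffers proportional to the blocks actually being read.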