[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-5920.
------------------------------------
    Resolution: Won't Fix

Per the discussion on this PR, I am resolving this as Won't Fix.
https://github.com/apache/spark/pull/4878

[~kayousterhout], please feel free to re-open if I misinterpreted.

> Use a BufferedInputStream to read local shuffle data
> ----------------------------------------------------
>
>                 Key: SPARK-5920
>                 URL: https://issues.apache.org/jira/browse/SPARK-5920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.2.1, 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Blocker
>
> When reading local shuffle data, Spark doesn't currently buffer the local
> reads into larger chunks, which can lead to terrible disk performance when
> many tasks concurrently read local data from the same disk. We should use a
> BufferedInputStream to mitigate this problem; we can lazily create the
> input stream to avoid allocating many in-memory buffers at the same time
> for tasks that read shuffle data from a large number of local blocks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
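The lazy-creation idea described in the issue can be sketched as follows. This is a minimal illustration, not Spark's actual shuffle-reader code; the class name `LazyBufferedInputStream` and the buffer size are hypothetical. The point is that the `BufferedInputStream` (and its internal byte buffer) is only allocated on first read, so a task opening streams for many local blocks does not hold all the buffers at once.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical sketch: defers creating the BufferedInputStream (and its
// internal buffer) until data is actually read from this stream.
public class LazyBufferedInputStream extends InputStream {
    private final Supplier<InputStream> source;
    private final int bufferSize;
    private BufferedInputStream delegate;  // created lazily

    public LazyBufferedInputStream(Supplier<InputStream> source, int bufferSize) {
        this.source = source;
        this.bufferSize = bufferSize;
    }

    private InputStream stream() {
        if (delegate == null) {
            // The buffer is allocated here, on first use, rather than at
            // construction time for every block up front.
            delegate = new BufferedInputStream(source.get(), bufferSize);
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return stream().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return stream().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

Reads through the buffered delegate are served from its internal buffer in large chunks, which is what mitigates the small-random-read pattern the issue describes.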