[jira] [Commented] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-03-03 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345495#comment-14345495 ]

Apache Spark commented on SPARK-5920:
-------------------------------------

User 'ravipesala' has created a pull request for this issue:
https://github.com/apache/spark/pull/4878

 Use a BufferedInputStream to read local shuffle data
 

 Key: SPARK-5920
 URL: https://issues.apache.org/jira/browse/SPARK-5920
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 1.3.0, 1.2.1
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout
Priority: Blocker

 When reading local shuffle data, Spark doesn't currently buffer the local 
 reads into larger chunks, which can lead to terrible disk performance if many 
 tasks are concurrently reading local data from the same disk.  We should use 
 a BufferedInputStream to mitigate this problem; we can lazily create the 
 input stream to avoid allocating a bunch of in-memory buffers at the same 
 time for tasks that read shuffle data from a large number of local blocks.
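The idea above can be sketched as follows. This is a hypothetical illustration, not Spark's actual implementation (the real change is in the pull request linked above): a stream wrapper that defers creating its `BufferedInputStream` until the first read, so a task holding handles to many local blocks only pays for a buffer per block actually being read. The `LazyBufferedBlockStream` name and the `Supplier`-based opener are assumptions made for this sketch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical sketch, not Spark's code: buffer local shuffle reads into
// larger chunks, but allocate the buffer lazily on first read.
public class LazyBufferedBlockStream extends InputStream {
    private final Supplier<InputStream> opener; // hypothetical: opens the underlying block file
    private final int bufferSize;
    private BufferedInputStream delegate;       // stays null until the first read

    public LazyBufferedBlockStream(Supplier<InputStream> opener, int bufferSize) {
        this.opener = opener;
        this.bufferSize = bufferSize;
    }

    private InputStream delegate() {
        if (delegate == null) {
            // The in-memory buffer is allocated here, on first use,
            // not at construction time.
            delegate = new BufferedInputStream(opener.get(), bufferSize);
        }
        return delegate;
    }

    @Override public int read() throws IOException { return delegate().read(); }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override public void close() throws IOException {
        if (delegate != null) delegate.close(); // never opened => nothing to close
    }

    public static void main(String[] args) throws IOException {
        // Demo: round-trip 64 KiB through the lazy buffered stream.
        byte[] data = new byte[64 * 1024];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Path tmp = Files.createTempFile("shuffle-block", ".data");
        Files.write(tmp, data);

        Supplier<InputStream> opener = () -> {
            try {
                return new FileInputStream(tmp.toFile());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };

        try (LazyBufferedBlockStream in = new LazyBufferedBlockStream(opener, 32 * 1024)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                out.write(chunk, 0, n);
            }
            if (!Arrays.equals(data, out.toByteArray())) {
                throw new IllegalStateException("round-trip mismatch");
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
        System.out.println("shuffle-bytes-match");
    }
}
```

Because a `BufferedInputStream` issues one large underlying read per buffer fill instead of many small ones, concurrent tasks reading local blocks from the same disk produce fewer, larger I/O requests; the lazy creation keeps a task that merely holds many block streams from allocating all of those buffers up front.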



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-02-21 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331986#comment-14331986 ]

Patrick Wendell commented on SPARK-5920:
----------------------------------------

We should definitely do this.
