Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/4878#issuecomment-77264094
Sorry to chime in late, but have you done performance tests with this to
see if it makes a difference? There are two issues I see here:
(1) This isn't the place where I observed issues. I only observed problems
for the read that's done in FileSegmentManagedBuffer:
https://github.com/apache/spark/blob/master/network/common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java#L100
which is the one I had thought should use a buffered input stream.
(2) After much more experimentation, I found that the lack of buffering was
only an issue when shuffle compression is turned off (I had turned it off in
the earlier experiments I was running to do some network benchmarking). When
compression is on, the compression libraries read data in larger chunks, so
they essentially do their own buffering. Given that I don't know of any use
cases where people turn compression off (this is recommended against in the
Spark conf), I wonder if the added complexity from this is worthwhile?
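To make the buffering point concrete, here is a minimal, self-contained sketch (not Spark code; the class and file names are made up for illustration) showing why an unbuffered single-byte read loop hits the underlying file stream once per byte, while wrapping it in a `BufferedInputStream` reduces that to one call per buffer fill. A compression stream reading in large chunks has a similar effect, which is why the lack of buffering only shows up with compression off.

```java
import java.io.*;

public class BufferedReadSketch {
    // Wraps a stream and counts how many read calls actually reach it,
    // so we can compare buffered vs. unbuffered access patterns.
    static class CountingInputStream extends FilterInputStream {
        int reads = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            reads++; return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            reads++; return super.read(b, off, len);
        }
    }

    // Drains the file one byte at a time and returns the number of calls
    // that reached the underlying FileInputStream.
    static int drain(File f, boolean buffered) throws IOException {
        CountingInputStream counted = new CountingInputStream(new FileInputStream(f));
        InputStream in = buffered ? new BufferedInputStream(counted) : counted;
        try {
            while (in.read() != -1) { /* consume */ }
        } finally {
            in.close();
        }
        return counted.reads;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sketch", ".bin");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(new byte[64 * 1024]); // 64 KB of zeros
        }
        // Unbuffered: ~65k calls (one per byte). Buffered: a handful,
        // since BufferedInputStream fetches 8 KB chunks by default.
        System.out.println("unbuffered reads: " + drain(f, false));
        System.out.println("buffered reads:   " + drain(f, true));
    }
}
```

The same reasoning applies to a decompression stream that pulls, say, 32 KB blocks from the file: the per-byte reads issued by the consumer are served from the block already in memory.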