Github user suyanNone commented on the pull request:
https://github.com/apache/spark/pull/6586#issuecomment-108707568
@srowen
Today I spent some time running a performance test.
With a single cycle, TestOutputStream has a slight edge, possibly because
creating and destroying a direct buffer is expensive.
cycle: 1, data: 10MB
TestOutputStream: 12
TestChannel: 14
cycle: 1, data: 50MB
TestOutputStream: 46
TestChannel: 54
cycle: 1, data: 100MB
TestOutputStream: 110
TestChannel: 112
cycle: 1, data: 500MB
TestOutputStream: 620
TestChannel: 600
When the cycle count is increased to 10, the FileOutputStream time grows in
direct proportion to the number of cycles. The channel, thanks to the direct
buffer pool, takes only slightly longer than it did with 1 cycle.
cycle: 10, data: 10MB
TestOutputStream: 100
TestChannel: 16
cycle: 10, data: 50MB
TestOutputStream: 474
TestChannel: 63
cycle: 10, data: 100MB
TestOutputStream: 1118
TestChannel: 138
cycle: 10, data: 500MB
TestOutputStream: 6332
TestChannel: 690
The test also shows that the time to create a direct buffer is directly
proportional to the data size, so I think slicing large data into small
chunks would be good for performance and would reduce the size of the
direct buffer pool.
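
A minimal sketch of the slicing idea, not the code in this pull request. It
assumes the data sits in a heap ByteBuffer and is written through a
FileChannel; as far as I know, OpenJDK copies a heap buffer into a temporary
direct buffer sized to the bytes remaining, so capping the limit per write
should keep that buffer small. The class name, the method name, and the
64 KB chunk size are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkedChannelWrite {
    // Hypothetical chunk size; the comment above does not propose a value.
    static final int CHUNK_SIZE = 64 * 1024;

    /**
     * Writes everything remaining in {@code data} to {@code channel},
     * exposing at most {@code chunkSize} bytes to each write call.
     * Returns the total number of bytes written.
     */
    static long writeInChunks(FileChannel channel, ByteBuffer data, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long written = 0;
        final int originalLimit = data.limit();
        try {
            while (data.position() < originalLimit) {
                // Narrow the visible window to the next chunk.
                int step = Math.min(chunkSize, originalLimit - data.position());
                data.limit(data.position() + step);
                // A single write may be partial, so drain the window.
                while (data.hasRemaining()) {
                    written += channel.write(data);
                }
            }
        } finally {
            // Restore the caller's limit even if a write fails.
            data.limit(originalLimit);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked-write", ".bin");
        ByteBuffer data = ByteBuffer.wrap(new byte[10 * 1024 * 1024]); // 10MB
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long written = writeInChunks(channel, data, CHUNK_SIZE);
            System.out.println("bytes written: " + written);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The chunk size is a trade-off: smaller chunks keep the pooled direct buffer
small but add more write calls, so the value would need to be measured
rather than taken from this sketch.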