On 31-10-2011 17:38, Erik Søe Sørensen wrote:
[snip]
> Parallel Reads.
> ---------------
> Within a vnode, bitcask read operations happen in serial.
> Is there any reason for reads not happening in parallel?
[snip]
I've made a small test of this - just to check that my intuition isn't
off track.
In the test, I
- create a 2GB file
- clear the disk caches
- read 1000 randomly placed 1KB blocks from the file (from Erlang).
The last two steps are repeated for different read strategies.
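
For anyone who wants to try the procedure above, here is a minimal sketch.
It is in Python rather than Erlang and is not the code behind the timings
in this mail: os.pread stands in for file:pread/3, and the helper names
(make_test_file, random_offsets, timed_reads) are made up for illustration.

```python
import os
import random
import time

BLOCK_SIZE = 1024  # 1KB blocks, as in the test described above


def make_test_file(path, size_bytes):
    """Create a file of size_bytes filled with random data."""
    chunk = 1024 * 1024
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def random_offsets(file_size, count, seed=None):
    """Pick `count` random block start positions that fit inside the file."""
    rng = random.Random(seed)
    return [rng.randrange(0, file_size - BLOCK_SIZE + 1) for _ in range(count)]


def timed_reads(path, offsets):
    """Read one block per offset, in the given order, from a single descriptor.

    Returns (seconds per block, list of blocks read).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        blocks = [os.pread(fd, BLOCK_SIZE, off) for off in offsets]
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / len(offsets), blocks
```

The first two strategies then correspond to timed_reads(path, offsets) and
timed_reads(path, sorted(offsets)). The sketch does not clear the disk
caches; on Linux that is typically done between runs, as root, by writing
to /proc/sys/vm/drop_caches. Without that step the numbers mostly measure
the page cache.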
On my setup (Ubuntu laptop), I get the following read timings (per block):
- Calling file:pread/3 in one process: 8.2ms
- Same, but sort the reads by position: 5.7ms
- Calling file:pread/3 from separate processes (limited to 20
simultaneous outstanding reads): 5.8ms
- Calling file:pread/3 from separate processes (limited to 50
simultaneous outstanding reads): 5.4ms
(NB: This only works if a separate file descriptor is used for each
read, otherwise no improvement is observed.)
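
The "separate processes, separate file descriptors" strategies can be
sketched roughly as follows. Again this is a Python analogue and only an
assumption about how one might structure it: threads stand in for Erlang
processes, the pool size plays the role of the limit on outstanding reads,
and read_block_own_fd / parallel_reads are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024  # 1KB blocks, as in the test described above


def read_block_own_fd(path, offset):
    """Open a fresh file descriptor, read one block at `offset`, close it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, BLOCK_SIZE, offset)
    finally:
        os.close(fd)


def parallel_reads(path, offsets, max_outstanding=20):
    """Read all blocks with at most `max_outstanding` reads in flight at once."""
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        # map() returns the results in the same order as `offsets`.
        return list(pool.map(lambda off: read_block_own_fd(path, off), offsets))
```

With many reads outstanding at once, the kernel's I/O scheduler gets the
chance to reorder them by position, which is presumably why these numbers
end up close to the sorted single-process case.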
This means that read ordering really does matter - and that the
potential throughput gains may be as much as 50% (i.e. significant).
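
To spell out the arithmetic behind that figure, using only the per-block
timings listed above (the variable names are just for illustration):

```python
baseline_ms = 8.2  # file:pread/3 in one process, unsorted
best_ms = 5.4      # separate processes, 50 outstanding reads

# Fraction of time saved per block.
time_saved = (baseline_ms - best_ms) / baseline_ms

# Relative increase in blocks read per second.
throughput_gain = baseline_ms / best_ms - 1

print(f"time saved per block: {time_saved:.0%}")
print(f"throughput gain:      {throughput_gain:.0%}")
```

So the best case spends roughly a third less time per block, which is the
same thing as reading roughly half again as many blocks per second.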
As to whether this also holds in a Riak context, I've tried starting
multiple simultaneous instances of these strategies, each working on
different files (simulating multiple vnodes working from the same disk),
and observed similar improvements (30-45% for three instances).
(For completeness, I must add that this may be highly dependent on the
I/O system. The above numbers are from the 'anticipatory' I/O scheduler
for Linux; switching to the 'CFQ' scheduler reduces the benefits a
lot - and also makes the absolute numbers worse.)
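
To check which scheduler a disk is using before comparing numbers, Linux
exposes it in sysfs as a line such as "noop anticipatory deadline [cfq]",
with the active one in brackets. A small sketch for reading it; the
device name "sda" is only an assumption, and parse_scheduler /
current_scheduler are hypothetical helper names:

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line into (active, all_available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def current_scheduler(device="sda"):
    """Read the scheduler setting for a block device (assumed device name)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

The set of schedulers on offer depends on the kernel version, so the names
returned on another machine may differ from the two compared here.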
Regards,
Erik Søe Sørensen
Trifork A/S
[Code is available on request.]