I have a persistent issue I am trying to diagnose. In our use of Riak we have multiple data creators writing into a 7 node cluster. The value size is fairly large, at around 2MB. The behavior I am seeing is that if I delete all data out of bitcask and then test performance, I get fast writes. As I keep doing the same work of writing to the cluster, the Riak write times start tailing off and get really bad.
Initial write times seen by my application: 0.5 seconds for 100MB worth of values (~200MB/s)
Subsequent write times: 11 seconds for 100MB worth of values (~9MB/s)

This slowdown can happen over roughly 20-40 minutes of writing or about 200GB worth of key/value pairs written. I can reset the cluster to get fast performance again by stopping Riak and deleting the bitcask directories, then starting Riak again. This step is not feasible for production, but during testing at least the write speed goes up by 20x.

Watching iostat I see that every few seconds the disk I/O jumps to ~11%. It doesn't seem that highly loaded from my cursory look.

Watching top I see that beam.smp runs at around 100 for CPU% or less when heavily loaded. I am not sure how to tell what it is doing though :-)

Thanks for any suggestions!!

-Matt

================
System Description

avg value size = 2MB
Riak version = 1.4.1
n_val = 2
client threads total = 105
backend = bitcask
ring_creation_size = 128
node count = 7
node OS = RHEL 6.2
server RAM = 128GB
RAID = RAID0 across 8 SAS drives
FS = ext4
FS options = /dev/mapper/vg0-lv0 / ext4 rw,noatime,barrier=0,stripe=512,data=ordered 0 0
bitcask size on one server = 133GB
AAE = off
interface = protobuf
client library = riak java client
file-max = 65536
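For anyone checking the numbers, the throughput figures quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and the inputs are the 100MB batch size and the 0.5 second and 11 second timings from this post.

```python
# Arithmetic check of the write rates quoted in the post.
# throughput_mb_per_s is an illustrative helper, not part of any Riak tooling.
def throughput_mb_per_s(megabytes, seconds):
    """Return the average throughput in MB/s for one batch of writes."""
    return megabytes / seconds

fast = throughput_mb_per_s(100, 0.5)  # right after emptying the bitcask directories
slow = throughput_mb_per_s(100, 11)   # after sustained writing
slowdown = fast / slow                # how many times slower the degraded case is

print(f"fast: {fast:.0f} MB/s")
print(f"slow: {slow:.1f} MB/s")
print(f"slowdown: {slowdown:.0f}x")
```

This gives roughly 200MB/s and 9MB/s, so the slowdown is a little over the 20x mentioned above.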
