node_get_fsm_time_100

Demetri Mouratis Wed, 16 May 2012 10:52:54 -0700

Greetings,

We have a three node Riak cluster set up in a pre-production environment with Level DB configured on the backend. Systems are beefy dual 6 core, 96GB RAM, running all SSDs. Preliminary testing showed some issues with long latencies (~10-30 seconds and increasing) shown in node_get_fsm_time_100. We raised our initial concerns at the Riak workgroup in San Francisco last week.


After the workgroup, we made the following changes to our configuration:

1.  Tuned /etc/security/limits.conf to add:

riak            soft    nofile          2048
riak            hard    nofile          10240

2. Added noatime to riak filesystem mount (running on 6-device RAID6/RAID 10 Intel 710 200 GB SSD)


/dev/mapper/vg_raid10-lv_riak on /var/lib/riak type ext4 (rw,noatime)

3.  Edited eleveldb config to add write buffer and cache size


 %% eLevelDB Config
 {eleveldb, [
             {data_root, "/var/lib/riak/leveldb"},
             {write_buffer_size, 16777216},
             {cache_size, 1073741824}
            ]},
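
For readers checking the numbers, here is a rough sanity check of what those byte values amount to. It is only a sketch: it assumes the eleveldb settings apply to each vnode separately and that the ring uses the default size of 64 partitions, neither of which is stated in this message. RING_SIZE and the helper names are illustrative.

```python
# Sketch: convert the eleveldb byte settings into readable units and
# estimate a per-node total, ASSUMING the settings apply per vnode.

WRITE_BUFFER_SIZE = 16777216   # bytes, from the config in this message
CACHE_SIZE = 1073741824        # bytes, from the config in this message

RING_SIZE = 64                 # assumption: default ring size, not stated here
NODES = 3                      # three node cluster, per the message


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


def per_node_estimate(per_vnode_bytes, ring_size=RING_SIZE, nodes=NODES):
    """Upper-bound bytes per node if every vnode gets its own allocation."""
    vnodes_per_node = -(-ring_size // nodes)  # ceiling division
    return per_vnode_bytes * vnodes_per_node


if __name__ == "__main__":
    print("write_buffer_size: %.0f MiB" % to_mib(WRITE_BUFFER_SIZE))
    print("cache_size: %.0f MiB" % to_mib(CACHE_SIZE))
    gib = per_node_estimate(CACHE_SIZE) / 1024 ** 3
    print("cache per node (estimate): %.1f GiB" % gib)
```

If the per-vnode assumption holds, the cache figure is worth comparing against the 96GB of RAM on each system.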

At first blush, this tuning seemed to correct the problem. Bash bench testing failed to uncover any latency. The get_fsm_time returned to near zero. However, over the weekend and into this week the peak delays started to creep back up linearly. See graphs from Ganglia:


http://www.flickr.com/photos/dmourati/sets/72157629758658870/

Average get times remain constant.  Put times do not show similar delay.
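
The gap between the peak and the average can also be checked without Ganglia. The following is a minimal sketch, assuming the node serves its statistics as JSON on the default HTTP port (8098) at /stats and that the get FSM timings are reported in microseconds. STATS_URL and the helper names are illustrative and not part of the setup described here.

```python
import json
from urllib.request import urlopen

# Assumption: default Riak HTTP listener; adjust host and port as needed.
STATS_URL = "http://127.0.0.1:8098/stats"

GET_FSM_KEYS = (
    "node_get_fsm_time_mean",
    "node_get_fsm_time_median",
    "node_get_fsm_time_95",
    "node_get_fsm_time_99",
    "node_get_fsm_time_100",
)


def fetch_stats(url=STATS_URL, timeout=5):
    """Fetch the stats document from a node and decode it as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_fsm_seconds(stats):
    """Return the get FSM timings in seconds (values assumed microseconds)."""
    return {k: stats[k] / 1e6 for k in GET_FSM_KEYS if k in stats}


def peak_to_mean_ratio(stats):
    """Ratio of the slowest get to the mean get, or None if unavailable."""
    mean = stats.get("node_get_fsm_time_mean")
    peak = stats.get("node_get_fsm_time_100")
    if not mean or peak is None:
        return None
    return peak / mean


if __name__ == "__main__":
    stats = fetch_stats()
    for name, seconds in get_fsm_seconds(stats).items():
        print("%s: %.3f s" % (name, seconds))
    print("peak/mean ratio:", peak_to_mean_ratio(stats))
```

A ratio that keeps growing while the mean stays flat would match the pattern described above: a few very slow gets rather than a general slowdown.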

In talking with Basho folks, we learned the behavior is likely caused by "LevelDB Compaction."


http://leveldb.googlecode.com/svn/trunk/doc/impl.html

Question:

What can we do to reduce/eliminate the latency shown in node_get_fsm_time_100?


Thanks,

Demetri

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
