Re: Strange spike

Nam Nguyen Thu, 31 May 2012 13:22:29 -0700

Hi Seth,

Yes, I am using the default config.


Is it safe to change these values and restart Riak?
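
For context on what these two settings are meant to achieve, here is a minimal Python sketch of the idea Seth raises below: if every vnode uses the same write buffer size, all buffers fill and flush together, whereas sizes picked from a range between a minimum and a maximum spread the flushes out. This is a toy model, not Riak or eLevelDB code. The function names, the 30-60 range, the one-unit-per-step write rate, and the assumption that every vnode receives the same write load are all hypothetical choices for illustration.

```python
import random

def flush_steps(buffer_size, write_rate, steps):
    """Return the time steps at which one vnode's write buffer fills and is flushed."""
    filled, flushes = 0, []
    for t in range(1, steps + 1):
        filled += write_rate
        if filled >= buffer_size:
            flushes.append(t)
            filled = 0
    return flushes

def max_simultaneous_flushes(buffer_sizes, write_rate=1, steps=200):
    """Largest number of vnodes that flush during the same time step."""
    counts = {}
    for size in buffer_sizes:
        for t in flush_steps(size, write_rate, steps):
            counts[t] = counts.get(t, 0) + 1
    return max(counts.values(), default=0)

VNODES = 16

# Every vnode uses the same buffer size (4 units, standing in for 4MB).
fixed = [4] * VNODES

# Each vnode gets its own size from a hypothetical min/max range.
# sample() keeps the sizes distinct, as byte-granular sizes would be in practice.
rng = random.Random(42)
randomized = rng.sample(range(30, 61), VNODES)

print("fixed sizes, worst-case simultaneous flushes:", max_simultaneous_flushes(fixed))
print("randomized sizes, worst-case simultaneous flushes:", max_simultaneous_flushes(randomized))
```

With identical sizes, all 16 simulated vnodes flush in the same step every time. With distinct sizes the flushes only coincide occasionally, which is the staggering effect the randomized settings are intended to produce.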

Nam

On May 31, 2012, at 11:24 AM, Seth Benton wrote:

> Hey,
> 
> Apologies if this is the wrong place for this, but I just updated the 
> eLevelDB wiki page to mention randomization of the write buffer length (via 
> setting write_buffer_size_min and write_buffer_size_max).  Previously there 
> was no mention of these config parameters.  Perhaps people were just using 
> LevelDB's 4MB default buffer size, causing all the vnodes to compact at the 
> same time?  Or are there default write_buffer_size_min and 
> write_buffer_size_max values under the hood?
> 
> http://wiki.basho.com/LevelDB.html
> 
> P.S.  Mathew V is getting back to me shortly on changes to this page due to 
> changes in 1.2.
> 
> Seth
> (Tech Writer)
> 
> 
> On Thu, May 31, 2012 at 9:26 AM, Nam Nguyen <n...@tinyco.com> wrote:
> Hi Sean,
> 
> You are right. At first I thought it was localized to that one particular 
> node. Now others are also exhibiting the same symptom.
> 
> I am putting in another node.
> 
> Cheers,
> Nam
> 
> 
> On May 30, 2012, at 11:23 PM, Sean Cribbs wrote:
> 
>> Nam,
>> 
>> The LevelDB storage backend has a known issue where compaction can stall a 
>> heavily-loaded node for a long time (we've seen 60 seconds or more in 
>> production clusters). We're very sorry about this, but an improvement will 
>> be available in the next release. In the meantime, DO NOT make the node 
>> leave the cluster - this will only make things worse! It might be worth 
>> adding another node to the cluster, but I suggest you wait until the node 
>> finishes compaction.
>> 
>> On Wed, May 30, 2012 at 10:43 PM, Nam Nguyen <n...@tinyco.com> wrote:
>> Hi,
>> 
>> My 5-node cluster exhibits a strange spike on one particular node.
>> 
>> Overall, the mean get time is about 1ms. This node occasionally shoots up to 
>> 40ms.
>> 
>> During those times, %iowait stays the same as before the spike, and there 
>> are no errors. The console log shows many lines like the one below, which I 
>> don't think are relevant to the spike.
>> 
>> 2012-05-30 21:29:50.591 [info] <0.72.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.938.0> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}] [{timeout,185},{old_heap_block_size,0},{heap_block_size,2584},{mbuf_size,0},{stack_size,55},{old_heap_size,0},{heap_size,804}]
>> 
>> The cluster is set up uniformly. Ubuntu 64bit, m2.2xlarge instance. Riak 
>> 1.1.2 with LevelDB backend.
>> 
>> What would be the best course of action for me?
>> 
>> I plan to:
>> 
>> - riak-admin leave on that node
>> - set up new instance
>> - riak-admin reip the new instance
>> - riak-admin join it to the cluster
>> 
>> Cheers,
>> Nam
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> 
>> -- 
>> Sean Cribbs <s...@basho.com>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
> 
> 
>

