Thanks for the quick response Matthew!

I gave that a shot, and if anything the performance was worse.  When I picked 
128 I ran through the calculations on this page:

http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/#Parameter-Planning

and thought that would work, but it sounds like I was quite a bit off from what 
you have below.

Looking at Riak Control, the memory was staying pretty low, and watching top 
the CPU was well in hand.  iostat showed very little of the CPU in iowait, 
although it was writing a lot.  I imagine, however, that this is missing a lot 
of the details.

Any other ideas?  I can't imagine one get/update/put cycle per second is the 
best I can do…

Thanks!

Paul Ingalls
Founder & CEO Fanzo
[email protected]
@paulingalls
http://www.linkedin.com/in/paulingalls



On Aug 1, 2013, at 7:12 PM, Matthew Von-Maszewski <[email protected]> wrote:

> Try cutting your max open files in half.  I am working from my iPad not my 
> workstation so my numbers are rough.  Will get better ones to you in the 
> morning.
> 
> The math goes like this: 
> 
> - vnode/partition heap usage is (4Mbytes * (max_open_files -10)) + 8Mbyte
> - you have 18 vnodes per server (multiply the above times 18)
> - AAE (active anti-entropy is "on"), so that adds (4Mbyte * 10 + 8Mbyte) times 
> 18 vnodes 
> 
> The three lines above give the total memory leveldb will attempt to use per 
> server if your dataset is large enough to fill it.
> 
> Matthew
> 
> 
> On Aug 1, 2013, at 21:33, Paul Ingalls <[email protected]> wrote:
> 
>> I should add more details about the nodes that crashed.  I ran this for the 
>> first time for all of 10 minutes.
>> 
>> Here is the log from the first one:
>> 
>> 2013-08-02 00:09:44 =ERROR REPORT====
>> ** State machine <0.2368.0> terminating
>> ** Last event in was unregistered
>> ** When State == active
>> **      Data  == 
>> {state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
>> ** Reason for termination =
>> ** 
>> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>> 2013-08-02 00:09:44 =CRASH REPORT====
>>   crasher:
>>     initial call: riak_core_vnode:init/1
>>     pid: <0.2368.0>
>>     registered_name: []
>>     exception exit: 
>> {{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>>     ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
>>     messages: []
>>     links: [<0.142.0>]
>>     dictionary: [{random_seed,{8115,23258,22987}}]
>>     trap_exit: true
>>     status: running
>>     heap_size: 196418
>>     stack_size: 24
>>     reductions: 12124
>>   neighbours:
>> 2013-08-02 00:09:44 =SUPERVISOR REPORT====
>>      Supervisor: {local,riak_core_vnode_sup}
>>      Context:    child_terminated
>>      Reason:     
>> {badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>>      Offender:   
>> [{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]
>> 
>> The second one looks like it ran out of heap; I assume I have something 
>> misconfigured here...
>> 
>> ===== Fri Aug  2 00:51:28 UTC 2013
>> Erlang has closed
>> /home/fanzo/riak/rel/riak/bin/../lib/os_mon-2.2.9/priv/bin/memsup: Erlang 
>> has closed.
>> Crash dump was written to: ./log/erl_crash.dump
>> eheap_alloc: Cannot allocate 5568010120 bytes of memory (of type "heap").
>> 
>> 
>> Paul Ingalls
>> Founder & CEO Fanzo
>> [email protected]
>> @paulingalls
>> http://www.linkedin.com/in/paulingalls
>> 
>> 
>> 
>> On Aug 1, 2013, at 6:28 PM, Paul Ingalls <[email protected]> wrote:
>> 
>>> Couple of questions.
>>> 
>>> I have migrated my system to use Riak on the back end.  I have setup a 1.4 
>>> cluster with 128 partitions on 7 nodes with LevelDB as the store.  Each 
>>> node looks like:
>>> 
>>> Azure Large instance (4CPU 7GB RAM)
>>> data directory is on a RAID 0
>>> max open files is set to 128
>>> async threads on the VM is set to 16
>>> everything else is defaults
>>> 
>>> I'm using the 1.4.1 Java client, connecting via the protocol buffers cluster client.
>>> 
>>> With this setup, I'm seeing poor throughput on my service load.  I ran a 
>>> test for a bit and was seeing only a few gets/puts per second.  And then, 
>>> when I stopped the client, two of the nodes crashed.
>>> 
>>> I'm very new to Riak, so I figure I'm doing something wrong.  I saw a 
>>> note on the list earlier from someone getting well over 1000 puts per 
>>> second, so I know it can move pretty fast.  
>>> 
>>> What is a good strategy for troubleshooting?
>>> 
>>> How many fetch/update/store loops per second should I expect to see on a 
>>> cluster of this size?
>>> 
>>> Thanks!
>>> 
>>> Paul
>>> 
>>> Paul Ingalls
>>> Founder & CEO Fanzo
>>> [email protected]
>>> @paulingalls
>>> http://www.linkedin.com/in/paulingalls
>>> 
>>> 
>>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

