I should add more details about the nodes that crashed. This was the first run,
and it lasted all of 10 minutes.
Here is the log from the first one:
2013-08-02 00:09:44 =ERROR REPORT====
** State machine <0.2368.0> terminating
** Last event in was unregistered
** When State == active
** Data ==
{state,114179815416476790484662877555959610910619729920,riak_kv_vnode,{deleted,{state,114179815416476790484662877555959610910619729920,riak_kv_eleveldb_backend,{state,<<>>,"/mnt/datadrive/riak/data/leveldb/114179815416476790484662877555959610910619729920",[{create_if_missing,true},{max_open_files,128},{use_bloomfilter,true},{write_buffer_size,58858594}],[{add_paths,[]},{allow_strfun,false},{anti_entropy,{on,[]}},{anti_entropy_build_limit,{1,3600000}},{anti_entropy_concurrency,2},{anti_entropy_data_dir,"/mnt/datadrive/riak/data/anti_entropy"},{anti_entropy_expire,604800000},{anti_entropy_leveldb_opts,[{write_buffer_size,4194304},{max_open_files,20}]},{anti_entropy_tick,15000},{create_if_missing,true},{data_root,"/mnt/datadrive/riak/data/leveldb"},{fsm_limit,50000},{hook_js_vm_count,2},{http_url_encoding,on},{included_applications,[]},{js_max_vm_mem,8},{js_thread_stack,16},{legacy_stats,true},{listkeys_backpressure,true},{map_js_vm_count,8},{mapred_2i_pipe,true},{mapred_name,"mapred"},{max_open_files,128},{object_format,v1},{reduce_js_vm_count,6},{stats_urlpath,"stats"},{storage_backend,riak_kv_eleveldb_backend},{use_bloomfilter,true},{vnode_vclocks,true},{write_buffer_size,58858594}],[],[],[{fill_cache,false}],true,false},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},undefined,3000,1000,100,100,true,true,undefined}},riak@riak003,none,undefined,undefined,undefined,{pool,riak_kv_worker,10,[]},undefined,107615}
** Reason for termination =
**
{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2013-08-02 00:09:44 =CRASH REPORT====
crasher:
initial call: riak_core_vnode:init/1
pid: <0.2368.0>
registered_name: []
exception exit:
{{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [riak_core_vnode_sup,riak_core_sup,<0.139.0>]
messages: []
links: [<0.142.0>]
dictionary: [{random_seed,{8115,23258,22987}}]
trap_exit: true
status: running
heap_size: 196418
stack_size: 24
reductions: 12124
neighbours:
2013-08-02 00:09:44 =SUPERVISOR REPORT====
Supervisor: {local,riak_core_vnode_sup}
Context: child_terminated
Reason:
{badarg,[{eleveldb,close,[<<>>],[]},{riak_kv_eleveldb_backend,stop,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,149}]},{riak_kv_vnode,terminate,2,[{file,"src/riak_kv_vnode.erl"},{line,836}]},{riak_core_vnode,terminate,3,[{file,"src/riak_core_vnode.erl"},{line,847}]},{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,586}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
Offender:
[{pid,<0.2368.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]
The second one looks like it ran out of heap; I assume I have something
misconfigured here...
===== Fri Aug 2 00:51:28 UTC 2013
Erlang has closed
/home/fanzo/riak/rel/riak/bin/../lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
Crash dump was written to: ./log/erl_crash.dump
eheap_alloc: Cannot allocate 5568010120 bytes of memory (of type "heap").
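One thing worth checking before blaming a single setting: how much memory the LevelDB write buffers alone can pin down with this config. A rough sketch, using the ring size and write_buffer_size from the logs above, and assuming roughly 128/7 vnodes land on each node and that each open LevelDB instance can hold up to two write buffers (an active plus an immutable memtable, which is LevelDB's usual behavior):

```python
# Back-of-envelope LevelDB memory estimate from the numbers in the logs above.
# Assumptions (not from the logs): vnodes are spread evenly, and each open
# LevelDB instance may hold up to two write buffers at once.
ring_size = 128
nodes = 7
write_buffer_size = 58_858_594   # bytes, from the vnode state dump above

vnodes_per_node = ring_size / nodes                    # ~18.3 vnodes per node
worst_case = vnodes_per_node * write_buffer_size * 2   # bytes

print(f"vnodes per node: {vnodes_per_node:.1f}")                       # 18.3
print(f"worst-case write-buffer memory: {worst_case / 2**30:.2f} GiB") # 2.00
```

That is roughly 2 GiB of write buffers alone on a 7 GB instance, before counting the max_open_files block cache, the OS, and the Erlang heap itself, so there is not much headroom; the 5.5 GB eheap_alloc failure above may also point at a large Erlang-side allocation (e.g. a big listing operation), so both angles seem worth checking.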
Paul Ingalls
Founder & CEO Fanzo
[email protected]
@paulingalls
http://www.linkedin.com/in/paulingalls
On Aug 1, 2013, at 6:28 PM, Paul Ingalls <[email protected]> wrote:
> Couple of questions.
>
> I have migrated my system to use Riak on the back end. I have set up a 1.4
> cluster with 128 partitions on 7 nodes with LevelDB as the store. Each node
> looks like:
>
> Azure Large instance (4CPU 7GB RAM)
> data directory is on a RAID 0
> max files is set to 128
> async threads on the VM are set to 16
> everything else is defaults
>
> I'm using the 1.4.1 Java client, connecting to the cluster via protocol buffers.
>
> With this setup, I'm seeing poor throughput on my service load. I ran a test
> for a bit and was seeing only a few gets/puts per second. Then, when I
> stopped the client, two of the nodes crashed.
>
> I'm very new with Riak, so I figure I'm doing something wrong. I saw a note
> on the list earlier of someone getting well over 1000 puts per second, so I
> know it can move pretty fast.
>
> What is a good strategy for troubleshooting?
>
> How many fetch/update/store loops per second should I expect to see on a
> cluster of this size?
>
> Thanks!
>
> Paul
>
> Paul Ingalls
> Founder & CEO Fanzo
> [email protected]
> @paulingalls
> http://www.linkedin.com/in/paulingalls
>
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com