Hey Luke,

Here you go. Last night I believe we had ~70 partitions still to transfer.
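For a rough sense of how long any one transfer will take, the per-transfer stats below (total size, percent complete, throughput) are enough for a back-of-the-envelope estimate. A sketch using the numbers from the first active transfer — this is only a single-partition estimate, and rates vary per transfer:

```shell
# Rough ETA for the ownership transfer below:
# 47814241779 bytes total, 45% complete, moving at 7.13 MB/s.
awk 'BEGIN {
  remaining = 47814241779 * (1 - 0.45)   # bytes still to send
  rate      = 7.13 * 1024 * 1024         # bytes per second
  printf "%.0f minutes remaining\n", remaining / rate / 60
}'
# -> 59 minutes remaining (for this one partition)
```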
[[email protected] ~] # riak-admin transfers
'[email protected]' waiting to handoff 2 partitions
'[email protected]' waiting to handoff 2 partitions
'[email protected]' waiting to handoff 21 partitions

Active Transfers:

transfer type: ownership_transfer
vnode type: riak_kv_vnode
partition: 422465317040964124793252646957050560369293000704
started: 2014-09-09 16:15:11 [48.93 min ago]
last update: 2014-09-09 17:04:06 [887.56 ms ago]
total size: 47814241779 bytes
objects transferred: 326825

                         111 Objs/s
xxxx_prod_cluster        =======>       xxxx_prod_cluster
 @192.168.72.176                         @192.168.72.19
        |===================         |  45%
                        7.13 MB/s

transfer type: hinted_handoff
vnode type: riak_kv_vnode
partition: 1004782375664995756265033322492444576013453623296
started: 2014-09-09 16:36:37 [27.50 min ago]
last update: 2014-09-09 17:04:07 [411.93 ms ago]
total size: 47179582874 bytes
objects transferred: 213001

                         129 Objs/s
xxxx_prod_cluster        =======>       xxxx_prod_cluster
  @192.168.72.19                          @192.168.72.7
        |=============               |  31%
                        8.48 MB/s

On Tue, Sep 9, 2014 at 7:22 AM, Luke Bakken <[email protected]> wrote:
> Hi Peter,
>
> Could you please provide the output of "riak-admin transfers"?
> --
> Luke Bakken
> Engineer / CSE
> [email protected]
>
>
> On Mon, Sep 8, 2014 at 10:01 AM, Peter Bakkum <[email protected]> wrote:
> > Hey all,
> >
> > Looking for some guidance on a problem we're seeing in production right
> > now. We're not Riak experts, so please bear with us.
> >
> > We had a member of our 6-node Riak cluster appear to fall out
> > (riak-admin member-status on that node only showed itself). So I ran a
> > riak-admin join and riak-admin commit to get the node back into the
> > cluster. Node discovery appears to work now, but for some reason that
> > node is now using a huge amount of disk space. It appears that the
> > partition balancing process is creating this condition, and it still
> > hasn't completed after ~16 hours.
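If it helps to track how the backlog is draining, the "waiting to handoff" lines can be tallied with a one-liner. A sketch, assuming the output format shown above (pipe in the live output, e.g. `riak-admin transfers | sh count_waiting.sh`, or wrap it in `watch` to poll):

```shell
# Tally partitions still waiting to hand off, across all nodes.
# Matches lines like: '[email protected]' waiting to handoff 21 partitions
# (the count is the second-to-last field, hence $(NF-1)).
awk '/waiting to handoff/ { total += $(NF-1) }
     END { printf "%d partitions waiting\n", total + 0 }'
```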
> > The cluster is still functional and serving our production traffic, and
> > taking the entire cluster offline isn't an option for us.
> >
> > Most of our nodes use about 450GB of space; this node in particular is
> > using around 1.2TB, which is pushing the limit of its disk.
> >
> > Questions:
> > What's happening here? Is this expected?
> >
> > What's the best course of action? Should we clear out this node and
> > attempt to join the cluster again?
> >
> > Here are some stats from the node in question. Let me know if anything
> > else would be helpful.
> >
> > Thanks for your help.
> >
> >
> > [[email protected] /data/lib/riak] # riak-admin member-status
> > ================================= Membership ==================================
> > Status     Ring    Pending    Node
> > -------------------------------------------------------------------------------
> > valid     20.3%      16.4%    '[email protected]'
> > valid     18.0%      17.2%    '[email protected]'
> > valid     20.3%      17.2%    '[email protected]'
> > valid      7.0%      16.4%    '[email protected]'
> > valid     17.2%      16.4%    '[email protected]'
> > valid     17.2%      16.4%    '[email protected]'
> >
> >
> > [[email protected] /data/lib/riak] # riak-admin status
> > 1-minute stats for '[email protected]'
> > -------------------------------------------
> > riak_kv_stat_ts : 1410194287
> > vnode_gets : 1607
> > vnode_gets_total : 563683
> > vnode_puts : 39
> > vnode_puts_total : 5459724
> > vnode_index_refreshes : 0
> > vnode_index_refreshes_total : 0
> > vnode_index_reads : 0
> > vnode_index_reads_total : 0
> > vnode_index_writes : 39
> > vnode_index_writes_total : 5459724
> > vnode_index_writes_postings : 0
> > vnode_index_writes_postings_total : 5227558
> > vnode_index_deletes : 0
> > vnode_index_deletes_total : 0
> > vnode_index_deletes_postings : 39
> > vnode_index_deletes_postings_total : 30613
> > node_gets : 3602
> > node_gets_total : 2463956
> > node_get_fsm_siblings_mean : 1
> > node_get_fsm_siblings_median : 1
> > node_get_fsm_siblings_95 : 2
> > node_get_fsm_siblings_99 : 3
> > node_get_fsm_siblings_100 : 12
> > node_get_fsm_objsize_mean : 52047
> > node_get_fsm_objsize_median : 26936
> > node_get_fsm_objsize_95 : 167435
> > node_get_fsm_objsize_99 : 267979
> > node_get_fsm_objsize_100 : 1313716
> > node_get_fsm_time_mean : 12223
> > node_get_fsm_time_median : 6675
> > node_get_fsm_time_95 : 37390
> > node_get_fsm_time_99 : 87046
> > node_get_fsm_time_100 : 345380
> > node_puts : 39
> > node_puts_total : 24915
> > node_put_fsm_time_mean : 4419
> > node_put_fsm_time_median : 2444
> > node_put_fsm_time_95 : 12890
> > node_put_fsm_time_99 : 18775
> > node_put_fsm_time_100 : 18775
> > read_repairs : 0
> > read_repairs_total : 0
> > coord_redirs_total : 17022
> > executing_mappers : 0
> > precommit_fail : 0
> > postcommit_fail : 0
> > index_fsm_create : 0
> > index_fsm_create_error : 0
> > index_fsm_active : 0
> > list_fsm_create : 0
> > list_fsm_create_error : 0
> > list_fsm_active : 0
> > pbc_active : 0
> > pbc_connects : 1
> > pbc_connects_total : 508
> > node_get_fsm_active : 1
> > node_get_fsm_active_60s : 3530
> > node_get_fsm_in_rate : 55
> > node_get_fsm_out_rate : 56
> > node_get_fsm_rejected : 0
> > node_get_fsm_rejected_60s : 0
> > node_get_fsm_rejected_total : 0
> > node_put_fsm_active : 0
> > node_put_fsm_active_60s : 67
> > node_put_fsm_in_rate : 1
> > node_put_fsm_out_rate : 1
> > node_put_fsm_rejected : 0
> > node_put_fsm_rejected_60s : 0
> > node_put_fsm_rejected_total : 0
> > leveldb_read_block_error : 0
> > riak_pipe_stat_ts : 1410194286
> > pipeline_active : 0
> > pipeline_create_count : 0
> > pipeline_create_one : 0
> > pipeline_create_error_count : 0
> > pipeline_create_error_one : 0
> > cpu_nprocs : 426
> > cpu_avg1 : 1352
> > cpu_avg5 : 1260
> > cpu_avg15 : 1137
> > mem_total : 15666507776
> > mem_allocated : 15479640064
> > disk : [{"/",8256952,60},
> >         {"/dev/shm",7649660,0},
> >         {"/tmpfs",1048576,14},
> >         {"/tmpfs_mp3",1048576,0},
> >         {"/data",1514123712,81}]
> > nodename : '[email protected]'
> > connected_nodes : ['[email protected]',
> >                    '[email protected]',
> >                    '[email protected]',
> >                    '[email protected]',
> >                    '[email protected]']
> > sys_driver_version : <<"2.0">>
> > sys_global_heaps_size : 0
> > sys_heap_type : private
> > sys_logical_processors : 4
> > sys_otp_release : <<"R15B01">>
> > sys_process_count : 2469
> > sys_smp_support : true
> > sys_system_version : <<"Erlang R15B01 (erts-5.9.1) [source] [64-bit]
> >                        [smp:4:4] [async-threads:64] [kernel-poll:true]">>
> > sys_system_architecture : <<"x86_64-unknown-linux-gnu">>
> > sys_threads_enabled : true
> > sys_thread_pool_size : 64
> > sys_wordsize : 8
> > ring_members : ['[email protected]',
> >                 '[email protected]',
> >                 '[email protected]',
> >                 '[email protected]',
> >                 '[email protected]',
> >                 '[email protected]']
> > ring_num_partitions : 128
> > ring_ownership : <<"[{'[email protected]',23},\n
> >                     {'[email protected]',22},\n
> >                     {'[email protected]',26},\n
> >                     {'[email protected]',26},\n
> >                     {'[email protected]',22},\n
> >                     {'[email protected]',9}]">>
> > ring_creation_size : 128
> > storage_backend : riak_kv_eleveldb_backend
> > erlydtl_version : <<"0.7.0">>
> > riak_control_version : <<"1.4.10-0-g73c43c3">>
> > cluster_info_version : <<"1.2.4">>
> > riak_search_version : <<"1.4.10-0-g6e548e7">>
> > merge_index_version : <<"1.3.2-0-gcb38ee7">>
> > riak_kv_version : <<"1.4.10-0-g64b6ad8">>
> > sidejob_version : <<"0.2.0">>
> > riak_api_version : <<"1.4.10-0-gc407ac0">>
> > riak_pipe_version : <<"1.4.10-0-g9353526">>
> > riak_core_version : <<"1.4.10">>
> > bitcask_version : <<"1.6.6-0-g230b6d6">>
> > basho_stats_version : <<"1.0.3">>
> > webmachine_version : <<"1.10.4-0-gfcff795">>
> > mochiweb_version : <<"1.5.1p6">>
> > inets_version : <<"5.9">>
> > erlang_js_version : <<"1.2.2">>
> > runtime_tools_version : <<"1.8.8">>
> > os_mon_version : <<"2.2.9">>
> > riak_sysmon_version : <<"1.1.3">>
> > ssl_version : <<"5.0.1">>
> > public_key_version : <<"0.15">>
> > crypto_version : <<"2.1">>
> > sasl_version : <<"2.2.1">>
> > lager_version : <<"2.0.1">>
> > goldrush_version : <<"0.1.5">>
> > compiler_version : <<"4.8.1">>
> > syntax_tools_version : <<"1.6.8">>
> > stdlib_version : <<"1.18.1">>
> > kernel_version : <<"2.15.1">>
> > memory_total : 130705264
> > memory_processes : 55557705
> > memory_processes_used : 55341757
> > memory_system : 75147559
> > memory_atom : 545377
> > memory_atom_used : 527226
> > memory_binary : 12172712
> > memory_code : 11674242
> > memory_ets : 11913912
> >
> >
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
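As a side note, the "disk" stat above reports {mountpoint, size in KB, percent used} triples, so the ~1.2TB figure can be sanity-checked directly from the /data entry. A quick check, assuming KB units:

```shell
# The "disk" stat lists {mountpoint, size_kb, pct_used}; sanity-check
# the /data entry: 1514123712 KB at 81% used.
awk 'BEGIN { printf "%.0f GiB used on /data\n", 1514123712 * 0.81 / 1048576 }'
# -> 1170 GiB used on /data, consistent with the reported ~1.2TB
```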
