Matthew, 

I forgot to add "thanks" for the spreadsheet! I will go through it tomorrow 
(it's 10 PM here). 


I have turned off AAE for the time being. 



-- 
Dave Brady 

----- Original Message -----

From: "Dave Brady" <[email protected]> 
To: "Matthew Von-Maszewski" <[email protected]> 
Cc: [email protected] 
Sent: Monday, 11 November 2013 21:42:32 
Subject: Re: max_files_limit and AAE 


Hi Matthew, 


Yes, I *absolutely* agree that the current setting is too high. I was just 
hoping to give the nodes far more headroom than I thought they needed to run. I 
planned to reduce the limit if I saw memory pressure. 


I originally had AAE at the default of 20. We first got those errors a couple 
of days ago (which was debilitating; the affected nodes went unresponsive). I 
set it to 3000, restarted, and less than a day later the same problem occurred 
(a few hours ago). 

-- 
Dave Brady 

----- Original Message -----

From: "Matthew Von-Maszewski" <[email protected]> 
To: "Dave Brady" <[email protected]> 
Cc: [email protected] 
Sent: Monday, 11 November 2013 21:16:46 
Subject: Re: max_files_limit and AAE 

hmm … 


128 partitions divided by 5 nodes is ~26 vnodes per server. 


AAE creates a parallel number of vnodes, so your servers have ~52 vnodes each. 


52 x 3,000 is 156,000 files … 156,000 > 65,536 ulimit. Sooner or later 65,536 
will be too small. But ... 


Now, the primary accounting method in 1.4.2 is memory size allocated based upon 
max_open_files. So you have allocated 4 Mbytes x 3,000 x 52, or 624,000 Mbytes 
of RAM, for leveldb. If you truly have a 624 Gbyte machine, sweet! Otherwise, 
it might be time to scale back max_open_files … and put AAE back to its 
default, because it does not need a high max_open_files. 
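The arithmetic above can be sketched as a quick back-of-envelope check (the 4 Mbyte-per-file accounting figure is the one described for 1.4.2 above):

```python
import math

nodes = 5
partitions = 128
max_open_files = 3000
mb_per_file = 4  # leveldb accounting figure for 1.4.2, per the note above

vnodes_per_node = math.ceil(partitions / nodes)   # ~26 vnodes per server
with_aae = vnodes_per_node * 2                    # AAE doubles it: ~52
file_handles = with_aae * max_open_files          # 156,000 potential handles
ram_mb = file_handles * mb_per_file               # 624,000 MB, i.e. ~624 GB

print(vnodes_per_node, with_aae, file_handles, ram_mb)
# 156,000 comfortably exceeds a 65,536 ulimit
```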




The tricky part of leveldb configuration is that the max_open_files parameter 
is per vnode / partition, not for the entire server. This per-vnode setting has 
caused many people to over-allocate, and it is inconsistent with how virtually 
every other piece of software on a Linux server is configured. (And it has 
caused a couple of justified rants on this user list.) A saner approach is 
coming in Riak 2.0. 


But until then, here is a spreadsheet that can help planning: 









Matthew 



On Nov 11, 2013, at 2:56 PM, Dave Brady <[email protected]> wrote: 




Hey Everyone, 


We have a five-node, 128-partition cluster running 1.4.2 on Debian. 


Is there a doc somewhere that explains how to size max_open_files as it applies 
to AAE? 


I have max_open_files for eLevelDB set to 3000, as we have about 1500 .sst 
files in one VNode's data directory, and the boxes have plenty of RAM. 


I set max_open_files in the AAE section to 3000, too, on a whim after we had 
our first issue. We still got these in the logs on a couple of nodes after 
running for less than a day: 



=============== 
2013-11-09 11:37:12.438 [info] 
<0.857.0>@riak_kv_vnode:maybe_create_hashtrees:142 
riak_kv/125597796958124469533129165311555572001681702912: unable to start 
index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: 
/var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/LOCK:
 Too many open 
files"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,499}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,215}]},{riak_kv_index_hashtree,do_new_tree,2,[{file,"src/riak_kv_index_hashtree.erl"},{line,426}]},{lists,foldl,3,[{file,"lists.erl"},{line,1197}]},{riak_kv_index_hashtree,init_trees,2,[{file,"src/riak_kv_index_hashtree.erl"},{line,366}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,226}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}
 
2013-11-09 11:37:12.441 [error] <0.5209.2422> gen_server <0.5209.2422> 
terminated with reason: no match of right hand value {error,{db_write,"IO 
error: 
/var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/011260.log:
 Too many open files"}} in hashtree:flush_buffer/1 line 302 
2013-11-09 11:37:12.441 [error] <0.5209.2422> CRASH REPORT Process 
<0.5209.2422> with 1 neighbours exited with reason: no match of right hand 
value {error,{db_write,"IO error: 
/var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/011260.log:
 Too many open files"}} in hashtree:flush_buffer/1 line 302 in 
gen_server:terminate/6 line 747 
2013-11-09 11:37:12.441 [error] <0.19959.2426> CRASH REPORT Process 
<0.19959.2426> with 0 neighbours exited with reason: no match of right hand 
value {error,{db_open,"IO error: 
/var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/LOCK:
 Too many open files"}} in hashtree:new_segment_store/2 line 499 in 
gen_server:init_it/6 line 328 
=============== 


Our init script has "ulimit -n 65536" in it, which I *thought* would be high 
enough. Maybe not? 
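One thing worth ruling out is which limit the Riak process actually inherited, since ulimit settings in an init script do not always propagate to the daemon. A minimal check (standard library only; run it from the same environment, or read /proc/&lt;beam pid&gt;/limits on Linux for the running node itself):

```python
import resource

# Soft/hard open-file limits for the *current* process. The soft limit is
# what "Too many open files" is enforced against; the authoritative values
# for Riak itself live in /proc/<beam pid>/limits on Linux.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")
```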



I also made the necessary tweaks to /etc/pam.d/common-session*, so that 
/etc/security/limits.conf would be read, and that did not help. 


Much obliged for any suggestions! 
-- 
Dave Brady 

_______________________________________________ 
riak-users mailing list 
[email protected] 
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 




