Hi Matthew,
Yes, I *absolutely* agree that the current setting is too high. I was just hoping to give the nodes far more headroom than I thought they needed, and I planned to reduce the limit if I saw memory pressure.

I originally had AAE at the default of 20. We first got those errors a couple of days ago (which was debilitating; the affected nodes went unresponsive). I set it to 3000, restarted, and less than one day later the same problem occurred (a few hours ago).

--
Dave Brady

----- Original Message -----
From: "Matthew Von-Maszewski" <[email protected]>
To: "Dave Brady" <[email protected]>
Cc: [email protected]
Sent: Monday, November 11, 2013 21:16:46
Subject: Re: max_files_limit and AAE

hmm … 128 partitions divided by 5 nodes is ~26 vnodes per server. AAE creates a parallel number of vnodes, so your servers have ~52 vnodes each. 52 x 3,000 is 156,000 files … 156,000 > 65,536 ulimit. Sooner or later 65,536 will be too small.

But ...

Now, the primary accounting method in 1.4.2 is memory size allocated based upon max_open_files. So you have allocated 4 Mbytes x 3,000 x 52, or 624,000 Mbytes of RAM, for leveldb. If you truly have a 624 Gbyte machine, sweet! Otherwise, it might be time to scale back max_open_files … and put AAE back to its default, because it does not need a high max_open_files.

The tricky part of leveldb configuration is that the max_open_files parameter is per vnode / partition, not for the entire server. This per-vnode setting has caused many to over-allocate, and is mostly inconsistent with every other piece of software on a Linux server. (It has also caused a couple of justified rants on this user list.) A saner approach is coming in Riak 2.0. But until then, here is a spreadsheet that can help with planning:

Matthew

On Nov 11, 2013, at 2:56 PM, Dave Brady <[email protected]> wrote:

Hey Everyone,

We have a five-node, 128-partition cluster running 1.4.2 on Debian. Is there a doc somewhere that explains how to size max_open_files as it applies to AAE?
I have max_open_files for eLevelDB set to 3000, as we have about 1500 .sst files in one VNode's data directory, and the boxes have plenty of RAM. I set max_open_files in the AAE section to 3000, too, on a whim after we had our first issue. We still got these in the logs on a couple of nodes after running for less than one day:

===============
2013-11-09 11:37:12.438 [info] <0.857.0>@riak_kv_vnode:maybe_create_hashtrees:142 riak_kv/125597796958124469533129165311555572001681702912: unable to start index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: /var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/LOCK: Too many open files"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,499}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,215}]},{riak_kv_index_hashtree,do_new_tree,2,[{file,"src/riak_kv_index_hashtree.erl"},{line,426}]},{lists,foldl,3,[{file,"lists.erl"},{line,1197}]},{riak_kv_index_hashtree,init_trees,2,[{file,"src/riak_kv_index_hashtree.erl"},{line,366}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,226}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}
2013-11-09 11:37:12.441 [error] <0.5209.2422> gen_server <0.5209.2422> terminated with reason: no match of right hand value {error,{db_write,"IO error: /var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/011260.log: Too many open files"}} in hashtree:flush_buffer/1 line 302
2013-11-09 11:37:12.441 [error] <0.5209.2422> CRASH REPORT Process <0.5209.2422> with 1 neighbours exited with reason: no match of right hand value {error,{db_write,"IO error: /var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/011260.log: Too many open files"}} in hashtree:flush_buffer/1 line 302 in gen_server:terminate/6 line 747
2013-11-09 11:37:12.441 [error] <0.19959.2426> CRASH REPORT Process <0.19959.2426> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: /var/lib/riak/anti_entropy/125597796958124469533129165311555572001681702912/LOCK: Too many open files"}} in hashtree:new_segment_store/2 line 499 in gen_server:init_it/6 line 328
===============

Our init script has "ulimit -n 65536" in it, which I *thought* would be high enough. Maybe not? I also made the necessary tweaks to /etc/pam.d/common-session* so that /etc/security/limits.conf would be read, and that did not help.

Much obliged for any suggestions!

--
Dave Brady

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
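Matthew's back-of-the-envelope sizing can be reproduced with a short Python sketch. It assumes, as described in the thread for Riak 1.4.2, that every KV vnode has a parallel AAE vnode, that each vnode may open up to its own max_open_files, and that leveldb budgets roughly 4 Mbytes per open file. The helper name `leveldb_budget` is hypothetical; it only encodes that arithmetic and does not read anything from Riak.

```python
import math

def leveldb_budget(ring_size, nodes, kv_max_open_files, aae_max_open_files, mb_per_file=4):
    """Worst-case per-server estimate of leveldb file handles and memory.

    Hypothetical helper. Assumes (per the thread) one AAE vnode per KV vnode,
    that every vnode may open up to its own max_open_files, and that leveldb
    budgets about mb_per_file MB per open file.
    """
    kv_vnodes = math.ceil(ring_size / nodes)  # 128 / 5 = 25.6 -> 26 on the busiest servers
    files = kv_vnodes * (kv_max_open_files + aae_max_open_files)
    memory_mb = files * mb_per_file
    return kv_vnodes, files, memory_mb

ULIMIT = 65536  # the "ulimit -n" value from the init script

for aae in (3000, 20):  # current AAE setting vs. the AAE default of 20
    kv_vnodes, files, memory_mb = leveldb_budget(128, 5, 3000, aae)
    print(f"AAE={aae}: {2 * kv_vnodes} KV + AAE vnodes, {files} files "
          f"(fits ulimit: {files <= ULIMIT}), {memory_mb} MB")
```

Under these assumptions, putting AAE back to 20 still leaves 26 x 3,020 = 78,520 potential handles with the KV setting at 3000, which is above the 65,536 ulimit, so the KV max_open_files would likely need to come down as well.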
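Because a ulimit set in an init script or via /etc/security/limits.conf is not necessarily what the running node inherits, it can help to read the limit from the live process. The Python sketch below assumes Linux, where /proc/<pid>/limits reports the limits actually in effect and /proc/<pid>/fd lists the descriptors currently open. The function names are hypothetical, and the Riak PID has to be looked up separately (typically the Erlang VM's beam.smp process).

```python
import os

def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' line of /proc/<pid>/limits text.

    Numeric values are returned as ints; 'unlimited' is kept as a string.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # e.g. "Max open files  65536  65536  files" -> fields 3 and 4
            soft, hard = line.split()[3:5]
            return tuple(int(v) if v.isdigit() else v for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")

def max_open_files(pid):
    """Hypothetical helper: effective open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())

def open_fd_count(pid):
    """Hypothetical helper: number of file descriptors the process has open right now."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Demo on this Python process; substitute the Riak node's PID to check it.
if os.path.exists("/proc/self/limits"):
    print(max_open_files("self"), open_fd_count("self"))
```

Comparing the reported soft limit with the descriptor count while AAE is rebuilding trees would show whether the 65,536 setting actually reached the node and how close it runs to that ceiling.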
