We need all the results right away anyway, so we don't paginate; once we get to 1.4.6+, being able to skip sorting ought to return some speed to us (and maybe we will leave +S at 6:6). With our small ring size and SSDs we see 3 million keys returning in about 120 seconds. That case isn't rare, but only a handful of the queries we run return over 1 million results. It will be interesting to compare the speed of unordered result sets in 1.4.6.
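For context, a timing like that can be taken with a plain, unpaginated 2i range query over the HTTP API. A rough sketch of the measurement (the host, bucket, and index names below are placeholders, not our real ones):

    import time

    import requests

    # Placeholder host/bucket/index names for illustration -- substitute real values.
    RIAK_URL = "http://127.0.0.1:8098"
    BUCKET = "items"
    INDEX = "tags_bin"

    def timed_2i_range(start_key, end_key):
        # Unpaginated 2i range query: every matching key comes back in one response.
        url = "%s/buckets/%s/index/%s/%s/%s" % (RIAK_URL, BUCKET, INDEX, start_key, end_key)
        started = time.time()
        resp = requests.get(url)
        resp.raise_for_status()
        keys = resp.json()["keys"]
        print("%d keys in %.1f sec" % (len(keys), time.time() - started))
        return keys

Nothing fancy; it just gives us numbers we can compare before and after the upgrade.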
So far we have run into one case, lasting a few days, where some servers had developed 2i corruption and were returning only subsets of the expected results. We had to make multiple simultaneous requests and union the result sets to compensate (rough sketch below). Luckily we completely migrated (manually) to a new cluster soon after, which resolved the issue. The additions in 1.4.6 seem like they will be very helpful should we encounter something similar again.

I realize we use 2i in an atypical way, but by using meaningful keys it is the fastest solution we have come across for retrieving a set of high-churn, tag-indexed keys that won't fit in RAM. We hope that Yokozuna may replace 2i for us in a more horizontally scalable way with 2.0, but we haven't tested that yet.
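In case it is useful to anyone else hitting the same thing, the workaround was essentially to issue the same index query several times in parallel and union the key sets, since each attempt could be missing a different subset. A rough sketch of that approach, again with placeholder names and assuming the HTTP interface:

    from concurrent.futures import ThreadPoolExecutor

    import requests

    # Placeholder host/bucket/index names for illustration -- substitute real values.
    RIAK_URL = "http://127.0.0.1:8098"
    BUCKET = "items"
    INDEX = "tags_bin"

    def query_keys(value):
        # One exact-match 2i query; returns whatever keys the cluster gives us this time.
        url = "%s/buckets/%s/index/%s/%s" % (RIAK_URL, BUCKET, INDEX, value)
        resp = requests.get(url)
        resp.raise_for_status()
        return set(resp.json()["keys"])

    def union_of_attempts(value, attempts=3):
        # Fire the same query several times in parallel and union the key sets,
        # so one partial answer doesn't silently drop keys.
        with ThreadPoolExecutor(max_workers=attempts) as pool:
            partial_sets = list(pool.map(query_keys, [value] * attempts))
        keys = set()
        for partial in partial_sets:
            keys |= partial
        return keys

It doesn't fix anything, of course; it just papered over the missing keys until we migrated.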
Thanks,
Sean

> On Jan 10, 2014, at 7:09 AM, Matthew Von-Maszewski <[email protected]> wrote:
>
> Sean,
>
> Also, you mentioned concern about +S 6:6. 2i queries in 1.4 added "sorting".
> Another heavy 2i user noticed that the sorting needed more CPU in Erlang.
> They were happier after removing the +S.
>
> And finally, those 2i queries that return "millions of results" … how long do
> those queries take to execute?
>
> Matthew
>
>> On Jan 9, 2014, at 9:33 PM, Sean McKibben <[email protected]> wrote:
>>
>> We have a 5 node cluster using eleveldb (1.4.2) and 2i, and this afternoon
>> it started responding extremely slowly. CPU on member 4 was extremely high
>> and we restarted that process, but it didn't help. We temporarily shut down
>> member 4 and cluster speed returned to normal, but as soon as we boot member
>> 4 back up, the cluster performance goes to shit.
>>
>> We've run into this before, but we were able to just start with a fresh set
>> of data after wiping the machines, as that was before we migrated to this
>> bare-metal cluster. Now it is causing some pretty significant issues and
>> we're not sure what we can do to get it back to normal; many of our queues
>> are filling up and we'll probably have to take node 4 off again just so we
>> can provide a regular quality of service.
>>
>> We've turned off AAE on node 4 but it hasn't helped. We have some transfers
>> that need to happen but they are going very slowly.
>>
>> 'riak-admin top' on node 4 reports this:
>>
>> Load:  cpu       610    Memory:  total      503852    binary    231544
>>        procs     804             processes  179850    code       11588
>>        runq      134             atom          533    ets         4581
>>
>> Pid             Name or Initial Func    Time    Reds      Memory     MsgQ  Current Function
>> --------------------------------------------------------------------------------------------------
>> <6175.29048.3>  proc_lib:init_p/5       '-'     462231    51356760   0     mochijson2:json_bin_is_safe/1
>> <6175.12281.6>  proc_lib:init_p/5       '-'     307183    64195856   1     gen_fsm:loop/7
>> <6175.1581.5>   proc_lib:init_p/5       '-'     286143    41085600   0     mochijson2:json_bin_is_safe/1
>> <6175.6659.0>   proc_lib:init_p/5       '-'     281845    13752      0     sext:decode_binary/3
>> <6175.6666.0>   proc_lib:init_p/5       '-'     209113    21648      0     sext:decode_binary/3
>> <6175.12219.6>  proc_lib:init_p/5       '-'     168832    16829200   0     riak_client:wait_for_query_results/4
>> <6175.8403.0>   proc_lib:init_p/5       '-'     133333    13880      1     eleveldb:iterator_move/2
>> <6175.8813.0>   proc_lib:init_p/5       '-'     119548    9000       1     eleveldb:iterator/3
>> <6175.8411.0>   proc_lib:init_p/5       '-'     115759    34472      0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>> <6175.5679.0>   proc_lib:init_p/5       '-'     109577    8952       0     riak_kv_vnode:'-result_fun_ack/2-fun-0-'
>> Output server crashed: connection_lost
>>
>> Based on that, is there anything anyone can think to do to try to bring
>> performance back into the land of usability? Does this appear to be
>> something that may have been resolved in 1.4.6 or 1.4.7?
>>
>> The only thing we can think of at this point might be to remove or force
>> remove the member and join in a new, freshly built one, but the last time we
>> attempted that (on a different cluster) our secondary indexes got irreparably
>> damaged and only regained consistency when we copied every individual key to
>> (this) new cluster! Not a good experience :( but I'm hopeful that 1.4.6 may
>> have addressed some of our issues.
>>
>> Any help is appreciated.
>>
>> Thank you,
>> Sean McKibben
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
