Thanks Kelly. Much appreciated! I'll try your suggestions and get back.

Jim
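For reference, the two fixes Kelly suggests below — an explicit "timeout" on the job and a map phase running map_identity instead of a reduce phase — combine into a job body like this. This is a sketch, not a tested query: the bucket, key, and host come from the thread, and the 120-second timeout is a placeholder value.

```python
import json

# Sketch of the corrected MapReduce job: a map phase with map_identity
# (per Kelly's suggestion) plus an explicit job timeout in milliseconds.
job = {
    "inputs": {
        "bucket": "nodes",
        "key_filters": [["eq", "user_id-xxxxxxx-info"]],
    },
    "query": [
        {"map": {"language": "erlang",
                 "module": "riak_kv_mapreduce",
                 "function": "map_identity"}}
    ],
    "timeout": 120000,  # placeholder: 120 s, expressed in milliseconds
}

body = json.dumps(job)
# POST `body` to http://xx.xx.xx.xx:8098/mapred with
# Content-Type: application/json (e.g. curl -d "$body").
```

Posting `body` to the /mapred endpoint is equivalent to the curl commands quoted below, with the reduce phase swapped for a map phase.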
From: Kelly McLaughlin <[email protected]>
Date: Sun, 23 Oct 2011 22:02:52 -0600
To: Jim Adler <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Key Filter Timeout

Jim,

A couple of things to note. First, bitcask stores all keys in memory, but eleveldb does not necessarily, so the performance of your disks could be a factor. Not saying it is, but it's a difference to be aware of between bitcask and eleveldb.

Second, the latest error you shared was a timeout from the MapReduce operation. You can increase the timeout for the operation by adding a "timeout" field (in milliseconds) to your original query, like this:

curl -v -d '{"inputs":{"bucket":"nodes","key_filters":[["eq","user_id-xxxxxxx-info"]]},"query":[{"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_identity"}}],"timeout":120000}' -H "Content-Type: application/json" http://xx.xx.xx.xx:8098/mapred

Finally, you're using a reduce phase in the query when I think you might be better served by a map phase, which will allow you to get more parallelization during query execution. Try using a map phase with the map_identity function instead of reduce_identity and I suspect you will get better results.

Hope that helps, and please respond if you have any further questions or problems. Cheers.

Kelly

On Oct 23, 2011, at 5:40 PM, Jim Adler wrote:

> A little context on my use case here. I've got about 8M keys in this 3-node
> cluster. I need to clean out some bad keys and some bad data, so I'm using
> the key filter and search functionality to accomplish this (I tend to use
> the Riak Python client). But, to be honest, I'm having a helluva time
> getting these basic tasks accomplished before I ramp to hundreds of
> millions of keys.
>
> Thanks for any help.
>
> Jim
>
> From: Kelly McLaughlin <[email protected]>
> Date: Sun, 23 Oct 2011 14:13:09 -0600
> To: Jim Adler <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: Key Filter Timeout
>
> Jim,
>
> Looks like you are possibly using both the legacy key listing option and
> the legacy MapReduce system. Assuming all your nodes are on Riak 1.0, check
> your app.config files on all nodes and make sure mapred_system is set to
> pipe and legacy_keylisting is set to false. If that's not already the case,
> you should see better performance. If you are still getting the same or
> similar errors with those settings in place, please respond with what they
> are so we can look into it more. Thanks.
>
> Kelly
>
> On Oct 23, 2011, at 12:38 PM, Jim Adler wrote:
>
>> I'm trying to run a very simplified key filter that's timing out. I've got
>> about 8M keys in a 3-node cluster, 15 GB memory, num_partitions=256,
>> LevelDB backend.
>>
>> I'm thinking this should be pretty quick. What am I doing wrong?
>>
>> Jim
>>
>> Here's the query:
>>
>> curl -v -d '{"inputs":{"bucket":"nodes","key_filters":[["eq","user_id-xxxxxxx-info"]]},"query":[{"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_identity"}}]}' -H "Content-Type: application/json" http://xx.xx.xx.xx:8098/mapred
>>
>> Here's the log:
>>
>> 18:25:08.892 [error] gen_fsm <0.20795.0> in state executing terminated with
>> reason: {error,flow_timeout}
>> 18:25:08.961 [error] CRASH REPORT Process <0.20795.0> with 2 neighbours
>> crashed with reason: {error,flow_timeout}
>> 18:25:08.963 [error] Supervisor luke_flow_sup had child undefined started
>> with {luke_flow,start_link,undefined} at <0.20795.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>> 18:25:08.966 [error] gen_fsm <0.20798.0> in state waiting_kl terminated with
>> reason: {error,flow_timeout}
>> 18:25:08.971 [error] CRASH REPORT Process <0.20798.0> with 0 neighbours
>> crashed with reason: {error,flow_timeout}
>> 18:25:08.980 [error] Supervisor riak_kv_keys_fsm_legacy_sup had child
>> undefined started with {riak_kv_keys_fsm_legacy,start_link,undefined} at
>> <0.20798.0> exit with reason {error,flow_timeout} in context child_terminated
>> 18:25:08.983 [error] Supervisor luke_phase_sup had child undefined started
>> with {luke_phase,start_link,undefined} at <0.20797.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>> 18:25:08.996 [error] Supervisor luke_phase_sup had child undefined started
>> with {luke_phase,start_link,undefined} at <0.20796.0> exit with reason
>> {error,flow_timeout} in context child_terminated
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
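As a toy illustration of why the query above is slow — this is not Riak's code, just the "eq" key-filter semantics as described in the docs — key filters are applied after listing the keys in the bucket, so all ~8M keys get walked even though an "eq" filter can match at most one of them:

```python
# Toy model of the ["eq", <arg>] key filter: every listed key is compared
# to the argument, so the full key listing still has to be produced first.
def eq_filter(arg):
    return lambda key: key == arg

# Stand-in key list; the real bucket holds ~8M keys.
keys = ["user_id-0000001-info", "user_id-xxxxxxx-info", "user_id-0000002-info"]

matched = [k for k in keys if eq_filter("user_id-xxxxxxx-info")(k)]
# matched == ["user_id-xxxxxxx-info"]
```

Since the exact key is known here, a plain GET of /riak/nodes/<key> would return the object without any key listing at all.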
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
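For reference, the app.config settings Kelly mentions in the quoted thread would look roughly like this inside each node's riak_kv section (a sketch only; the surrounding sections and other riak_kv entries are elided):

```erlang
%% app.config (per node) -- sketch of the settings discussed above
{riak_kv, [
    %% Use the Riak 1.0 pipe-based MapReduce system, not the legacy one
    {mapred_system, pipe},
    %% Disable legacy key listing
    {legacy_keylisting, false}
]}.
```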
