Hello,

We store the following in our Riak cluster:
- feeds, as a list of 10 keys to entries. Every entry key has the form feedKey-entryKey.
- entries, as complex JSON objects.
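For concreteness, here is a minimal sketch of the two kinds of objects (the field names and sample values are illustrative assumptions, not our actual schema):

```python
# A feed object: a list of (up to) 10 keys pointing at entry objects.
# Every entry key has the form "feedKey-entryKey".
feed = {
    "entries": [
        "feed42-entry001",
        "feed42-entry002",
        # ... up to 10 keys
    ],
}

# An entry object: a complex JSON document (fields here are made up).
entry = {
    "id": "feed42-entry001",
    "title": "some title",
    "body": "some body",
}
```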
We try to avoid losing track of any entryKey by deleting it from the feed object only once the corresponding entry object has been deleted. However, due to a bug in our implementation, we have "lost" some entries: some feedKey-entryKey objects are not referenced by any feed object. We are now trying to find the best way to clean up that mess :)

Our initial solution was to list all the feed keys and then, for each one, issue a MapReduce job listing all entries whose key starts with feedKey. We can then compare the expected list of entryKeys (stored in the feed object) with the actual list of feedKey-* objects and delete the extras. In practice, that would mean about 500,000 MapReduce jobs. We suspect that is not the right solution: with each MapReduce job taking about 10 seconds, it could literally take weeks to complete.

We are now wondering whether there is a better way. Perhaps a single MapReduce job that iterates over all the entry keys and only keeps track of the feedKeys that have more than 10 elements? That would probably cut down the number of per-feed MapReduce jobs very significantly, since we would only run them on the few feedKeys (maybe 1%?) that have "lost" entries.

Is there a better way? Any ideas?

Thanks
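To illustrate the single-pass idea, here is a sketch in plain Python rather than an actual Riak MapReduce job (the function names and the assumption that feedKey contains no '-' are ours): one pass groups every entry key by its feedKey prefix and flags only the feeds holding more than the expected 10 entries, and a second helper shows the per-feed cleanup as a simple set difference.

```python
from collections import Counter

EXPECTED_ENTRIES_PER_FEED = 10  # from our data model: 10 entry keys per feed

def suspect_feeds(entry_keys):
    """One pass over all entry keys: count entries per feedKey prefix
    and return only the feeds holding more than the expected 10.
    Assumes feedKey itself contains no '-' separator."""
    counts = Counter(key.split("-", 1)[0] for key in entry_keys)
    return {feed for feed, n in counts.items() if n > EXPECTED_ENTRIES_PER_FEED}

def orphaned_entries(expected_entry_keys, actual_entry_keys):
    """Per-feed cleanup: the extra feedKey-entryKey objects are the
    set difference between what is stored and what the feed references."""
    return set(actual_entry_keys) - set(expected_entry_keys)
```

For example, with one feed holding 12 entries and another holding exactly 10, only the first is flagged, so the expensive per-feed comparison runs on roughly the 1% of feeds that actually have orphans:

```python
keys = ["feedA-e%02d" % i for i in range(12)] + ["feedB-e%02d" % i for i in range(10)]
suspect_feeds(keys)  # -> {"feedA"}
```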
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
