Hello. I have a pretty simple pair of map and reduce functions. The map function basically just emits a key and a 1, and the reduce is the built-in _sum function. This works fine, and tells me how many times each key has been seen.
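To make the setup concrete, here is a minimal sketch of such a view, written so that it runs outside CouchDB. The field name `doc.tag` and the sample documents are hypothetical placeholders, since the real schema is not shown here. The `emit` stub and the `sumByKey` helper are only stand-ins for what the server provides (the real reduce is simply the string `_sum`, queried with `group=true`).

```javascript
// Sketch of the view described above, runnable outside CouchDB.
// `doc.tag` is a hypothetical field name; `emit` is stubbed here because
// CouchDB normally supplies it to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: emit the key together with a count of 1.
function map(doc) {
  if (doc.tag) {
    emit(doc.tag, 1);
  }
}

// Stand-in for the built-in _sum reduce queried with group=true:
// adds up the emitted values separately for each key.
function sumByKey(emitted) {
  const totals = new Map();
  for (const { key, value } of emitted) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  return totals;
}

// Hypothetical sample documents.
const docs = [{ tag: "a" }, { tag: "b" }, { tag: "a" }, { tag: "c" }, { tag: "a" }];
docs.forEach(map);
const counts = sumByKey(rows);
```

With these sample documents, `counts` holds one total per key, which mirrors the one-row-per-key output a grouped query returns.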
Now, the problem is that I'm actually only interested in the handful of keys that have been seen the most often. The data fits a power-law distribution, which means that there is a long tail that I'm not at all interested in. And by "long" here I'm talking about tens of thousands of rows. At the moment, my client-side code spends more than 99.9% of its runtime receiving and parsing JSON from the CouchDB server, very nearly all of which it will promptly throw away as soon as it's been parsed. This is annoying and silly.

Is there any way at all to filter the results of a reduced query on the CouchDB end? Alternatively, is there a way for a reduce function to know that it's the final stage in the re-reduce chain (if I could drop all keys with a final value of 1, I'd save an order of magnitude of runtime)?

I can't be the first one ever to run into a problem like this, but I've failed to find any solutions on the net.

-- 
Calle Dybedahl [email protected] -*- +46 703 - 970 612
