On 12 Mar 2010, at 11:56, Julian Stahnke wrote:

> On 12.03.2010, at 17:24, J Chris Anderson wrote:
> 
>> 
>> On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:
>> 
>>> Hello!
>>> 
>>> I have a problem with a view being slow, even though it’s indexed and 
>>> cached and so on. I have a database of books (~120,000 documents) and a 
>>> map/reduce function that counts how many books there are per author. I’m 
>>> then calling the view with ?group=true to get the list. I’m neither 
>>> emitting nor outputting any actual documents, only the counts. This results 
>>> in an output of about 78,000 key/value pairs that look like the following: 
>>> {"key":"Albert Kapr","value":3}.
>>> 
>>> Now, even when the view is indexed and cached, it still takes 60 seconds to 
>>> receive the output, using PHP’s cURL functions, the browser, whatever I’ve 
>>> tried. Getting the same output served from a static file takes only a 
>>> fraction of a second.
>>> 
>>> When I set limit=100, it’s basically instantaneous. I want to sort the 
>>> output by value though, so I can’t really limit it or use ranges. Trying it 
>>> with about 7,000 books, the request takes about 5 seconds, so it seems to 
>>> be linear in the number of lines being output?
>> 
>> For each line of output in the group reduce view, CouchDB must calculate one 
>> final reduction (even when the intermediate reductions are already cached in 
>> the btree). This is because the btree nodes might not have the exact same 
>> boundaries as your group keys.
>> 
>> There is a remedy. You can replace your simple summing reduce with the text 
>> "_sum" (without quotes). This triggers the same function, but implemented in 
>> Erlang by CouchDB. Most of your slowness is probably due to IO between 
>> CouchDB and server-side JavaScript. Using the _sum function will help with 
>> this.
>> 
>> There will still be a calculation per group reduce row, but the cost is much 
>> lower.
>> 
>> Let us know how much faster this is!
>> 
>> Chris
> 
> Oh wow, thanks! It’s now taking about 4 seconds instead of a minute!
> 
> Is this function documented somewhere? I didn’t come across it anywhere, so I 
> added it to the Performance page in the wiki: 
> http://wiki.apache.org/couchdb/Performance I hope that is okay.

Thanks for adding it :)

Cheers
Jan
--
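For readers of the archive, here is a minimal sketch of the change Chris describes, assuming a design document shaped roughly like the one below. The names "books" and "by_author" and the `author` field are hypothetical, since the thread never shows the original view. The local helpers only imitate what a ?group=true request returns so the example can be run outside CouchDB; they are not CouchDB code.

```javascript
// Hypothetical design document: "_design/books", "by_author" and the
// `author` field are assumptions for illustration only.
const designDoc = {
  _id: "_design/books",
  views: {
    by_author: {
      // CouchDB stores view functions as strings.
      map: "function (doc) { if (doc.author) { emit(doc.author, 1); } }",
      // The built-in reducer: the bare text _sum instead of a JavaScript function.
      reduce: "_sum"
    }
  }
};

// A JavaScript summing reduce of the kind that _sum replaces. Summing works
// the same way for reduce and rereduce, so the flag is ignored.
function sumReduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Local imitation of a ?group=true response: one row per distinct key.
// Plain string comparison stands in for CouchDB's own key collation.
function groupedCounts(docs) {
  const valuesByAuthor = new Map();
  for (const doc of docs) {
    if (!doc.author) continue; // mirrors the guard in the map function
    if (!valuesByAuthor.has(doc.author)) valuesByAuthor.set(doc.author, []);
    valuesByAuthor.get(doc.author).push(1); // the emitted value
  }
  return [...valuesByAuthor.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, values]) => ({ key, value: sumReduce(null, values, false) }));
}

// The view is ordered by key, so ordering by count happens on the client.
function sortByValueDesc(rows) {
  return [...rows].sort((a, b) => b.value - a.value);
}

const sampleDocs = [
  { author: "Jan Tschichold" },
  { author: "Albert Kapr" },
  { author: "Albert Kapr" },
  { title: "No author recorded" },
  { author: "Albert Kapr" }
];

const rows = groupedCounts(sampleDocs);
const byCount = sortByValueDesc(rows);
console.log(JSON.stringify(byCount));
```

Only the reduce field of the design document changes; the map function and the ?group=true request stay as they were. Sorting by count still has to happen after the rows are fetched, as in sortByValueDesc above.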
