On Mar 12, 2010, at 7:10 AM, Julian Stahnke wrote:

> Hello!
> 
> I have a problem with a view being slow, even though it’s indexed and cached 
> and so on. I have a database of books (~120,000 documents) and a map/reduce 
> function that counts how many books there are per author. I’m then calling 
> the view with ?group=true to get the list. I’m neither emitting nor 
> outputting any actual documents, only the counts. This results in an output 
> of about 78,000 key/value pairs that look like the following: {"key":"Albert 
> Kapr","value":3}.
> 
> Now, even when the view is indexed and cached, it still takes 60 seconds to 
> receive the output, using PHP’s cURL functions, the browser, whatever I’ve 
> tried. Getting the same output served from a static file takes only a 
> fraction of a second.
> 
> When I set limit=100, it’s basically instantaneous. I want to sort the output 
> by value though, so I can’t really limit it or use ranges. Trying it with 
> about 7,000 books, the request takes about 5 seconds, so it seems to be 
> linear in the number of lines being output?

For each row of output in a grouped reduce view, CouchDB must calculate one 
final reduction (even when the intermediate reductions are already cached in 
the btree). This is because the btree node boundaries might not line up 
exactly with your group keys.
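
To make the "one final reduction per output row" point concrete, here is a toy 
model in plain JavaScript. It is only a sketch and not how CouchDB is 
implemented: it ignores the btree and the cached intermediate reductions, and 
the names groupReduce and reduceCalls are made up for illustration. It just 
shows that the number of reduce calls grows with the number of distinct keys.

```javascript
// Stand-in for the sum() helper that CouchDB's JavaScript view server provides.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Toy model (hypothetical helper): rows must already be sorted by key,
// as view rows are. Calls reduceFn once per distinct key.
function groupReduce(rows, reduceFn) {
  var out = [];
  var reduceCalls = 0;
  var i = 0;
  while (i < rows.length) {
    var key = rows[i].key;
    var values = [];
    // Collect every value emitted for this key.
    while (i < rows.length && rows[i].key === key) {
      values.push(rows[i].value);
      i++;
    }
    // Simplification: the real keys argument holds [key, docid] pairs.
    out.push({ key: key, value: reduceFn([key], values, false) });
    reduceCalls++;
  }
  return { rows: out, reduceCalls: reduceCalls };
}

var emitted = [
  { key: "Albert Kapr", value: 1 },
  { key: "Albert Kapr", value: 1 },
  { key: "Albert Kapr", value: 1 },
  { key: "Zwi Erich Kurzweil", value: 1 }
];

var result = groupReduce(emitted, function (keys, values, rereduce) {
  return sum(values);
});
console.log(JSON.stringify(result));
```

With about 78,000 distinct authors, a model like this makes about 78,000 
reduce calls, which fits the roughly linear slowdown you are seeing.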

There is a remedy. You can replace your simple summing reduce with the text 
"_sum" (without quotes). This triggers the same function, but implemented in 
Erlang by CouchDB. Most of your slowness is probably due to I/O between CouchDB 
and server-side JavaScript. Using the _sum function will help with this.

There will still be a calculation per group reduce row, but the cost is much 
lower.
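
As a rough sketch of what the change looks like, the snippet below builds a 
design document whose reduce is the string "_sum" instead of a JavaScript 
function. The design document name ("_design/books") and the view name 
("by_author") are assumptions for illustration; use whatever names you 
already have. The map function is the one from your mail, with i declared 
using var.

```javascript
// Map function from the original mail. It is only turned into a string here,
// never executed, so emit() does not need to exist in this script.
var mapFn = function (doc) {
  if (doc.author) {
    for (var i = 0; i < doc.author.length; i++) {
      emit(doc.author[i], 1);
    }
  } else {
    emit(null, 1);
  }
};

var designDoc = {
  _id: "_design/books",          // hypothetical design document name
  language: "javascript",
  views: {
    by_author: {                 // hypothetical view name
      map: mapFn.toString(),
      reduce: "_sum"             // built-in reduce, given as a plain string
    }
  }
};

// Print the JSON that would be saved as the design document.
console.log(JSON.stringify(designDoc, null, 2));
```

Note that changing the view definition means the view index is rebuilt on 
the next query, so the first request after the change will be slow again.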

Let us know how much faster this is!

Chris


> 
> I’m using CouchDB 0.10.1 (the one that’s in homebrew) on a 2006 MacBook Pro.
> 
> Am I doing anything wrong, or should this really take so long? I wasn’t able 
> to find any information about this—only about indexing being slow, but that 
> doesn’t seem to be my problem.
> 
> Maybe I should also mention that I’m an interaction design student who used 
> to be a front-end dev, but not a ‘real’ programmer.
> 
> Thanks for any help!
> 
> Best,
> Julian
> 
> 
> For reference, the map function:
> 
> function (doc)
> {
>    if (doc.author) {
>        for (var i = 0; i < doc.author.length; i++) {
>            emit(doc.author[i], 1);
>        }
>    } else {
>        emit(null, 1);
>    }
> }
> 
> The reduce function: 
> 
> function (keys, values, rereduce)
> {
>    return sum(values);
> }
> 
> Some sample output:
> 
> {"rows":[
> {"key":null,"value":1542},
> {"key":"... Hans Arp ... /Konzept: Hans Christian Tavel .../","value":1},
> ---more rows---
> {"key":"Zwi Erich Kurzweil","value":1}
> ]}
