Re: Why I think view generation should be done concurrent.

J Chris Anderson Sun, 04 Jul 2010 07:10:44 -0700

On Jul 4, 2010, at 2:36 AM, Julian Moritz wrote:

> Hi,
> 
> a few days ago I tweeted a wish to have view generation done
> concurrently. I'll tell you why (because @janl doesn't think so).
> 
> I've got some documents in the form of:
> 
> _id: 1,
> _rev: 3-abc, 
> url: http://www.abc.com,
> hrefs: [http://www.xyz.com, 
>       http://www.nbc.com,
>       ...,
>       ...,
>       ...]
> 
> As you can imagine, since I'm crawling the web, I've got plenty of them,
> with thousands more every second. I've got a view; map.py is:
> 
> def fun(doc):
>    # Emit one row per link, keyed by (hash, href) for a pseudo-random order.
>    h = hash
>    if "hrefs" in doc:
>        for href in doc["hrefs"]:
>            yield (h(href), href), None
> 
> reduce.py is:
> 
> def fun(key, value, rereduce):
>    return True
>


You should remove this reduce function. It's not doing you any good and it's 
burning up your CPU. Things will be much faster without it.

Chris
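A minimal sketch of what a map-only version of this view could look like, for illustration. It assumes the reduce was only there to collapse duplicate keys, which the thread does not confirm. The names stable_hash and unique_hrefs are hypothetical, crc32 is swapped in for the built-in hash() so the keys are identical in every process, and the sorted() call only simulates the key-ordered rows a view would return:

```python
import zlib
from itertools import groupby


def stable_hash(text):
    # Stand-in for the built-in hash() used in the original map.py.
    # crc32 gives the same value in every process, whereas Python 3
    # randomizes str hashes per process by default.
    return zlib.crc32(text.encode("utf-8"))


def fun(doc):
    # Map function only, no reduce: emit one row per outgoing link,
    # keyed by (hash, href) so rows sort in a pseudo-random but stable order.
    for href in doc.get("hrefs", []):
        yield (stable_hash(href), href), None


def unique_hrefs(rows):
    # Hypothetical client-side helper. `rows` is an iterable of (key, value)
    # pairs already sorted by key. Equal keys are adjacent in sorted order,
    # so collapsing runs of equal keys drops the duplicates.
    for key, _group in groupby(rows, key=lambda row: row[0]):
        yield key[1]


# Made-up sample documents in the shape described in the quoted message.
docs = [
    {"_id": "1", "url": "http://www.abc.com",
     "hrefs": ["http://www.xyz.com", "http://www.nbc.com"]},
    {"_id": "2", "url": "http://www.def.com",
     "hrefs": ["http://www.xyz.com"]},
    {"_id": "3", "url": "http://www.ghi.com"},  # no hrefs: emits nothing
]

# Simulate the view: collect all emitted rows and sort them by key.
rows = sorted((row for doc in docs for row in fun(doc)),
              key=lambda row: row[0])
result = list(unique_hrefs(rows))
```

With this shape the server only runs the map step, and the duplicate filtering happens in a single pass on the client over rows that are already sorted.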

> If you're not able to read python code: it's generating a large list of
> unique pseudo-randomly ordered urls. I'm calling this view quite often
> (to get new urls to be crawled). 
> 
> What is my problem now? My couchdb process is at 100% CPU and the view
> sometimes takes quite long to be generated (even though I've only got
> about 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping.
> I think it could be much faster if every core were used. What does
> couchdb do with a very large system, let's say 64 Atom cores (which
> would be in an idle energy-saving mode) and 20 TB of data? Use 1 core
> at, let's say, 1 GHz to munch through 20 TB? Oh please.
> 
> Why doesn't couchdb use all cores to generate views?
> 
> Regards
> Julian
> 
> P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it
> makes me mad to see one core out of four working and the rest idle.
> 
> 
> 
> 
>
