Hi,

On Sunday, 04.07.2010 at 09:37 -0700, J Chris Anderson wrote:
> On Jul 4, 2010, at 9:21 AM, Julian Moritz wrote:
> 
> > On Sunday, 04.07.2010 at 07:10 -0700, J Chris Anderson wrote:
> > 
> >>> reduce.py is:
> >>> 
> >>> def fun(key, value, rereduce):
> >>>   return True
> >>> 
> >> 
> >> You should remove this reduce function. It's not doing you any good and 
> >> it's burning up your CPU. Things will be much faster without it.
> >> 
> > 
> > But does the view then still do what I want? I need the keys to be
> > unique.
> > 
> 
> if you just need unique keys, you can replace the text of the python reduce 
> function with "_count" and you will avoid the python overhead for reduce, 
> which will help a lot.
> 

ok, thanks.
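
For the archive, a rough sketch of what the "_count" suggestion amounts to,
assuming the view is then queried with group=true (that query option is my
assumption, it is not stated above). CouchDB does this itself in Erlang; the
pure-Python model below only illustrates the result: one row per distinct
key, with the number of emitted rows as its value. The name grouped_count
and the sample URLs are made up, and Python's sort order stands in for
CouchDB's view collation.

```python
from itertools import groupby
from operator import itemgetter


def grouped_count(rows):
    """Model of a _count reduce queried with group=true.

    rows: iterable of (key, value) pairs, as a map function would emit them.
    Returns one (key, count) pair per distinct key, in sorted key order.
    """
    # groupby only merges adjacent items, so sort by key first.
    ordered = sorted(rows, key=itemgetter(0))
    return [(key, sum(1 for _ in group))
            for key, group in groupby(ordered, key=itemgetter(0))]


# Hypothetical outgoing URLs emitted by a map function (made-up data).
emitted = [
    ("http://example.org/a", None),
    ("http://example.org/b", None),
    ("http://example.org/a", None),
]
unique_rows = grouped_count(emitted)
```

Each key appears exactly once in unique_rows, which is the uniqueness I was
after; the count is just a by-product that can be ignored.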

> also, if what you are really saying is that you only want each URL in your 
> database once, it might make sense to consider using URLs (or URL hashes) as 
> your docids, to prevent duplicates.
> 

nope. I'm yield'ing the _outgoing_ urls of each url. Having one document
per url is another topic (and I do that already).

Regards
Julian

> 
> > Regards
> > Julian
> > 
> >> Chris
> >> 
> >>> If you're not able to read python code: it's generating a large list of
> >>> unique pseudo-randomly ordered urls. I'm calling this view quite often
> >>> (to get new urls to be crawled). 
> >>> 
> >>> What is my problem now? My couchdb process is at 100% CPU and the view
> >>> sometimes takes quite long to be generated (even though I only have
> >>> about 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping.
> >>> I think it could be much faster if every core was used. What does
> >>> couchdb do with a very large system, let's say 64 atom cores (which
> >>> would be in an idle energy-saving mode) and 20TB of data? Using 1 core
> >>> at, let's say, 1 GHz to munch down 20TB? Oh please. 
> >>> 
> >>> Why doesn't couchdb use all cores to generate views?
> >>> 
> >>> Regards
> >>> Julian
> >>> 
> >>> P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it
> >>> makes me mad to see one core out of four working while the rest are idle.
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >> 
> > 
> > 
> 

