Hi,

On Sunday, 04.07.2010, at 09:37 -0700, J Chris Anderson wrote:
> On Jul 4, 2010, at 9:21 AM, Julian Moritz wrote:
>
> > On Sunday, 04.07.2010, at 07:10 -0700, J Chris Anderson wrote:
> >
> >>> reduce.py is:
> >>>
> >>> def fun(key, value, rereduce):
> >>>     return True
> >>>
> >>
> >> You should remove this reduce function. It's not doing you any good and
> >> it's burning up your CPU. Things will be much faster without it.
> >>
> >
> > But does the view then still do what I want? I need the keys to be
> > unique.
> >
>
> If you just need unique keys, you can replace the text of the Python reduce
> function with "_count" and you will avoid the Python overhead for reduce,
> which will help a lot.
>
OK, thanks.

> Also, if what you are really saying is that you only want each URL in your
> database once, it might make sense to consider using URLs (or URL hashes) as
> your docids, to prevent duplicates.
>

Nope. I'm yielding the _outgoing_ URLs of each URL. Having one document
per URL is another topic (and I do that already).

Regards
Julian

>
> > Regards
> > Julian
> >
> >> Chris
> >>
> >>> If you're not able to read Python code: it's generating a large list of
> >>> unique, pseudo-randomly ordered URLs. I'm calling this view quite often
> >>> (to get new URLs to be crawled).
> >>>
> >>> What is my problem now? My CouchDB process is at 100% CPU and the view
> >>> sometimes takes quite long to generate (even though I only have about
> >>> 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping. I
> >>> think it could be much faster if every core were used. What does
> >>> CouchDB do with a very large system, let's say 64 Atom cores (which
> >>> would be in an energy-saving idle mode) and 20 TB of data? Use 1 core
> >>> at, let's say, 1 GHz to munch through 20 TB? Oh please.
> >>>
> >>> Why doesn't CouchDB use all cores to generate views?
> >>>
> >>> Regards
> >>> Julian
> >>>
> >>> P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it
> >>> makes me mad to see one core out of four working while the rest are idle.
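
For readers following the thread, here is a minimal pure-Python sketch of the two suggestions discussed above. It does not talk to CouchDB at all: `grouped_count` only emulates what a view with the built-in `_count` reduce returns when queried with `group=true` (one row per distinct key), and `url_docid` shows one possible way to derive a docid from a URL hash. The `outgoing` field, the function names, and the choice of SHA-1 are assumptions made for illustration; they do not come from the original map.py.

```python
import hashlib
from collections import Counter


def map_outgoing(doc):
    """Hypothetical map step: emit (outgoing_url, None) for each link of a page."""
    # "outgoing" is an assumed field name, not taken from the thread.
    for url in doc.get("outgoing", []):
        yield url, None


def grouped_count(docs):
    """Emulate a `_count` reduce queried with group=true: one row per distinct key."""
    counts = Counter(key for doc in docs for key, _ in map_outgoing(doc))
    # Plain string order is used here; CouchDB applies its own key collation.
    return sorted(counts.items())


def url_docid(url):
    """Derive a deterministic document id from a URL (SHA-1 hex digest, an assumption)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


docs = [
    {"_id": "a", "outgoing": ["http://example.org/x", "http://example.org/y"]},
    {"_id": "b", "outgoing": ["http://example.org/y"]},
]

rows = grouped_count(docs)
unique_urls = [key for key, _count in rows]
```

Each distinct outgoing URL appears exactly once in `rows`, which is the uniqueness the `return True` reduce was meant to provide, and the count comes along for free. Because `url_docid` is deterministic, saving the same URL twice would target the same document id instead of creating a duplicate.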
