Hello all,

I wanted to inquire about the general performance of Nutch. I have seen
this page
(http://digitalpebble.blogspot.cz/2013/09/nutch-fight-17-vs-221.html),
where one iteration takes 78 minutes with 3M URLs / 5K per iteration and
100 URLs/host.

I have the same setup as in the test, but currently with only around
70k URLs in the database.

The fetch and parse steps run very quickly, but the generate and update
steps both take _forever_. One run takes about 12 hours, and by far the
most time is spent in update, followed by generate.

Is there ANYTHING I can do to speed up the process? I have a powerful
dedicated server with 52GB RAM. One thing I notice is that during
generate/update ALL available RAM is used (Mem: 52438M total,
52267M used, 170M free, 191M buffers).

I am thankful for any help/feedback!

Domi

