You might try using the old fetcher, net.nutch.fetcher.Fetcher. Mike & I have rejuvenated it somewhat recently. It now observes robots.txt, and it doesn't seem to hang. Be sure to get the latest version, either from CVS or from a nightly build. (I'm fixing some bugs in it today, so grab it tomorrow.)

To use the old fetcher:

bin/nutch net.nutch.fetcher.Fetcher ...

Doug

Byron Miller wrote:
I am running a fetch process on a P4 2.6 GHz HT with 1
GB of RAM and 4x120 GB drives in a RAID 0 (striped)
configuration.

The nutch fetch process was started from a fresh index
with the entire dmoz RDF imported. The first 12 hours
or so of the fetch seemed to sustain 3.5 to 4.5 Mbps.
Now the process seems to be swapped out by about 1.5 GB,
and it pauses for 10-15 minutes after every 10-15
minutes of fetching (while kswapd appears to go nuts
swapping).

Is there some tweaking I can do to fix this? Too many
threads going? (This is pretty much a default nutch config.)
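
On the thread-count question, here is a minimal sketch in plain Java (not Nutch code) of how a fixed-size thread pool caps the number of fetches in flight, and with it the memory those fetches can hold at once. The class name BoundedFetchSketch, the fetch() stand-in, and the example.com URLs are all hypothetical and only serve the illustration; the real fetcher's settings may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedFetchSketch {
    // Hypothetical stand-in for fetching one URL; it only sleeps briefly
    // and records how many "fetches" are running at the same time.
    static int fetch(String url, AtomicInteger active, AtomicInteger peak)
            throws InterruptedException {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        Thread.sleep(10);
        active.decrementAndGet();
        return url.length();
    }

    // Runs all fetches with at most `threads` in flight and returns the
    // highest concurrency that was actually observed.
    static int run(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (String url : urls) {
                results.add(pool.submit(() -> fetch(url, active, peak)));
            }
            // Wait for every task to finish before reading the peak.
            for (Future<Integer> f : results) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            urls.add("http://example.com/page" + i);
        }
        System.out.println("peak concurrent fetches: " + run(urls, 4));
    }
}
```

Lowering the pool size trades throughput for a smaller working set, which is one thing to try when a box with limited RAM starts swapping.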

If I kill the process and restart, I know I will have
to touch the fetch.done files and such, but will I
have to re-inject the db with the URLs to spider, or
will they be picked up the next time I restart after a
few db analyze processes?


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers


