Matthias Jaekle wrote:
I have a few questions about Nutch. Currently I am trying to understand how everything works.

* Is there any more documentation about Nutch than what I found on nutch.org and sourceforge.net?

Mike has written some internal documentation that he recently posted to the nutch-developers list. This should make it to the website soon.


* Would it be useful to set up a wiki where everybody could take part in writing detailed documentation?

Maybe. I just set one up, at:


http://www.nutch.org/cgi-bin/twiki/view/Main/Nutch

Mike, do you want to add your documents here instead?

* Is there any documentation on the format of banned-hosts.txt?

I would recommend using regex-urlfilter-default.txt instead of banned-hosts.txt.
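For readers unfamiliar with regex-based URL filtering, here is a minimal sketch of how such a filter could behave. It is not Nutch's actual code. It assumes a rule file in which each line is a `+` (accept) or `-` (reject) followed by a regular expression, lines starting with `#` are comments, and the first matching rule decides. The class name `UrlFilterSketch`, the method `accepts`, and the example host are hypothetical; check the comments in your own regex-urlfilter-default.txt for the real syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Nutch.
public class UrlFilterSketch {

    // One rule: accept (+) or reject (-) any URL in which the pattern is found.
    private static final class Rule {
        final boolean allow;
        final Pattern pattern;

        Rule(boolean allow, Pattern pattern) {
            this.allow = allow;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses rule lines; the "+regex" / "-regex" / "# comment" format is an assumption.
    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose pattern is found in the URL decides; no match means reject.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.allow;
            }
        }
        return false;
    }
}
```

Under these assumptions, banning a host takes one `-` rule placed before a catch-all `+.` rule, for example `-^http://([a-z0-9]*\.)*banned\.example\.com/` followed by `+.`.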


* And last, a question about fetcher:
Sometimes my fetcher does not finish its job. For a long time I get the message that there are still a few elements remaining in the HostQueues.
In those cases I kill the fetcher and add a fetcher.done file to continue.
I added the last output of fetcher. Maybe somebody could tell me what's going on.

There's a bug in the fetcher, and our fetcher developer is no longer permitted by his employer to maintain it. Until this is fixed, you're doing the right thing.


Doug

