Re: Fetcher threads & automation

Dennis Kubes Sun, 28 Jan 2007 08:47:46 -0800

We have a Python script with logging that fully automates the fetching and updating process, but not the link inversion or indexing steps. If anybody wants a copy, send me an email and I will send you one.
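Here is a minimal sketch of that kind of logged fetch-and-update loop. It is not the script itself, which isn't posted in this thread. It assumes a Nutch 0.8-style command line run from the Nutch home directory (`bin/nutch generate`, `bin/nutch fetch`, `bin/nutch updatedb`). The `crawl/crawldb` and `crawl/segments` paths, the round count and the `-topN` value are hypothetical placeholders. Like the script described above, it leaves out link inversion and indexing.

```python
"""Minimal sketch of a logged Nutch generate/fetch/updatedb loop.

Assumptions (hypothetical, adjust for your install): Nutch 0.8-style commands
run from the Nutch home directory, crawl data under crawl/crawldb and
crawl/segments.
"""
import logging
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical paths: change these to match your own crawl layout.
NUTCH = "bin/nutch"
CRAWLDB = "crawl/crawldb"
SEGMENTS = "crawl/segments"

log = logging.getLogger("nutch-cycle")

# A runner takes a command (list of args) and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def newest_segment(segments_dir: str) -> str:
    """Return the most recently generated segment.

    Nutch names segment directories with a timestamp (e.g. 20070128084746),
    so the lexically largest name is the newest one.
    """
    dirs = sorted(p for p in Path(segments_dir).iterdir() if p.is_dir())
    if not dirs:
        raise RuntimeError(f"no segments found in {segments_dir}")
    return str(dirs[-1])


def crawl_round(top_n: int,
                run: Runner = run_subprocess,
                find_segment: Callable[[str], str] = newest_segment) -> List[List[str]]:
    """Run one generate -> fetch -> updatedb round; stop on the first failure."""
    done: List[List[str]] = []

    def step(cmd: List[str]) -> None:
        log.info("running: %s", " ".join(cmd))
        code = run(cmd)
        if code != 0:
            log.error("exit code %d from: %s", code, " ".join(cmd))
            raise RuntimeError(f"step failed ({code}): {' '.join(cmd)}")
        done.append(cmd)

    step([NUTCH, "generate", CRAWLDB, SEGMENTS, "-topN", str(top_n)])
    segment = find_segment(SEGMENTS)  # the segment generate just created
    step([NUTCH, "fetch", segment])
    step([NUTCH, "updatedb", CRAWLDB, segment])
    return done


def main(rounds: int = 3, top_n: int = 1000) -> None:
    # Hypothetical defaults: pick the round count and topN to suit your crawl.
    logging.basicConfig(filename="nutch-cycle.log", level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    for i in range(1, rounds + 1):
        log.info("starting round %d of %d", i, rounds)
        crawl_round(top_n)
    log.info("all %d rounds finished", rounds)


if __name__ == "__main__":
    main()
```

To keep it running after you log out, something like `nohup python nutch_cycle.py &` should work (the file name is hypothetical), with progress going to `nutch-cycle.log`. Every step aborts the run on a non-zero exit code, so a failed fetch is never passed on to updatedb. The runner is injectable so the sequence can be checked without a Nutch install.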

We are currently working on a more in-depth framework for automating these types of job streams in Python, but it is not complete yet.


Andrzej, do you think this is something we should post to the wiki?

Dennis Kubes

Justin Hartman wrote:

Hi all

Just have a couple more questions which remain unclear to me at this stage.

1. I'm fetching URLs on a P4 2.8GHz machine with 1GB RAM and a 100Mbps
connection. Based on this configuration, what would you recommend as the
maximum number of fetcher threads?

2. Does anyone know of a script or plugin that can automate the
segment/fetch/index process? Basically, I'm fetching about 20
million pages and I have to run the segment, fetch and index steps
myself in a shell (which takes some time). I would really like some
sort of shell script that I can start so the whole process runs as
a daemon in the background while I work on other issues.

Thank you in advance!
