I prefer a suite of shell scripts and cron jobs. We simply generate many 
segments at once and have a cron job that checks for available segments and 
fetches them. Once all of them are fetched, the segments are moved to a queue 
directory for updating the DB. Once the DB has been updated, the generators are 
triggered again and the whole circus repeats.
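A rough sketch of what one such cycle could look like, assuming a Nutch 1.x
layout with the crawldb under crawl/crawldb and the segments under
crawl/segments (the paths, the -topN value and the "newest segment" shortcut
are just placeholders for the queue/cron logic described above):

  #!/bin/sh
  # One crawl cycle: generate -> fetch -> parse -> updatedb.
  # Paths and topN are placeholders; adjust to your own setup.

  CRAWLDB=crawl/crawldb
  SEGMENTS=crawl/segments

  # Generate a batch of URLs to fetch as a new segment.
  bin/nutch generate $CRAWLDB $SEGMENTS -topN 1000

  # Here we just take the newest segment; in the setup above a cron job
  # would instead pick up any segment that is generated but not yet fetched.
  SEGMENT=$SEGMENTS/`ls -1 $SEGMENTS | tail -1`

  # Fetch and parse the segment.
  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT

  # Fold the fetched segment back into the crawl DB; once this finishes,
  # the next generate can be triggered and the loop starts over.
  bin/nutch updatedb $CRAWLDB $SEGMENT

Running that from cron (or wrapping it in a while loop) gives you a continuous
crawl without relying on the fixed iteration count of bin/nutch crawl.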


> I've done some searching on this, but haven't found any real solutions.  Is
> there an existing way to do a continuous crawl using Nutch?  I know I can
> use the bin/nutch crawl command, but that stops after a certain number of
> iterations.
> 
> Right now I'm working on a java class to do it, but I would assume it's a
> problem that's been solved already.  Unfortunately I can't seem to find any
> evidence of this.
> 
> Thanks.
