Hi,

We want to run continuous indexing with NutchGora and are wondering what implementations others might already be using.
Our current thinking is along these lines: a helper script (Perl) starts the Crawler. The helper script runs frequently from cron and has two main tasks:

- Overruns are avoided by use of a pid file, so only one Crawler runs at a time.
- Failures are recorded with a lock file, which stops subsequent runs until the problem is resolved.

The custom Crawler records the batch-id in the log, so we can resolve the issue and run it manually before removing the lock file.

One thing that might also be useful in NutchGora is something like a batch prefix, e.g. convert a date to epoch seconds. Then, for example, NutchJob.shouldProcess could compare the current marks with the prefix to index all records since that date. Any thoughts on this?

Cheers,
Dan
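
To make the idea concrete, here is a rough sketch of the wrapper logic and the batch-prefix comparison. It is written in Python rather than Perl purely for illustration. All names (run_guarded, claim_pid_file, batch_since, new_batch_id) are hypothetical, the batch-id format "<epoch-secs>-<suffix>" is an assumption, and crawl stands in for whatever actually starts the custom Crawler. It assumes a POSIX system.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this pid appears to exist (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(pid_path):
    """Atomically create the pid file; return False if another run holds it."""
    for _ in range(2):  # the second attempt only follows removal of a stale file
        try:
            # O_EXCL makes creation atomic, so two cron runs cannot both succeed.
            fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(pid_path) as f:
                    old_pid = int(f.read().strip())
            except (OSError, ValueError):
                return False  # unreadable or half-written file: assume busy
            if pid_alive(old_pid):
                return False
            try:
                os.remove(pid_path)  # stale file left behind by a dead process
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def new_batch_id():
    """Hypothetical batch id: epoch seconds as the prefix, then the pid."""
    return f"{int(time.time())}-{os.getpid()}"


def batch_since(batch_id, since_epoch):
    """True if the epoch-seconds prefix of batch_id is at or after since_epoch."""
    try:
        return int(batch_id.split("-", 1)[0]) >= since_epoch
    except ValueError:
        return False  # no numeric prefix, so the batch cannot be dated


def run_guarded(crawl, pid_path, lock_path):
    """Run crawl(batch_id) unless a run is active or a failure is unresolved.

    crawl is any callable returning an exit status (0 means success).
    Returns "locked", "running", "failed" or "ok".
    """
    if os.path.exists(lock_path):
        return "locked"  # an earlier failure has not been cleared by hand yet
    if not claim_pid_file(pid_path):
        return "running"
    batch_id = new_batch_id()
    try:
        try:
            ok = crawl(batch_id) == 0
        except Exception:
            ok = False
        if not ok:
            # Record the failed batch id so it can be re-run manually.
            with open(lock_path, "w") as f:
                f.write(batch_id + "\n")
            return "failed"
        return "ok"
    finally:
        os.remove(pid_path)
```

Cron would call run_guarded on every tick. A "locked" or "running" result just means this tick is skipped. The lock file holds the failed batch id, so it is on hand when re-running manually. Whether NutchJob.shouldProcess could apply a check like batch_since to its marks depends on how batch ids are actually generated, which this sketch only assumes.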

