So imagine a scenario in which there is a crawldb and we run generate on it to get three different segment directories. Then we start concurrent fetch and parse jobs on those three segments.

Then one of them finishes sooner, and we run updatedb on it.
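To make the scenario concrete, here is roughly the sequence I mean (the exact command syntax differs a bit between Nutch versions, and the paths and segment names below are just placeholders):

    # generate three segments from the crawldb (e.g. via -maxNumSegments)
    bin/nutch generate crawl/crawldb crawl/segments -maxNumSegments 3

    # fetch and then parse each segment as a separate, concurrent job
    bin/nutch fetch crawl/segments/20130101000001
    bin/nutch parse crawl/segments/20130101000001
    # ... same for segments 20130101000002 and 20130101000003, running in parallel

    # the first segment finishes, so we fold it back into the crawldb
    bin/nutch updatedb crawl/crawldb crawl/segments/20130101000001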

Now here is the question: is it safe to run invertlinks and solrindex only on the segment directory that has finished? I am not sure if I am asking this correctly, but my problem is basically with the crawldb. Both solrindex and invertlinks get the crawldb as one of their inputs. Is it safe to use the crawldb for these purposes while there are still "checked out" URLs out there (from the two other fetch jobs that are still running)?
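In other words, is it safe to run something like the following on the finished segment alone while the other two fetch jobs are still in flight? (Again, the exact option syntax varies by Nutch version, and the Solr URL and paths are only examples; this is just to illustrate what I mean.)

    # invert links from just the finished segment
    bin/nutch invertlinks crawl/linkdb crawl/segments/20130101000001

    # index that one segment to Solr, with the crawldb (and linkdb) as inputs
    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb -linkdb crawl/linkdb crawl/segments/20130101000001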

thanks,
--
Kaveh Minooie

www.plutoz.com
