On Tuesday 22 March 2011 14:14:06 Gabriele Kahlout wrote:
> > Yes, you need to wait. You must finish the fetch, then parse the fetch
> > and update the crawldb (and optionally the linkdb). Finally you must
> > index and only then are your documents searchable.
>
> I can see injecting fewer urls at a time. I.e. I complete a
> inject-fetch-index cycle and then re-start it with new urls.

You don't need to inject every cycle. Inject once, then repeat the
following cycle:
- fetch
- parse
- update linkdb and crawldb
- index

> Q1: After the 1st iteration can I start searching, while the 2nd iteration
> is in progress?

Yes. Once you have indexed the data you can start the 2nd iteration and
search.

> Q2: during the fetch of the 2nd iteration, what prevents fetch from
> fetching again what was fetched in the 1st iteration (assuming it's still
> before db.fetch.interval.default)?

Well, if fetch_time + interval > NOW then it won't get fetched.

> I'm not sure if fetching fewer segments and index them, and then fetch more
> (i.e. iterate only fetch-index) is a better option, such that after the 1st
> iteration I can start searching.
>
> Thank you.
>
> > > >but remember that results don't come available for searching
> > > >immediately after
> > >
> > > *fetching*. *all* pages must be fetched and then *indexed* first to
> > > be searchable.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
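
P.S. A minimal sketch of the refetch check discussed in Q2, in case it helps. This is not Nutch code; it only models the rule that a page is skipped while fetch_time + interval is still in the future. The names is_due_for_fetch and select_for_fetch, the dict standing in for the crawldb, and the example URLs are all made up for illustration. The 30-day interval is what I assume to be the stock db.fetch.interval.default; check your own nutch-default.xml.

```python
from datetime import datetime, timedelta

# Assumption: stock db.fetch.interval.default of 30 days (2592000 seconds).
DEFAULT_FETCH_INTERVAL = timedelta(seconds=2592000)


def is_due_for_fetch(last_fetch_time, now, interval=DEFAULT_FETCH_INTERVAL):
    """Return True when a URL should be picked up by the next fetch."""
    if last_fetch_time is None:
        # Injected but never fetched: always due.
        return True
    # Skipped while last_fetch_time + interval is still in the future.
    return last_fetch_time + interval <= now


def select_for_fetch(crawldb, now, interval=DEFAULT_FETCH_INTERVAL):
    """crawldb is a hypothetical {url: last_fetch_time or None} mapping."""
    return [url for url, fetched in crawldb.items()
            if is_due_for_fetch(fetched, now, interval)]


now = datetime(2011, 3, 22, 14, 0, 0)
crawldb = {
    "http://example.com/a": now - timedelta(days=1),   # fetched in the 1st iteration
    "http://example.com/b": now - timedelta(days=31),  # interval has expired
    "http://example.com/c": None,                      # newly discovered
}
print(select_for_fetch(crawldb, now))
# prints ['http://example.com/b', 'http://example.com/c']
```

So a page fetched in the 1st iteration is left alone by the 2nd, while newly discovered and expired pages are picked up.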

