Hi José-Marcio,

it's possible to do the indexing at the end, or somewhere in the middle, indexing multiple segments in one turn. The same applies to LinkDb updates (for anchor texts) and, optionally, the link rank calculation.
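Both tools accept a list of segments in a single call, so "one turn" means something like the following (the segment names here are made-up placeholders):

  bin/nutch invertlinks crawl/linkdb crawl/segments/20160703121500 crawl/segments/20160703150000
  bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/20160703121500 crawl/segments/20160703150000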
If there is no need to update the index as soon as possible (immediate updates are probably what most users want), you could change the crawl script: keep the fetched segments in a list and pass them to the "invertlinks" (if desired) and "index" tools; a rough sketch follows below the quoted message. If the crawl runs only once and is started from scratch the next time, the "-dir" argument allows you to index all segments in one turn.

Cheers,
Sebastian

On 07/03/2016 09:49 AM, Jose Marcio Martins da Cruz wrote:
>
> Hello
>
> The bin/crawl algorithm looks something like:
>
> *******************************
> # prepare
> inject
>
> while ...
>   # crawl
>   generate
>   fetch
>   parse
>
>   # post-processing
>   updatedb
>   invertlinks
>   dedup
>
>   # do index
>   if $DoIndex
>   then
>     index
>     clean
>   endif
>
>   # do webgraph
>   if $DoWebgraph
>   then
>     webgraph
>     linkrank
>     scoreupdater
>     nodedumper
>   endif
> done
> ************************
>
> Is there a reason to keep the "index" and "webgraph" parts inside the loop?
>
> What happens if I put them outside the loop and run them after all rounds?
> What about the "post-processing" part?
>
> OBS: I'm crawling in small rounds (30 minutes) because the "Crawl-delay"
> values of the sites I'm crawling are heterogeneous, and doing multiple
> small rounds is more efficient than a single long round.
>
> Regards
>
> José-Marcio
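As a concrete illustration of the change suggested above, here is a minimal sketch; CRAWL_PATH and LIMIT are placeholder names, and the segment-selection line should be adapted to your version of the script:

*******************************
SEGMENTS=""

for ((i = 1; i <= LIMIT; i++)); do
  # generate / fetch / parse / updatedb / dedup as before ...

  # remember the segment fetched in this round
  # (segment directories are named by timestamp, so the newest sorts last)
  SEGMENT=$(ls -d "$CRAWL_PATH"/segments/* | sort | tail -1)
  SEGMENTS="$SEGMENTS $SEGMENT"
done

# after all rounds: one LinkDb update and one indexing run over all segments
bin/nutch invertlinks "$CRAWL_PATH"/linkdb $SEGMENTS
bin/nutch index "$CRAWL_PATH"/crawldb -linkdb "$CRAWL_PATH"/linkdb $SEGMENTS
*******************************

If the segments directory contains only this crawl's segments, the last two commands can use "-dir $CRAWL_PATH/segments" instead of the explicit list.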

