I have also run into this issue. Our problem was that we were performing analysis on the URLs in Solr and adding data to various fields, which then got overwritten at the next index. We had to edit the source to fix it.
In terms of solving it - what is your main issue with that? Are you looking for a more efficient workflow, or is it something else?

On Wed, May 1, 2013 at 7:32 AM, Bai Shen <[email protected]> wrote:
> My crawl loop consists of the following:
>
> generate -topN
> fetch -all
> parse -all
> updatedb
> solrindex -all
>
> With fetch and parse, the -all only pulls the batch that was generated,
> skipping all of the other URLs. However, solrindex seems to be
> equivalent to -reindex, committing everything, not just what hasn't been
> sent.
>
> Anyone else run into this issue?
>
> Thanks.
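For reference, one workaround that may avoid touching the source: a minimal sketch of the loop that indexes only the current batch by passing an explicit batch id to solrindex instead of -all. This assumes Nutch 2.x, where generate accepts -batchId and solrindex accepts a batch id in place of -all/-reindex; the topN value and Solr URL below are placeholders for your own settings.

    # choose a batch id up front so every step can reference the same batch
    BATCH_ID="$(date +%s)-$RANDOM"

    bin/nutch generate -topN 1000 -batchId "$BATCH_ID"
    bin/nutch fetch "$BATCH_ID"
    bin/nutch parse "$BATCH_ID"
    bin/nutch updatedb
    # index only the documents from this batch, not the whole store
    bin/nutch solrindex http://localhost:8983/solr/ "$BATCH_ID"

If the -all runs of fetch and parse already limit themselves to the generated batch for you, swapping only the solrindex call to use the batch id may be enough.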

