Yes, I have been warned that querying the index each time before adding a document might be resource-consuming. I will check it.
As for the overwrite parameter, I think the name is not the best one, then. People outside the "business", like me, misuse it and assume it does what I described. "Overwrite" should mean what it says. But I understand what it actually does, so my way forward is to write a custom update processor plugin.

Best Regards
Alexander Aristov


On 28 December 2011 22:16, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : That said, writing your own update request handler
> : that detected this case isn't very difficult,
> : extend UpdateRequestProcessorFactory/UpdateRequestProcessor
> : and use it as a plugin.
>
> I can't find the thread at the moment, but the general issue that has
> caused people headaches with this type of approach in the past has been
> that the performance of doing a query on every update (to see if the doc
> is already in the index) can slow things down quite a bit -- in your
> use case it may not be a significant bottleneck, but that's the general
> issue that has come up in the past.
>
> If you look at systems (like Nutch) that do large-scale crawling, they
> treat the crawl phase independently from the indexing phase precisely for
> reasons like this -- so the crawler can dedup the documents (by unique
> URL) and eliminate duplication before ever even adding them to the index.
>
> : >> > I wonder why simply the overwrite parameter doesn't work here.
> ...
> : >> > 2. overwrite=false and uniqueID exists then newer doc must be skipped
> : >> > since old exists.
>
> That is not what overwrite=false does (or was ever designed to do).
> overwrite=false is a way to tell Solr that you are already certain that
> the documents being added do not exist in the index, therefore Solr can
> save time by not attempting to overwrite an existing document. It is
> intended for situations where you are bulk loading documents, i.e. doing an
> initial build of an index from a system of record (i.e. a single pass over
> a database that uses the same unique key) or importing documents from a
> new system of record with a completely different id space.
>
>
>
> -Hoss
>
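The crawler-side dedup Hoss describes can be sketched in plain Java. This is a hypothetical illustration, not Solr's API and not a Solr plugin: the `PreIndexDedup` class, the `Doc` record and the `knownUrls` set are assumptions introduced here, with `knownUrls` standing in for whatever record the crawler keeps of URLs it has already sent to the index. The sketch keeps the first document seen for each URL and skips later ones, i.e. the "skip if the unique key already exists" behavior that overwrite=false does not provide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Solr API): deduplicate crawled documents by
 * unique URL before they are ever sent to the index.
 */
public class PreIndexDedup {

    /** Minimal stand-in for a crawled document; field names are assumptions. */
    record Doc(String url, String body) {}

    /**
     * Keeps the first document seen for each URL and skips later duplicates.
     * knownUrls is an assumed crawler-side record of already-indexed URLs,
     * so no per-document query against the index is needed.
     */
    static List<Doc> dedup(List<Doc> batch, Set<String> knownUrls) {
        Set<String> seen = new HashSet<>(knownUrls);
        List<Doc> out = new ArrayList<>();
        for (Doc d : batch) {
            // Set.add returns false when the URL is already present,
            // so duplicates (in the batch or already indexed) are skipped.
            if (seen.add(d.url())) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> batch = List.of(
            new Doc("http://a.example/", "first"),
            new Doc("http://b.example/", "only"),
            new Doc("http://a.example/", "second"),  // duplicate within the batch
            new Doc("http://c.example/", "stale"));  // assumed already indexed
        for (Doc d : dedup(batch, Set.of("http://c.example/"))) {
            System.out.println(d.url() + " " + d.body());
        }
        // prints:
        // http://a.example/ first
        // http://b.example/ only
    }
}
```

Because each check is a `HashSet` lookup in the crawler's own memory, nothing is queried per document on the Solr side, which is the cost Hoss warns about. Only the deduplicated batch would then be sent for indexing.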