RE: Replacing existing documents
Hello,

> Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id.

AFAIK, this is not possible. Lucene has an update operation, but internally it just does a delete/add.

> We have a few use cases in this area and I'm researching whether it is effective to check for a document via Solr queries, or whether it is worthwhile to add this to the Solr implementation.

What are the use cases? I do not see what you mean.

> Does anyone have an estimate for the difference between querying, say, 100 documents by unique ID from the network vs. fetching them directly from the index?

That depends, of course, on the network; fetching them from the index is normally fast.

> One use case is that we would like to use the index as our one database for documents, and if we delete a document we want it to stay deleted. Thus we would mark it deleted and check for its existence.

I suppose you mark it deleted by setting some flag (like a Lucene field isDeleted set to true). I am not sure whether using the Lucene index as your database is really smart; it might get corrupted. I would at least suggest backing it up frequently.

Regards,
Ard

P.S. Sorry for my annoying .. because I am using a webmail client.

> Another use case is that we are re-adding the same document a few times a day, and the commit times are ballooning. Where would I implement this?
>
> Thanks, Lance
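The question above about checking roughly 100 documents by unique ID over the network can be illustrated with a single batched query instead of 100 separate requests. The following is a minimal sketch, not a tested client: the base URL, the unique key field name `id`, and the `isDeleted` flag field are assumptions taken from this thread, and only the standard `q`, `fl`, and `rows` request parameters are used. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_id_lookup_url(base_url, ids, id_field="id"):
    """Build one select URL that asks for many documents by unique ID."""
    # Quote each ID so characters such as ':' or '/' are not read as query syntax.
    quoted = ['"%s"' % i.replace("\\", "\\\\").replace('"', '\\"') for i in ids]
    params = {
        "q": "%s:(%s)" % (id_field, " OR ".join(quoted)),
        # Return only the key and the (hypothetical) deleted flag, not whole documents.
        "fl": id_field + ",isDeleted",
        # Ask for as many rows as IDs so no match is cut off by the default page size.
        "rows": len(ids),
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr address; adjust to the actual installation.
url = build_id_lookup_url("http://localhost:8983/solr/select", ["doc-1", "doc-2"])
```

With this shape the cost is one network round trip per batch, so the comparison with reading the index directly mostly comes down to that round trip plus response parsing.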
Re: Replacing existing documents
On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:

> Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id.

There is such a patch: https://issues.apache.org/jira/browse/SOLR-139

I'm experimenting with it right now and it works well for my cases. However, under the covers it is still a delete/add and

> One use case is that we would like to use the index as our one database for documents, and if we delete a document we want it to stay deleted. Thus we would mark it deleted and check for its existence.
>
> Another use case is that we are re-adding the same document a few times a day, and the commit times are ballooning.

...you still have to commit for changes to be visible.

Erik
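The two points made above (a replace is a delete followed by an add, and nothing is visible until a commit) can be pictured with a toy in-memory model. This is only an illustration of the semantics described in the thread; `ToyIndex` and its methods are hypothetical names and are not Solr or Lucene code.

```python
class ToyIndex:
    """Toy model: replace-by-unique-id is delete + add, visible only after commit."""

    def __init__(self):
        self._committed = {}  # what searchers can see
        self._pending = {}    # id -> new doc, or None for a pending delete

    def add(self, doc):
        # Re-adding a document with an existing id replaces the old one:
        # the old version is dropped and the new one takes its place at commit.
        self._pending[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        self._pending[doc_id] = None

    def commit(self):
        # Apply all buffered changes at once, then clear the buffer.
        for doc_id, doc in self._pending.items():
            if doc is None:
                self._committed.pop(doc_id, None)
            else:
                self._committed[doc_id] = doc
        self._pending.clear()

    def get(self, doc_id):
        # Searches only ever see committed state.
        return self._committed.get(doc_id)


index = ToyIndex()
index.add({"id": "1", "title": "first version"})
before_commit = index.get("1")
index.commit()
index.add({"id": "1", "title": "second version"})
still_old = index.get("1")["title"]
index.commit()
now_new = index.get("1")["title"]
```

In this model, re-adding the same document several times a day costs a full delete/add each time, which is consistent with the growing commit times mentioned in the quoted message.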
Re: Replacing existing documents in the index
It sounds like it might be more efficient to implement this at the crawler level to short-circuit crawling whole sites. Barring that, a separate database sounds more flexible.

Non-deletable docs don't sound like something that should be a general feature. However, one would probably be able to implement custom logic to do this using an update-processor plugin (should be in the next version of Solr).

-Yonik

On 8/16/07, Lance Norskog wrote:

> Hi-
>
> We recrawl the same places and update blindly without checking if a document is already in the index. We have a use case where we would like to delete documents (porn) and have them stay deleted.
>
> To implement this use case now, we would need to check the existence of the document and check for a 'deleted' flag. Or, we would maintain a separate database of deleted documents that we check against.
>
> A more efficient way to do this would be to have a 'do not delete' flag in the document. Delete failures are currently ignored and they would continue to be ignored.
>
> Is this a worthwhile addition to 1.3 or 1.4?
>
> Thanks for your time,
> Lance
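The "separate database of deleted documents" option discussed above could look roughly like the sketch below, applied on the crawler side before anything is sent to the index. It is a minimal illustration under assumptions: `TombstoneStore` and `filter_recrawled` are hypothetical names, the store is an in-memory set rather than a real database, and documents are plain dicts keyed by `id`.

```python
class TombstoneStore:
    """Remembers the unique IDs of documents that must stay deleted."""

    def __init__(self):
        self._deleted_ids = set()

    def mark_deleted(self, doc_id):
        self._deleted_ids.add(doc_id)

    def is_deleted(self, doc_id):
        return doc_id in self._deleted_ids


def filter_recrawled(docs, tombstones):
    """Drop recrawled documents whose IDs were deleted earlier."""
    return [doc for doc in docs if not tombstones.is_deleted(doc["id"])]


tombstones = TombstoneStore()
tombstones.mark_deleted("http://example.com/bad")

recrawled = [
    {"id": "http://example.com/good", "title": "kept"},
    {"id": "http://example.com/bad", "title": "should stay deleted"},
]
to_index = filter_recrawled(recrawled, tombstones)
```

The same check could in principle live in custom update logic inside the server, as suggested above, but doing it before submission avoids sending the unwanted documents at all.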