RE: Replacing existing documents

2007-08-22 Thread Ard Schrijvers
Hello,

Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id.

AFAIK, this is not possible. You have the update in Lucene, but internally it
just does a delete/add operation.

We have a few use cases in this area and I'm
researching whether it is effective to check for a document via Solr
queries, or whether it is worthwhile to add this to the Solr implementation.

What are the use cases? I do not see what you mean.

Does anyone have an estimate for the difference between querying, say, 100
documents by unique ID over the network vs. fetching them directly from the
index?

It depends on the network, of course. Fetching them from the index is
normally fast.
 
One use case is that we would like to use the index as our one database for
documents, and if we delete a document we want it to stay deleted. Thus we
would mark it deleted and check for its existence.

I suppose you mark it deleted by setting some flag (like a Lucene field
isDeleted set to true). I am not sure whether using the Lucene index as your
database is really smart... it might get corrupted. I would at least suggest
backing it up frequently.
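To make the flag approach concrete, here is a minimal sketch in Java. It is not Solr or Lucene code: the index is modeled as an in-memory map, and the class SoftDeleteStore and the isDeleted field are hypothetical names. "Deleting" leaves a flagged tombstone in place, and every recrawled document is looked up first so a deliberately deleted one is never overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for the index, and each
// document carries an "isDeleted" flag (an assumed field name, not something
// Solr or Lucene defines).
class SoftDeleteStore {
    static final class Doc {
        final String body;
        final boolean isDeleted;

        Doc(String body, boolean isDeleted) {
            this.body = body;
            this.isDeleted = isDeleted;
        }
    }

    private final Map<String, Doc> index = new HashMap<>();

    // "Deleting" keeps a tombstone: the id stays in the index, flagged.
    // This also works for ids that were never indexed in the first place.
    void markDeleted(String id) {
        Doc existing = index.get(id);
        String body = existing == null ? "" : existing.body;
        index.put(id, new Doc(body, true));
    }

    // Called for every recrawled document: look it up first and refuse to
    // overwrite a document that was deliberately deleted.
    boolean addIfNotDeleted(String id, String body) {
        Doc existing = index.get(id);
        if (existing != null && existing.isDeleted) {
            return false; // stays deleted
        }
        index.put(id, new Doc(body, false));
        return true;
    }

    // Searches must filter out tombstoned documents.
    boolean isVisible(String id) {
        Doc d = index.get(id);
        return d != null && !d.isDeleted;
    }
}
```

The cost of this design is one lookup per incoming document; against a real Solr server that would be a query on the unique key field, which is exactly the per-document round trip asked about above.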

Regards Ard

P.S. Sorry for my annoying .., because I am using a web mail client.

Another use case is that we are re-adding the same document a few times a day,
and the commit times are ballooning.

 
Where would I implement this?
 
Thanks,
 
Lance





Re: Replacing existing documents

2007-08-22 Thread Erik Hatcher


On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:

Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id.


There is such a patch: https://issues.apache.org/jira/browse/SOLR-139

I'm experimenting with it right now and it works well for my cases.

However, under the covers it is still a delete/add, and

One use case is that we would like to use the index as our one database for
documents, and if we delete a document we want it to stay deleted. Thus we
would mark it deleted and check for its existence. Another use case is that
we are re-adding the same document a few times a day, and the commit times
are ballooning.


...you still have to commit for changes to be visible.
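A toy model may help show why a "replace" does not save the commit. The Java sketch below is not Lucene's or Solr's API (ToyIndex and its methods are hypothetical): an update is expressed as delete-by-id followed by add, both are buffered, and searchers see neither until commit() runs. Lucene's own IndexWriter.updateDocument(Term, Document) is documented as doing the same delete-then-add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT Lucene's or Solr's API) of an index keyed on a unique id.
class ToyIndex {
    // What searchers currently see: id -> document body.
    private final Map<String, String> committed = new HashMap<>();
    // Operations buffered since the last commit.
    private final List<String> pendingDeletes = new ArrayList<>();
    private final Map<String, String> pendingAdds = new HashMap<>();

    void delete(String id) {
        pendingDeletes.add(id);
        pendingAdds.remove(id); // a later delete cancels an earlier buffered add
    }

    void add(String id, String body) {
        pendingAdds.put(id, body);
    }

    // "Replace" is not a separate primitive: it is a delete followed by an add.
    void update(String id, String body) {
        delete(id);
        add(id, body);
    }

    // Apply buffered deletes first, then adds, and make the result visible.
    void commit() {
        for (String id : pendingDeletes) {
            committed.remove(id);
        }
        committed.putAll(pendingAdds);
        pendingDeletes.clear();
        pendingAdds.clear();
    }

    // Searchers only ever see committed state.
    String get(String id) {
        return committed.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc1", "v1");
        index.commit();
        index.update("doc1", "v2");
        System.out.println(index.get("doc1")); // prints "v1": update not yet committed
        index.commit();
        System.out.println(index.get("doc1")); // prints "v2"
    }
}
```

Because the work is the same delete plus add either way, re-sending an unchanged document still costs a full commit cycle.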

Erik



Re: Replacing existing documents in the index

2007-08-16 Thread Yonik Seeley
It sounds like it might be more efficient to implement this at the
crawler level to short-circuit crawling whole sites. Barring that, a
separate database sounds more flexible.
Non-deletable docs don't sound like something that should be a
general feature.
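A crawler-level version of this suggestion might look like the Java sketch below. It assumes in-memory sets stand in for the separate database, and CrawlFilter, blockSite and recordDeletedDoc are hypothetical names. Whole sites and individual deleted URLs are dropped from the crawl frontier before anything is fetched or sent to Solr.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical crawler-side filter; the two sets stand in for a separate
// database of deleted documents and blocked sites.
class CrawlFilter {
    private final Set<String> deletedUrls = new HashSet<>();
    private final Set<String> blockedHosts = new HashSet<>();

    void recordDeletedDoc(String url) {
        deletedUrls.add(url);
    }

    void blockSite(String host) {
        blockedHosts.add(host.toLowerCase());
    }

    // Decide before fetching, so blocked sites are never crawled at all.
    boolean shouldCrawl(String url) {
        String host = URI.create(url).getHost();
        if (host != null && blockedHosts.contains(host.toLowerCase())) {
            return false;
        }
        return !deletedUrls.contains(url);
    }

    // Keep only the URLs that are still allowed, preserving frontier order.
    List<String> filter(List<String> frontier) {
        List<String> kept = new ArrayList<>();
        for (String url : frontier) {
            if (shouldCrawl(url)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

Filtering here means deleted documents never reach the index, so no existence check against Solr and no extra commit is needed for them.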

However, one would probably be able to implement custom logic to do
this using an update-processor plugin (should be in the next version
of Solr)
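The server-side alternative can be sketched as a chain of processors, where a custom step drops adds for deleted ids before they reach indexing. The Java below is a simplified, hypothetical stand-in: AddProcessor, SkipDeletedProcessor and IndexingProcessor are illustrative names, not the classes or signatures of Solr's update-processor plugin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for one step of an update-processor
// chain. Names and signatures are illustrative, NOT Solr's API.
interface AddProcessor {
    void processAdd(String id, String body);
}

// Terminal step: "indexes" the document (here it just records the id).
class IndexingProcessor implements AddProcessor {
    final List<String> indexed = new ArrayList<>();

    public void processAdd(String id, String body) {
        indexed.add(id);
    }
}

// Custom step: silently drops adds for ids that were deliberately deleted
// and forwards everything else to the next step in the chain.
class SkipDeletedProcessor implements AddProcessor {
    private final Set<String> deletedIds;
    private final AddProcessor next;

    SkipDeletedProcessor(Set<String> deletedIds, AddProcessor next) {
        this.deletedIds = deletedIds;
        this.next = next;
    }

    public void processAdd(String id, String body) {
        if (deletedIds.contains(id)) {
            return; // stay deleted: do not forward
        }
        next.processAdd(id, body);
    }
}
```

Keeping the check on the server means crawlers can keep updating blindly; the trade-off is that the set of deleted ids must live somewhere the server can consult on every add.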

-Yonik

On 8/16/07, Lance Norskog [EMAIL PROTECTED] wrote:
 Hi-

 We recrawl the same places and update blindly without checking if a document
 is already in the index.   We have a use case where we would like to delete
 documents (porn) and have them stay deleted. To implement this use case now,
 we would need to check the existence of the document and check for a
 'deleted' flag. Or, we would maintain a separate database of deleted
 documents that we check against.

 A more efficient way to do this would be to have a 'do not delete' flag in
 the document. Delete failures are currently ignored and they would continue
 to be ignored.

 Is this a worthwhile addition to 1.3 or 1.4?

 Thanks for your time,

 Lance