Solr's updateRequestHandler does not have a fast way of guaranteeing document 
delivery
--------------------------------------------------------------------------------------

                 Key: SOLR-1924
                 URL: https://issues.apache.org/jira/browse/SOLR-1924
             Project: Solr
          Issue Type: Bug
    Affects Versions: 1.4
            Reporter: Karl Wright


Without committing after every document, it is currently not possible to use 
updateRequestHandler to guarantee that any given document is delivered into the 
index.  The reason is that whenever Solr is restarted, some or all documents 
that have not been committed yet are dropped on the floor, and a client of 
updateRequestHandler has no way to know which ones were lost.

I believe it is not even possible to write a middleware-style layer that stores 
documents and performs periodic commits on its own, because the update request 
handler never ACKs individual documents on a commit; a commit merely covers 
everything Solr has seen since the last time it bounced.  So you have this 
potential scenario:

- middleware layer receives document 1, saves it
- middleware layer receives document 2, saves it
Now it's time for the commit, so:
- middleware layer sends document 1 to updateRequestHandler
- Solr is restarted, dropping all uncommitted documents on the floor
- middleware layer sends document 2 to updateRequestHandler
- middleware layer sends COMMIT to updateRequestHandler, but Solr adds only 
document 2 to the index
- middleware incorrectly believes that it has successfully committed both 
documents
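
The scenario above can be reproduced with a toy in-memory model.  This is only 
a sketch: ToySolr and its add/commit/restart methods are hypothetical stand-ins 
invented for illustration, not Solr's actual API, and the restart is triggered 
by hand at the point where the middleware would not notice it.

```python
# Toy model of the failure scenario. ToySolr is hypothetical, not Solr's API.
class ToySolr:
    def __init__(self):
        self.index = set()    # documents that have been committed
        self.pending = set()  # documents received but not yet committed

    def add(self, doc_id):
        self.pending.add(doc_id)

    def commit(self):
        # Commits whatever is pending; the response names no documents.
        self.index |= self.pending
        self.pending.clear()
        return "OK"

    def restart(self):
        # Uncommitted documents are dropped on restart.
        self.pending.clear()


solr = ToySolr()
saved = ["doc1", "doc2"]  # documents the middleware layer has stored

solr.add("doc1")
solr.restart()            # happens without the middleware noticing
solr.add("doc2")
status = solr.commit()

# The middleware can only interpret "OK" as "everything I sent is committed".
believed_committed = set(saved) if status == "OK" else set()
lost = believed_committed - solr.index

print(sorted(solr.index))  # ['doc2']
print(sorted(lost))        # ['doc1']
```

The commit succeeds, yet doc1 is missing and nothing in the response lets the 
middleware detect that.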

An ideal solution would be for Solr to separate the semantics of commit (the 
index building variety) from the semantics of commit (the 'I got the document' 
variety).  Perhaps this will involve a persistent document queue that survives 
a Solr restart.
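
As a rough illustration of the persistent queue idea, the sketch below 
acknowledges a document only after it has been appended to an on-disk queue, 
and reloads that queue on restart.  Everything here is an assumption made for 
the example: QueuedSolr, the "RECEIVED" reply, and the JSON-lines queue file 
are hypothetical, and the index is simply modeled as surviving restarts.

```python
import json
import os
import tempfile


class QueuedSolr:
    """Hypothetical model: add() returns only once the document is on disk."""

    def __init__(self, queue_path):
        self.queue_path = queue_path
        self.index = {}                    # modeled as durable across restarts
        self.pending = self._load_queue()  # received but not yet indexed

    def _load_queue(self):
        if not os.path.exists(self.queue_path):
            return {}
        with open(self.queue_path) as f:
            return {doc["id"]: doc for doc in map(json.loads, f)}

    def add(self, doc):
        # The 'I got the document' commit: persist first, then acknowledge.
        with open(self.queue_path, "a") as f:
            f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.pending[doc["id"]] = doc
        return "RECEIVED"

    def commit(self):
        # The index-building commit: move queued documents into the index.
        self.index.update(self.pending)
        self.pending.clear()
        open(self.queue_path, "w").close()  # queue is now empty

    def restart(self):
        # In-memory state is lost; the queue file is replayed.
        self.pending = self._load_queue()


with tempfile.TemporaryDirectory() as tmp:
    solr = QueuedSolr(os.path.join(tmp, "queue.jsonl"))
    solr.add({"id": "doc1"})
    solr.restart()
    solr.add({"id": "doc2"})
    solr.commit()
    indexed = sorted(solr.index)

print(indexed)  # ['doc1', 'doc2']
```

With the two kinds of commit separated this way, the client can treat the reply 
to each add as its delivery guarantee and leave index commits to autocommit.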

An alternative mechanism might be for updateRequestHandler to acknowledge 
specifically committed documents in its response to an explicit commit.  But 
this would make it difficult or impossible to use autocommit usefully in such 
situations.  The only other alternative is to require clients that need 
guaranteed delivery to commit on every document, with a considerable 
performance penalty.
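
The acknowledgement alternative could look roughly like the following sketch, 
in which an explicit commit returns the IDs it actually committed and the 
client resends anything missing.  AckingSolr and the shape of its commit 
response are hypothetical; updateRequestHandler does not currently return 
such a list.

```python
# Hypothetical model in which an explicit commit reports the committed IDs.
class AckingSolr:
    def __init__(self):
        self.index = set()
        self.pending = set()

    def add(self, doc_id):
        self.pending.add(doc_id)

    def restart(self):
        # Uncommitted documents are dropped on restart.
        self.pending.clear()

    def commit(self):
        committed = set(self.pending)
        self.index |= committed
        self.pending.clear()
        return committed  # the per-document acknowledgement


solr = AckingSolr()
unacked = {"doc1", "doc2"}  # documents the client still holds

solr.add("doc1")
solr.restart()              # doc1 is lost here
solr.add("doc2")
unacked -= solr.commit()    # only doc2 is acknowledged

first_round_missing = set(unacked)

# Resend whatever the commit did not acknowledge, then commit again.
for doc_id in sorted(unacked):
    solr.add(doc_id)
unacked -= solr.commit()

print(sorted(first_round_missing))  # ['doc1']
print(sorted(solr.index))           # ['doc1', 'doc2']
```

The client only learns what was committed from the response to a commit it 
issued itself, which is why this approach does not combine well with autocommit.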

This ticket is related to LCF (the Lucene Connectors Framework) in that LCF is 
one of the clients that really needs some kind of guaranteed delivery mechanism.




