Re: Commit after every document - alternate approach
On 3/3/2016 11:36 PM, sangs8788 wrote:
> When a commit fails, the document doesn't get cleared out from MQ, and
> there is a task which runs in the background to republish the files to
> SOLR. If we do a batch commit, we will not know whether we will end up
> redoing the same batch commit again. We currently have a client-side
> commit which issues the command to SOLR. commit() returns a status code.
> If we are planning to use commitWithin(), I don't think it will actually
> return any result from Solr, since it is time-oriented.

Do your indexing and commits in batches, as already recommended. I'd start with 1000 and go up or down from there as needed. If the batch indexing fails, or the commit fails, consider the entire batch failed. That may not be the end of the world, though -- if the indexing was successful (and didn't use ConcurrentUpdateSolrClient), then those updates will be stored in the Solr transaction log, and will be replayed if Solr is restarted or the core is reloaded.

If you want to be absolutely certain that the update/commit succeeded by verifying data, one thing you *could* do is send a batch update, do a commit, and then request every document in the batch with a query that includes a limited fl parameter, and verify that the document is present and that the values of the fields requested in the fl parameter are correct. I would probably do that query with {!cache=false} to avoid polluting Solr's caches.

Almost every index update you make can simply be made again without danger. The exceptions are certain kinds of atomic updates, and certain situations with deletes. It's probably best to avoid doing those kinds of updates, which are described below:

If you're doing atomic updates that increment or decrement a field value, or atomic updates that add a new value to a multivalued field, the results will be wrong if that update is repeated, although they would be correct if the update is replayed from Solr's transaction log. That is because atomic updates are no longer atomic when they hit the transaction log -- there they include values for every field in the document, as if the document were built from scratch.

If you are explicitly deleting a document before replacing it, and those actions were re-done in the opposite order, then the document would be missing from the index. Because Solr handles the deletion automatically when a document is being updated/replaced, explicit deletes are not recommended in those situations.

> If we go with SOLR autocommit, is there a way to send a response to MQ
> saying the commit was successful?

If commits are completely automatic (autoCommit, autoSoftCommit, or commitWithin), there's no way for a program to be sure that they have completed. The general recommendation for Solr indexing, especially if your pipeline is multi-threaded, is to simply send your updates, let Solr handle commits, and rely on the design of Lucene combined with Solr's transaction logs to keep your data safe. This approach does mean that when things go wrong, it may be a while before new data is searchable.

Emir's reply is spot on. Solr is not recommended as a primary data store.

Thanks,
Shawn
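Shawn's batch-then-verify loop can be sketched as follows. This is a minimal illustration in Python (the original poster's client is Java); the Solr client here is a hand-written stub so the sketch runs standalone, and the method names `add`, `commit`, and `get_fields` are assumptions, not a real client API. A real client would issue the per-document check as a query like `q={!cache=false}id:<id>&fl=<fields>`.

```python
class StubSolr:
    """Hypothetical stand-in for a real Solr client, for illustration only."""
    def __init__(self):
        self.store = {}

    def add(self, docs):
        # Real client: send the whole batch in one update request.
        for d in docs:
            self.store[d["id"]] = dict(d)

    def commit(self):
        # Real client: an explicit commit; may raise on failure, in which
        # case the entire batch is considered failed.
        pass

    def get_fields(self, doc_id, fields):
        # Real client: query with {!cache=false} and a limited fl parameter.
        doc = self.store.get(doc_id)
        return None if doc is None else {f: doc.get(f) for f in fields}


def index_and_verify(solr, batch, check_fields=("id",)):
    """Send a batch, commit, then verify each doc; return ids that failed."""
    solr.add(batch)   # an exception here means the whole batch failed
    solr.commit()     # likewise
    failed = []
    for doc in batch:
        stored = solr.get_fields(doc["id"], check_fields)
        if stored is None or any(stored.get(f) != doc.get(f) for f in check_fields):
            failed.append(doc["id"])
    return failed


solr = StubSolr()
batch = [{"id": "1", "title": "a"}, {"id": "2", "title": "b"}]
print(index_and_verify(solr, batch, check_fields=("id", "title")))  # -> []
```

An empty list means every document in the batch is present with the expected field values; a non-empty list identifies the documents to resend from MQ.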
Re: Commit after every document - alternate approach
Hi Sangeetha,

It seems to me that you are using Solr as a primary data store? If that is true, you should not do that -- you should have some other store that is transactional and can support what you are trying to do with Solr. If you are not using Solr as a primary store, and it is critical to have Solr in sync, you can run periodic checks (at about the same frequency as Solr commits) that will ensure the latest data reached Solr.

Regards,
Emir

On 04.03.2016 05:46, sangs8788 wrote:
> Hi Emir,
>
> Right now we have only inserts into SOLR. The main reason for having a
> commit after each document is to get a guarantee that the document has
> been indexed in Solr. Until the commit status is received back, the
> document will not be deleted from MQ, so that even if there is a commit
> failure the document can be resent from MQ.
>
> Thanks
> Sangeetha
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261575.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: Commit after every document - alternate approach
When a commit fails, the document doesn't get cleared out from MQ, and there is a task which runs in the background to republish the files to SOLR. If we do a batch commit, we will not know whether we will end up redoing the same batch commit again. We currently have a client-side commit which issues the command to SOLR. commit() returns a status code. If we are planning to use commitWithin(), I don't think it will actually return any result from Solr, since it is time-oriented.

If we go with SOLR autocommit, is there a way to send a response to MQ saying the commit was successful?

Thanks
Sangeetha

--
View this message in context: http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261587.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Commit after every document - alternate approach
So batch them. You get a response back from Solr telling you whether the document was accepted. If that fails, there is a failure. What do you do then?

After every 100 docs or one minute, do a commit. Then delete the documents from the input queue. What do you do when the commit fails?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Mar 3, 2016, at 8:46 PM, sangs8788 wrote:
>
> Hi Emir,
>
> Right now we have only inserts into SOLR. The main reason for having a
> commit after each document is to get a guarantee that the document has
> been indexed in Solr. Until the commit status is received back, the
> document will not be deleted from MQ, so that even if there is a commit
> failure the document can be resent from MQ.
>
> Thanks
> Sangeetha
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261575.html
> Sent from the Solr - User mailing list archive at Nabble.com.
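Walter's suggestion (flush every N docs or T seconds, acknowledge the queue only after a successful commit) can be sketched like this. A minimal sketch in Python, assuming injected `flush_fn` and `ack_fn` callbacks so it runs standalone; the class and parameter names are illustrative, not any real client API. In the real pipeline `flush_fn` would index and commit the batch against Solr, and `ack_fn` would delete the batch from MQ.

```python
import time

class Batcher:
    """Accumulate docs; flush every max_docs docs or max_secs seconds."""

    def __init__(self, flush_fn, ack_fn, max_docs=100, max_secs=60.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn  # indexes + commits a batch; raises on failure
        self.ack_fn = ack_fn      # deletes acknowledged docs from the queue
        self.max_docs = max_docs
        self.max_secs = max_secs
        self.clock = clock
        self.batch = []
        self.started = clock()

    def add(self, doc):
        self.batch.append(doc)
        age = self.clock() - self.started
        if len(self.batch) >= self.max_docs or age >= self.max_secs:
            self.flush()

    def flush(self):
        if not self.batch:
            return
        self.flush_fn(self.batch)  # if this raises, docs stay on the queue
        self.ack_fn(self.batch)    # safe to delete from MQ only after commit
        self.batch = []
        self.started = self.clock()


flushed, acked = [], []
b = Batcher(flushed.append, acked.append, max_docs=3)
for i in range(7):
    b.add(i)
b.flush()  # drain the remainder at shutdown
print([len(batch) for batch in flushed])  # -> [3, 3, 1]
```

Because `ack_fn` runs only after `flush_fn` returns, a failed index or commit leaves the whole batch on MQ for the background republish task, which answers the "what do you do when the commit fails" question: nothing is acknowledged, so it is redelivered.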
Re: Commit after every document - alternate approach
If you need transactions, you should use a different system, like MarkLogic.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Mar 3, 2016, at 8:46 PM, sangs8788 wrote:
>
> Hi Emir,
>
> Right now we have only inserts into SOLR. The main reason for having a
> commit after each document is to get a guarantee that the document has
> been indexed in Solr. Until the commit status is received back, the
> document will not be deleted from MQ, so that even if there is a commit
> failure the document can be resent from MQ.
>
> Thanks
> Sangeetha
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261575.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Commit after every document - alternate approach
Hi Varun,

We don't have a SOLR Cloud setup in our system; we have a Master-Slave architecture. In that case I don't see a way for SOLR to guarantee whether a document got indexed/committed successfully or not. We even thought about having a flag set in the DB for whichever documents were committed to SOLR, but that is also not feasible, because it again requires a return status from SOLR. The other option is to run a dataimport periodically to verify that all the documents got indexed. Is there any other option which I have missed? Please let me know.

Thanks
Sangeetha

--
View this message in context: http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261576.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Commit after every document - alternate approach
Hi Emir,

Right now we have only inserts into SOLR. The main reason for having a commit after each document is to get a guarantee that the document has been indexed in Solr. Until the commit status is received back, the document will not be deleted from MQ, so that even if there is a commit failure the document can be resent from MQ.

Thanks
Sangeetha

--
View this message in context: http://lucene.472066.n3.nabble.com/Commit-after-every-document-alternate-approach-tp4260946p4261575.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Commit after every document - alternate approach
Hi Sangeetha,

Well, I don't think you need to commit after every document add. You can rely on Solr's transaction log feature. If you are using SolrCloud, it's mandatory to have a transaction log, so every document gets written to the tlog. Now say a node crashes: even if documents were not committed, since they are present in the tlog, Solr will replay them on startup.

Also, if you are using SolrCloud and have multiple replicas, you should use the min_rf feature to make sure that N replicas acknowledge the write before you get back success - https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance

On Wed, Mar 2, 2016 at 3:41 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Hi Sangeetha,
> What is sure is that this is not going to work - with 200-300K docs/hour,
> there will be >50 commits/second, meaning there is less than 20 ms for
> each doc+commit.
> What you can do is let Solr handle commits, and maybe use real-time get
> to verify a doc is in Solr, or do some periodic sanity checks.
> Are you doing document updates, so that in-order Solr updates are the
> reason why you commit each doc before moving to the next doc?
>
> Regards,
> Emir
>
> On 02.03.2016 09:06, sangeetha.subraman...@gtnexus.com wrote:
>> Hi All,
>>
>> I am trying to understand how we can have commits issued to Solr while
>> indexing documents. Around 200K to 300K documents per hour, with an
>> average size of 10 KB each, will be getting into SOLR. Java code
>> fetches each document from MQ and streams it to SOLR. The problem is
>> that the client code issues a hard commit after each document sent to
>> SOLR for indexing, and it waits for the response from SOLR to get
>> assurance that the document was indexed successfully. Only if it gets
>> an OK status from SOLR is the document cleared out from MQ.
>>
>> As far as I understand, doing a commit after each document is an
>> expensive operation. But we need to make sure that all the documents
>> which are put into MQ get indexed in SOLR. Is there any other way of
>> getting this done? Please let me know.
>> If we do batch indexing, is there any chance we can identify if some
>> documents were missed from indexing?
>>
>> Thanks
>> Sangeetha
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/

--
Regards,
Varun Thacker
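The min_rf check Varun mentions works on the client side: the update response reports the achieved replication factor, and the client treats the write as unacknowledged (and retries) when it falls below the requested minimum. A small sketch of that decision, with the response modeled as a plain dict; the exact response shape and the `rf` key location are assumptions here, so check the linked fault-tolerance page for the actual format in your Solr version.

```python
def write_is_durable(response, min_rf):
    """Return True if at least min_rf replicas acknowledged the update.

    `response` is a dict modeling a SolrCloud update response; if no
    achieved replication factor is reported, assume only the leader (1)
    acknowledged the write.
    """
    achieved = response.get("responseHeader", {}).get("rf", 1)
    return achieved >= min_rf


ok = {"responseHeader": {"status": 0, "rf": 2}}
degraded = {"responseHeader": {"status": 0, "rf": 1}}
print(write_is_durable(ok, 2), write_is_durable(degraded, 2))  # -> True False
```

When `write_is_durable` returns False, the document would stay on MQ and be resent, exactly as with a failed explicit commit.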
Re: Commit after every document - alternate approach
Hi Sangeetha,

What is sure is that this is not going to work - with 200-300K docs/hour, there will be >50 commits/second, meaning there is less than 20 ms for each doc+commit. What you can do is let Solr handle commits, and maybe use real-time get to verify a doc is in Solr, or do some periodic sanity checks. Are you doing document updates, so that in-order Solr updates are the reason why you commit each doc before moving to the next doc?

Regards,
Emir

On 02.03.2016 09:06, sangeetha.subraman...@gtnexus.com wrote:
> Hi All,
>
> I am trying to understand how we can have commits issued to Solr while
> indexing documents. Around 200K to 300K documents per hour, with an
> average size of 10 KB each, will be getting into SOLR. Java code fetches
> each document from MQ and streams it to SOLR. The problem is that the
> client code issues a hard commit after each document sent to SOLR for
> indexing, and it waits for the response from SOLR to get assurance that
> the document was indexed successfully. Only if it gets an OK status from
> SOLR is the document cleared out from MQ.
>
> As far as I understand, doing a commit after each document is an
> expensive operation. But we need to make sure that all the documents
> which are put into MQ get indexed in SOLR. Is there any other way of
> getting this done? Please let me know.
> If we do batch indexing, is there any chance we can identify if some
> documents were missed from indexing?
>
> Thanks
> Sangeetha

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
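Emir's arithmetic, spelled out: with one hard commit per document, 200K-300K docs/hour translates directly into a commit rate and a per-document time budget.

```python
# One hard commit per document at the stated ingest rates:
for docs_per_hour in (200_000, 300_000):
    commits_per_sec = docs_per_hour / 3600     # commits per second
    ms_budget = 1000 / commits_per_sec         # ms available per doc+commit
    print(f"{docs_per_hour} docs/h -> {commits_per_sec:.0f} commits/s, "
          f"{ms_budget:.1f} ms per doc+commit")
# 200000 docs/h -> 56 commits/s, 18.0 ms per doc+commit
# 300000 docs/h -> 83 commits/s, 12.0 ms per doc+commit
```

A hard commit alone (segment flush, fsync, new searcher) routinely takes longer than 12-18 ms, which is why the per-document commit scheme cannot keep up at this rate.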
Commit after every document - alternate approach
Hi All,

I am trying to understand how we can have commits issued to Solr while indexing documents. Around 200K to 300K documents per hour, with an average size of 10 KB each, will be getting into SOLR. Java code fetches each document from MQ and streams it to SOLR. The problem is that the client code issues a hard commit after each document sent to SOLR for indexing, and it waits for the response from SOLR to get assurance that the document was indexed successfully. Only if it gets an OK status from SOLR is the document cleared out from MQ.

As far as I understand, doing a commit after each document is an expensive operation. But we need to make sure that all the documents which are put into MQ get indexed in SOLR. Is there any other way of getting this done? Please let me know. If we do batch indexing, is there any chance we can identify if some documents were missed from indexing?

Thanks
Sangeetha