I'm just catching up on reading solr emails, so forgive me for being late to this dance....
I've just gone through a project to enable CDCR on our Solr, and I also experienced a small period of time where the commits on the source server just seemed to stop. This was during a period of intense experimentation where I was mucking around with configurations, turning CDCR on and off, etc. At some point the commits stopped occurring, and it drove me nuts for a couple of days. I tried everything: restarting Solr, reloading, turning buffering on, turning buffering off, etc. I finally threw up my hands and rebooted the server out of desperation (it was a physical Linux box). Commits worked fine after that. I don't know what caused the commits to stop, or why rebooting (and not just restarting Solr) fixed them.

Wondering if you ever found a solution to your situation?

On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer <webster.ho...@sial.com> wrote:

> I meant to get back to this sooner.
>
> When I say I issued a commit, I do issue it as collection/update?commit=true
>
> The soft commit interval is set to 3000, but I don't have a problem with soft commits (I think).
>
> I am concerned that some hard commits don't seem to happen, though I think many commits do occur. I'd like suggestions on how to diagnose this, and perhaps an idea of where to look. Typically I believe that issues like this stem from our configuration.
>
> Our indexing job is pretty simple: we send blocks of JSON to <collection>/update/json. We either re-index the whole collection or just apply updates. Typically we reindex the data once a week and delete any records that are older than the last full index. This does lead to a fair number of deleted records in the index, especially if commits fail. Most of our collections are not large, between 2 and 3 million records.
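For concreteness, the update-and-commit flow described above might look like the sketch below, using only the Python stdlib. The base URL, collection name ("sial-catalog"), and document fields are invented placeholders, not taken from the thread:

```python
import json
from urllib.request import Request

def json_update_request(base_url, collection, docs):
    """Build a POST of a block of JSON documents to <collection>/update/json."""
    url = f"{base_url}/solr/{collection}/update/json"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})

def explicit_commit_request(base_url, collection):
    """Build an explicit hard-commit request, i.e. <collection>/update?commit=true."""
    return Request(f"{base_url}/solr/{collection}/update?commit=true")

docs = [{"id": "p-1", "name_s": "widget"}, {"id": "p-2", "name_s": "gadget"}]
update = json_update_request("http://localhost:8983", "sial-catalog", docs)
commit = explicit_commit_request("http://localhost:8983", "sial-catalog")
# urllib.request.urlopen(update) and urlopen(commit) would send these to a live Solr.
```

The commit issued this way is a hard commit with openSearcher=true, which matches Erick's reading later in the thread.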
> The collections are hosted in Google Cloud.
>
> On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > bq: But if 3 seconds is aggressive what would be a good value for soft commit?
> >
> > The usual answer is "as long as you can stand". All top-level caches are invalidated, autowarming is done, etc. on each soft commit. That can be a lot of work, and if your users are comfortable with docs not showing up for, say, 10 minutes, then use 10 minutes. As always, "it depends" here; the point is not to do unnecessary work if possible.
> >
> > bq: If a commit doesn't happen how would there ever be an index merge that would remove the deleted documents.
> >
> > Right, it wouldn't. It's a little more subtle than that, though. Segments on various replicas will contain different docs, thus the term/doc statistics can be a bit different between multiple replicas. None of the stats will change until the commit, though. You might try turning off distributed doc/term stats, though.
> >
> > Your comments about PULL or TLOG replicas are well taken. However, even those won't be absolutely in sync, since they'll replicate from the master at slightly different times and _could_ get slightly different segments _if_ there's indexing going on. But let's say you stop indexing. After the next poll interval all the replicas will have identical characteristics and will score the docs the same.
> >
> > I don't have any significant wisdom to offer here, except this is really the first time I've heard of this behavior. About all I can imagine is that _somehow_ the soft commit interval is -1. When you say you "issue a commit" I'm assuming it's via ...collection/update?commit=true or some such, which issues a hard commit with openSearcher=true. And it's on a _collection_ basis, right?
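One hedged way to check whether hard commits are actually landing, relevant to the diagnosis question above, is to poll the Luke handler (<collection>/admin/luke?numTerms=0&wt=json), which reports index-level metadata. The sketch below parses an abbreviated, invented sample of such a response; the exact field set may vary by Solr version:

```python
import json

def commit_diagnostics(luke_response):
    """Pull commit-related fields from the 'index' section of a Luke response."""
    index = luke_response["index"]
    return {
        "lastModified": index.get("lastModified"),   # when the index last changed on disk
        "current": index.get("current"),             # False suggests the searcher lags the index
        "segmentCount": index.get("segmentCount"),
    }

# Abbreviated, made-up sample of a Luke handler response body.
sample = json.loads("""
{"index": {"numDocs": 2500000, "maxDoc": 2900000, "segmentCount": 28,
           "current": false, "lastModified": "2018-02-16T19:44:03.126Z"}}
""")
info = commit_diagnostics(sample)
```

If lastModified stops advancing while indexing continues, that would be consistent with the "commits just seemed to stop" symptom described at the top of the thread.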
> > Sorry I can't be more help
> > Erick
> >
> > On Mon, Feb 12, 2018 at 10:44 AM, Webster Homer <webster.ho...@sial.com> wrote:
> >
> > > Erick, I am aware of the CDCR buffering problem causing tlog retention; we always turn buffering off in our CDCR configurations.
> > >
> > > My post was precipitated by seeing that we had uncommitted data in collections > 24 hours after it was loaded. The collections I was looking at are in our development environment, where we do not use CDCR. However, I'm pretty sure that I've seen situations in production where commits were also long overdue.
> > >
> > > The "autoSoftcommit" was a typo. The soft commit logic seems to be fine; I don't see an issue with data visibility. But if 3 seconds is aggressive, what would be a good value for soft commit? We have a couple of collections that are updated every minute, although most of them are updated much less frequently.
> > >
> > > My reason for raising this commit issue is that we see problems with the relevancy of SolrCloud searches and the NRT replica type. Sometimes the results flip, where the best hit varies by what replica serviced the search. This is hard to explain to management. Doing an optimize does address the problem for a while, but I try to avoid optimizing for the reasons you and Shawn list. If a commit doesn't happen, how would there ever be an index merge that would remove the deleted documents?
> > >
> > > The problem with deletes and relevancy doesn't seem to occur when we use TLOG replicas, probably because they don't do their own indexing but get copies from their leader. We are testing them now; eventually we may abandon the use of NRT replicas for most of our collections.
> > >
> > > I am quite concerned about this commit issue. What kinds of things would influence whether a commit occurs?
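Turning buffering off, as described above, goes through the CDCR API's `action` parameter. A minimal sketch of building those calls, assuming a stock CDCR endpoint and placeholder host/collection names:

```python
# Actions accepted by Solr's CDCR endpoint (<collection>/cdcr?action=...).
# DISABLEBUFFER stops the source cluster from retaining tlogs indefinitely;
# QUEUES reports how far replication lags.
CDCR_ACTIONS = {"START", "STOP", "STATUS", "ENABLEBUFFER", "DISABLEBUFFER",
                "QUEUES", "OPS", "ERRORS", "LASTPROCESSEDVERSION"}

def cdcr_url(base_url, collection, action):
    """Build a CDCR API URL, validating the action name first."""
    if action not in CDCR_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    return f"{base_url}/solr/{collection}/cdcr?action={action}"

# Placeholder host and collection, for illustration only.
disable_buffer = cdcr_url("http://localhost:8983", "sial-catalog", "DISABLEBUFFER")
queues = cdcr_url("http://localhost:8983", "sial-catalog", "QUEUES")
```

Checking QUEUES after disabling the buffer is one way to confirm tlogs are draining rather than accumulating.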
> > > One commonality for our systems is that they are hosted in a Google cloud. We have a number of collections that share configurations, but others that do not. I think commits do happen, but I don't trust that autoCommit is reliable. What can we do to make it reliable?
> > >
> > > Most of our collections are reindexed weekly with partial updates applied daily; at least, that is what happens in production. Our development clouds are not as regular.
> > >
> > > Our Solr startup script sets the following values:
> > >
> > > -Dsolr.autoCommit.maxDocs=35000
> > > -Dsolr.autoCommit.maxTime=60000
> > > -Dsolr.autoSoftCommit.maxTime=3000
> > >
> > > I don't think we reference solr.autoCommit.maxDocs in our solrconfig.xml files.
> > >
> > > Here are our settings for autoCommit and autoSoftCommit. We had a lot of issues with missing commits when we didn't set solr.autoCommit.maxTime:
> > >
> > > <autoCommit>
> > >   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> > >   <openSearcher>false</openSearcher>
> > > </autoCommit>
> > >
> > > <autoSoftCommit>
> > >   <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> > > </autoSoftCommit>
> > >
> > > On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> > >
> > >> On 2/9/2018 9:29 AM, Webster Homer wrote:
> > >>
> > >>> A little more background. Our production SolrClouds are populated via CDCR. CDCR does not replicate commits; commits to the target clouds happen via autoCommit settings.
> > >>>
> > >>> We see relevancy scores get inconsistent when there are too many deletes, which seems to happen when hard commits don't happen.
> > >>>
> > >>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer <webster.ho...@sial.com> wrote:
> > >>>
> > >>> We do have autoSoftcommit set to 3 seconds. It is NOT the visibility of the records that is my primary concern.
> > >>> What I am concerned about is the accumulation of uncommitted tlog files and the larger number of deleted documents.
> > >>
> > >> For the deleted documents: have you ever done an optimize on the collection? If so, you're going to need to re-do the optimize regularly to keep deleted documents from growing out of control. See this issue for a very technical discussion about it:
> > >>
> > >> https://issues.apache.org/jira/browse/LUCENE-7976
> > >>
> > >> Deleted documents probably aren't really related to what we've been discussing. That shouldn't really be strongly affected by commit settings.
> > >>
> > >> -----
> > >>
> > >> A 3 second autoSoftCommit is VERY aggressive. If your soft commits are taking longer than 3 seconds to complete, which is often what happens, then that will lead to problems. I wouldn't expect it to cause the kinds of problems you describe, though. It would manifest as Solr working too hard, logging warnings or errors, and changes taking too long to show up.
> > >>
> > >> That's assuming the config for autoSoftCommit doesn't have the typo that Erick mentioned.
> > >>
> > >> ----
> > >>
> > >> I have never used CDCR, so I know very little about it. But I have seen reports on this mailing list saying that transaction logs never get deleted when CDCR is configured.
> > >>
> > >> Below is a link to a mailing list discussion related to CDCR not deleting transaction logs. It looks like, for it to work right, a buffer needs to be disabled, and there may also be problems caused by not having a complete zkHost string in the CDCR config:
> > >>
> > >> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
> > >>
> > >> Erick also mentioned this.
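As a rough gauge of the deleted-document bloat being discussed, the maxDoc and numDocs figures from the Luke handler's "index" section can be compared: their difference is the count of deleted-but-not-yet-merged-away documents. The numbers below are invented for illustration:

```python
def deleted_doc_ratio(luke_index):
    """Fraction of the index occupied by deleted docs, from Luke's 'index' section."""
    max_doc = luke_index["maxDoc"]
    if max_doc == 0:
        return 0.0
    return (max_doc - luke_index["numDocs"]) / max_doc

# Hypothetical figures for a collection in the 2-3 million doc range.
index_section = {"numDocs": 2_250_000, "maxDoc": 3_000_000}
ratio = deleted_doc_ratio(index_section)  # 0.25, i.e. 25% deleted docs
```

Tracking this ratio over time would show whether merges (triggered by hard commits) are reclaiming deletes, without resorting to the optimize that LUCENE-7976 warns about.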
> > >> Thanks,
> > >> Shawn
> > >
> > > --
> > > This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of e-mail transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and do not accept liability for any damages caused by any virus transmitted therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.