Re: Replicationhandler with TLOG replicas
In that case, it seems to be a reporting issue, not a fundamental replication problem. Whew!

So it's worth raising a JIRA, but since you report the indexes are identical, I'm not sure how high a priority it would be. If you do raise a JIRA, you should reference this discussion.

Best,
Erick

On Mon, Nov 12, 2018 at 3:00 AM Vadim Ivanov wrote:
> All of these 138 replicas when queried with =false returned
> absolutely the same documents as their leaders.
> I've checked some replicas for segments - yes they have the same segments as
> their leaders with absolutely same sizes in bytes.
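The segment check referred to in this message (same file names and same sizes in bytes on leader and replica) could be scripted roughly as follows. This is only a minimal sketch, not something posted in the thread: `index_listing` and `diff_listings` are hypothetical helper names, each listing would have to be collected on its own host, and skipping `write.lock` is an assumption about which files are worth comparing.

```python
import os


def index_listing(index_dir):
    """Map each file name in a core's data/index directory to its size in bytes."""
    return {
        entry.name: entry.stat().st_size
        for entry in os.scandir(index_dir)
        # Assumption: the lock file is not part of the comparison.
        if entry.is_file() and entry.name != "write.lock"
    }


def diff_listings(leader, replica):
    """Compare two {file name: size} listings.

    Returns (only_on_leader, only_on_replica, size_differs), each a sorted
    list of file names. Three empty lists mean the listings match.
    """
    only_on_leader = sorted(set(leader) - set(replica))
    only_on_replica = sorted(set(replica) - set(leader))
    size_differs = sorted(
        name for name in set(leader) & set(replica) if leader[name] != replica[name]
    )
    return only_on_leader, only_on_replica, size_differs


if __name__ == "__main__":
    # Hypothetical listings; in practice each would come from index_listing()
    # run against data/index on the respective host.
    leader = {"segments_e": 137, "_a.cfs": 20480, "_a.si": 378}
    replica = {"segments_e": 137, "_a.cfs": 20480, "_a.si": 378}
    print(diff_listings(leader, replica))
```

As noted elsewhere in the thread, such a comparison is only meaningful when indexing has been stopped for longer than the polling interval.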
RE: Replicationhandler with TLOG replicas
Hi, Erick

I have about 1300 cores in my test environment for 159 collections.
Today I wrote a script to check all of them.
For 138 out of 1300 cores, the "generation" and "indexversion" information returned by mbeans and by the replication handler do not match.
Most of these replicas have a gap of more than 1 in generation (for example, 14 returned by mbeans, 6 returned by RH), so the gap is surely not caused by ongoing indexing.
None of these 138 replicas is the leader of its shard.
All of these 138 replicas, when queried with =false, returned exactly the same documents as their leaders.
I've checked some replicas for segments: yes, they have the same segments as their leaders, with exactly the same sizes in bytes.

It seems to me this issue does not affect indexing or searching... it's just a curious misreporting of some information that I came across.

My autocommit is:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
</autoSoftCommit>

--
BR, Vadim

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, November 11, 2018 9:51 PM
> To: solr-user
> Subject: Re: Replicationhandler with TLOG replicas
>
> The next time you see this, is it possible to check that the replicas
> showing different index versions have the same documents? Actually, it
> should be sufficient to verify that they have the same segments in
> their data/index directory, and they should match the segments on the
> leader _assuming_ you're not actively indexing and you stopped
> indexing more than the polling interval ago.
>
> And what's your commit interval?
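The per-core check described in this message, comparing "generation" and "indexversion" as reported by mbeans with the values from the replication handler, might look something like the sketch below. It is an illustration, not the script Vadim wrote: `find_mismatches` is a hypothetical name, and fetching and parsing the two responses is deliberately left out because the thread does not show the exact mbeans parameters or response layout.

```python
FIELDS = ("generation", "indexversion")


def find_mismatches(rh_by_core, mbeans_by_core):
    """Return {core: {field: (rh_value, mbeans_value)}} for cores whose values differ.

    Both arguments map a core name to a dict holding the fields in FIELDS.
    Cores missing from the mbeans input are skipped.
    """
    mismatches = {}
    for core, rh in rh_by_core.items():
        mbeans = mbeans_by_core.get(core)
        if mbeans is None:
            continue
        diff = {
            field: (rh.get(field), mbeans.get(field))
            for field in FIELDS
            if rh.get(field) != mbeans.get(field)
        }
        if diff:
            mismatches[core] = diff
    return mismatches


if __name__ == "__main__":
    # Generations 6 (RH) and 14 (mbeans) are the example from the message;
    # the core names and indexversion values are made up for illustration.
    rh = {
        "core_1": {"generation": 6, "indexversion": 100},
        "core_2": {"generation": 9, "indexversion": 200},
    }
    mbeans = {
        "core_1": {"generation": 14, "indexversion": 180},
        "core_2": {"generation": 9, "indexversion": 200},
    }
    for core, diff in find_mismatches(rh, mbeans).items():
        print(core, diff)
```

Keeping the comparison separate from the HTTP calls makes it easy to run against saved responses from all 1300 cores.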
Re: Replicationhandler with TLOG replicas
Vadim:

The next time you see this, is it possible to check that the replicas showing different index versions have the same documents? Actually, it should be sufficient to verify that they have the same segments in their data/index directory, and they should match the segments on the leader _assuming_ you're not actively indexing and you stopped indexing more than the polling interval ago.

If you are actively indexing, it should be sufficient to check that the questionable replica's index files are changing over time; that would mean that replication is happening.

And what's your commit interval? The polling interval on the followers is:
1> 1/2 the hard commit interval if defined to be > -1. If not
2> 1/2 the soft commit interval if defined to be > -1. If not
3> 3000ms

There are two possibilities here as I see it.
1> this is just a reporting error, which we should still address but doesn't worry me much.
2> the TLOG/PULL replication process has some bug and the indexes are, indeed, different
2a> when you reloaded the collection, it's possible that the startup process kicked off a replication, and if there's really a bug, reloading just masked it.

Best,
Erick

On Sun, Nov 11, 2018 at 2:34 AM Vadim Ivanov wrote:
>
> Reload collection helps !
> After reloading collection generation and indexversion returned by
> Replicationhandler catch up with the leader
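The polling-interval rule Erick lists in this message can be written down as a small function. This is a sketch of the rule exactly as stated in the thread, not Solr's actual code; the function name is hypothetical and the commit intervals in the example are made-up values in milliseconds.

```python
DEFAULT_POLL_MS = 3000


def follower_poll_interval_ms(hard_commit_ms=-1, soft_commit_ms=-1):
    """Polling interval for a follower, following the three-step rule above.

    1. Half the hard commit interval if it is defined (> -1); otherwise
    2. half the soft commit interval if it is defined (> -1); otherwise
    3. 3000 ms.
    """
    if hard_commit_ms > -1:
        return hard_commit_ms // 2
    if soft_commit_ms > -1:
        return soft_commit_ms // 2
    return DEFAULT_POLL_MS


if __name__ == "__main__":
    # Hypothetical settings: 60 s hard commit, 30 s soft commit.
    print(follower_poll_interval_ms(hard_commit_ms=60000, soft_commit_ms=30000))
    # Only a soft commit interval defined.
    print(follower_poll_interval_ms(soft_commit_ms=30000))
    # Neither defined: falls back to the default.
    print(follower_poll_interval_ms())
```

The result is the minimum time to wait after indexing stops before comparing a replica's segments with the leader's.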
RE: Replicationhandler with TLOG replicas
Reloading the collection helps!
After reloading the collection, the generation and indexversion returned by the replication handler catch up with the leader.

> -----Original Message-----
> From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Sunday, November 11, 2018 1:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Replicationhandler with TLOG replicas
>
> What baffled me is that usually on most of replicas indexversion and
> generation returned by ReplicationHandler is right and it increases
> with commits.
> But on some replicas it's not - it stops changing at some moment in the
> past forever.
RE: Replicationhandler with TLOG replicas
Thanks, Shawn

I anticipated that answer about the information returned by ReplicationHandler.
What baffled me is that on most replicas the indexversion and generation returned by ReplicationHandler are right and increase with commits.
But on some replicas they are not: they stopped changing at some moment in the past and have not changed since.

For example, I have 5 TLOG replicas.
For the leader (and all 3 good replicas)
http://host_n:8983/solr/core_n/replication?command=indexversion
returns
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "indexversion":1541885907200,
  "generation":1704}

But for one replica:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "indexversion":1540842454653,
  "generation":1216}

Could it be a sign of some hidden issue? Where is that information stored, and why does it stop changing at some moment?
No indexing is going on in that collection at the moment of the request. I'm "deltaimporting" that collection once per hour, and only if needed, so usually there are only 5-10 commits per day.
It's not a crucial issue for my use case, as I get adequate indexversion and generation information from mbeans; I'm just curious about that strange behavior.

> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Saturday, November 10, 2018 6:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replicationhandler with TLOG replicas
>
> Just ignore anything that the replication handler tells you. It may
> have absolutely no bearing on what's happening.
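To put a number on how far a replica's reported state trails the leader's, the two `command=indexversion` responses from this message can be compared as below. This is a rough sketch: `fetch_index_version` and `generation_lag` are hypothetical helpers, the core URL passed to the fetch function is whatever base URL your cores use, and, as discussed in the thread, the result only reflects what the replication handler reports, which may not match the real state of the index.

```python
import json
import urllib.request


def fetch_index_version(core_url):
    """Query a core's replication handler with command=indexversion.

    core_url is the core's base URL, e.g. http://host_n:8983/solr/core_n
    (placeholder host and core names as used in the message).
    """
    url = core_url.rstrip("/") + "/replication?command=indexversion&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def generation_lag(leader_resp, replica_resp):
    """Number of generations the replica's reported value trails the leader's."""
    return leader_resp["generation"] - replica_resp["generation"]


if __name__ == "__main__":
    # The two responses quoted in the message (responseHeader omitted).
    leader = {"indexversion": 1541885907200, "generation": 1704}
    replica = {"indexversion": 1540842454653, "generation": 1216}
    print(generation_lag(leader, replica))  # 488
```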
Re: Replicationhandler with TLOG replicas
On 11/10/2018 8:05 AM, Vadim Ivanov wrote:
> Seems, the latter gets some wrong information as indexversion and generation
> is far behind then leader.
> But core index seems up to date and healthy.
> Why such things could happen on some replicas? (Most of the replicas retuned
> the same information by both commands)
> Is information returned by Replicationhandler not applicable to tlog/pull
> replicas and is not reliable ?

SolrCloud does not use the replication handler in the same way that master/slave replication does. It "manually" initiates any replication that takes place -- the replication handler is not in charge. You cannot be sure that the indexes the replication handler thinks are master and slave are in fact the indexes that will be replicated next. Just ignore anything that the replication handler tells you. It may have absolutely no bearing on what's happening.

Was indexing happening when you looked, or was it entirely stopped? If indexing is ongoing, you may have seen the difference in the index versions in between data being indexed on the leader and the time that the replication is initiated.

Thanks,
Shawn
Replicationhandler with TLOG replicas
Hello!

I have SolrCloud 7.5 with TLOG replicas.
I have noticed that the information about the replication state of replicas differs when received from
...core/admin/mbeans?stats=true=replication=true=REPLICATION
and
...core/replication?command=indexversion

It seems the latter gets some wrong information, as its indexversion and generation are far behind the leader's, but the core index seems up to date and healthy.
Why could such things happen on some replicas? (Most of the replicas returned the same information for both commands.)
Is the information returned by the replication handler not applicable to TLOG/PULL replicas, and therefore not reliable?

--
Vadim