Re: Replicationhandler with TLOG replicas

2018-11-12 Thread Erick Erickson
In that case, it seems to be a reporting issue, not a fundamental
replication problem. Whew!

So it's worth raising a JIRA, but since you report the indexes are
identical, I'm not sure how high a priority it would be. If you do
raise a JIRA, you should reference this discussion.

Best,
Erick

RE: Replicationhandler with TLOG replicas

2018-11-12 Thread Vadim Ivanov
Hi, Erick
I have about 1300 cores in my test environment across 159 collections.
Today I wrote a script to check all of them.
For 138 of the 1300 cores, the "generation" and "indexversion" values returned
by mbeans and by the ReplicationHandler do not match.
Most of these replicas are more than one generation apart (for example, 14
returned by mbeans versus 6 returned by the ReplicationHandler), so ongoing
indexing cannot explain the difference.
None of these 138 replicas is the leader of its shard.
All of these 138 replicas, when queried with distrib=false, returned exactly
the same documents as their leaders.
I've checked the segments on some of the replicas: they have the same segments
as their leaders, with exactly the same sizes in bytes.
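
A check of this kind could be sketched roughly as follows. This is only an illustration in Python, not the script actually used; the function name and the shape of the input (core name mapped to the pair of generations reported by mbeans and by the ReplicationHandler) are assumptions.

```python
def generation_gaps(reports):
    """Return the cores whose two reported generations disagree.

    `reports` maps a core name to a (mbeans_generation, rh_generation)
    pair; this input shape is hypothetical. The result maps each
    mismatching core to the size of the gap between the two values.
    """
    gaps = {}
    for core, (mbeans_gen, rh_gen) in reports.items():
        if mbeans_gen != rh_gen:
            gaps[core] = mbeans_gen - rh_gen
    return gaps
```

A gap larger than 1 on a replica that is not being indexed is the pattern described above.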

It seems to me this issue does not affect indexing or searching; it's just a
curious misreporting of some information that I came across.

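My autocommit is:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
</autoSoftCommit>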

-- 
BR, Vadim



Re: Replicationhandler with TLOG replicas

2018-11-11 Thread Erick Erickson
Vadim:

The next time you see this, is it possible to check that the replicas
showing different index versions have the same documents? Actually, it
should be sufficient to verify that they have the same segments in
their data/index directory, and they should match the segments on the
leader _assuming_ you're not actively indexing and you stopped
indexing more than the polling interval ago.
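
The segment check described above could be sketched like this in Python. It is a minimal illustration under stated assumptions: the helper names are made up, and because leader and replica normally live on different hosts, index_listing would have to be run on each host against that core's data/index directory before the two results are compared.

```python
import os


def index_listing(index_dir):
    """Map each regular file in index_dir to its size in bytes."""
    listing = {}
    for entry in os.scandir(index_dir):
        if entry.is_file():
            listing[entry.name] = entry.stat().st_size
    return listing


def diff_listings(leader, replica):
    """Return the sorted file names that are missing on one side
    or whose sizes differ between the two listings."""
    names = set(leader) | set(replica)
    return sorted(n for n in names if leader.get(n) != replica.get(n))
```

An empty result from diff_listings means both directories hold the same file names with the same sizes, which is the condition described above; it does not compare file contents.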

If you are actively indexing, it should be sufficient to check that
the questionable replica's index files are changing over time; that
would mean that replication is happening.

And what's your commit interval? The polling interval on the followers is:
1> 1/2 the hard commit interval if defined to be > -1. If not
2> 1/2 the soft commit interval if defined to be > -1. If not
3> 3000ms
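
The three rules above can be written down as a small function. This is only a Python sketch of the rule as stated in this message, not Solr's actual code; the function name, the constant name, and the use of -1 for "not defined" as a default argument are assumptions made for illustration.

```python
DEFAULT_POLL_MS = 3000  # fallback when neither commit interval is defined


def follower_poll_interval_ms(hard_commit_ms=-1, soft_commit_ms=-1):
    """Return the follower polling interval in milliseconds.

    Half the hard commit interval if it is > -1, otherwise half the
    soft commit interval if it is > -1, otherwise 3000 ms.
    """
    if hard_commit_ms > -1:
        return hard_commit_ms // 2
    if soft_commit_ms > -1:
        return soft_commit_ms // 2
    return DEFAULT_POLL_MS
```

Note that the hard commit interval takes precedence whenever it is defined, even if the soft commit interval is shorter.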

There are two possibilities here as I see it.
1> this is just a reporting error, which we should still address but
doesn't worry me much.
2> the TLOG/PULL replication process has some bug and the indexes are,
indeed, different.
2a> when you reloaded the collection, it's possible that the startup
process kicked off a replication,
   and if there's really a bug, reloading just masked it.

Best,
Erick


RE: Replicationhandler with TLOG replicas

2018-11-11 Thread Vadim Ivanov
Reloading the collection helps!
After reloading the collection, the generation and indexversion returned by
the ReplicationHandler catch up with the leader.





RE: Replicationhandler with TLOG replicas

2018-11-11 Thread Vadim Ivanov
Thanks, Shawn
I had anticipated that answer about the information returned by the
ReplicationHandler.
What baffles me is that on most replicas the indexversion and generation
returned by the ReplicationHandler are correct and increase with commits.
But on some replicas they are not: at some moment in the past they stopped
changing for good.
For example, I have 5 TLOG replicas.
For the leader (and the 3 good replicas),
http://host_n:8983/solr/core_n/replication?command=indexversion returns
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "indexversion":1541885907200,
  "generation":1704}

But for one replica:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "indexversion":1540842454653,
  "generation":1216}
 
Could it be a sign of some hidden issue? Where is that information stored, and
why did it stop changing at some moment?
No indexing was going on in that collection at the moment of the request. I'm
"deltaimporting" that collection once per hour, and only if needed,
so usually there are only 5-10 commits per day.
It's not a crucial issue for my use case, as I get adequate indexversion
and generation information from mbeans; I'm just curious about that strange
behavior.
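
The comparison between the two responses quoted above could be sketched in Python as follows. This is a hedged illustration, not an official tool: the function names are hypothetical, core_url stands for a base such as http://host_n:8983/solr/core_n from the example, and the sketch assumes the endpoint returns the JSON shape shown above.

```python
import json
from urllib.request import urlopen


def fetch_indexversion(core_url, timeout=10):
    """GET <core_url>/replication?command=indexversion and parse the JSON body."""
    url = core_url.rstrip("/") + "/replication?command=indexversion"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def lag(leader, replica):
    """Return (generation gap, indexversion gap) between two parsed responses."""
    return (leader["generation"] - replica["generation"],
            leader["indexversion"] - replica["indexversion"])
```

With the two responses above, lag gives a generation gap of 488 and an indexversion gap of 1043452547.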
 



Re: Replicationhandler with TLOG replicas

2018-11-10 Thread Shawn Heisey

On 11/10/2018 8:05 AM, Vadim Ivanov wrote:

> It seems the latter gets wrong information, as its indexversion and generation
> are far behind the leader's.
> But the core's index seems up to date and healthy.
> Why could this happen on some replicas? (Most of the replicas returned
> the same information for both commands.)
> Is the information returned by the ReplicationHandler not applicable to
> TLOG/PULL replicas, and therefore not reliable?


SolrCloud does not use the replication handler in the same way that 
master/slave replication does.  It "manually" initiates any replication 
that takes place -- the replication handler is not in charge.  You 
cannot be sure that the indexes the replication handler thinks are 
master and slave are in fact the indexes that will be replicated next.  
Just ignore anything that the replication handler tells you.  It may 
have absolutely no bearing on what's happening.


Was indexing happening when you looked, or was it entirely stopped?  If 
indexing is ongoing, you may have seen the difference in the index 
versions in between data being indexed on the leader and the time that 
the replication is initiated.


Thanks,
Shawn



Replicationhandler with TLOG replicas

2018-11-10 Thread Vadim Ivanov
Hello!
I have SolrCloud 7.5 with TLOG replicas.
I have noticed that the information about a replica's replication state differs
depending on whether it comes from
...core/admin/mbeans?stats=true=replication=true=REPLICATION
or from
...core/replication?command=indexversion

It seems the latter gets wrong information, as its indexversion and generation
are far behind the leader's.
But the core's index seems up to date and healthy.
Why could this happen on some replicas? (Most of the replicas returned
the same information for both commands.)
Is the information returned by the ReplicationHandler not applicable to
TLOG/PULL replicas, and therefore not reliable?

-- 
Vadim