[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-08-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128716#comment-16128716
 ] 

Shalin Shekhar Mangar commented on SOLR-11069:
--

Looks good to me, Erick! Thanks for fixing this.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -
>
> Key: SOLR-11069
> URL: https://issues.apache.org/jira/browse/SOLR-11069
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.0
>Reporter: Amrit Sarkar
>Assignee: Erick Erickson
> Attachments: SOLR-11069.patch, SOLR-11069.patch, SOLR-11069.patch
>
>
> The {{LASTPROCESSEDVERSION}} (abbreviated LPV) action for CDCR breaks down due to 
> a poorly initialised and maintained buffer log on either source or target 
> cluster core nodes.
> If the buffer is enabled for cores of either the source or target cluster, it returns 
> {{-1}}, *irrespective of the number of entries in the tlog read by the {{leader}}* 
> node of each shard of the respective collection of the respective cluster. Once 
> the buffer is disabled, it starts reporting the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work 
> as expected, i.e. it provides an incorrect seek position for the {{non-leader}} 
> nodes to advance to. I am not sure whether this is intended behavior for 
> sync, but it surely doesn't feel right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-08-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126249#comment-16126249
 ] 

Erick Erickson commented on SOLR-11069:
---

Thanks for testing! So the net result is that, apart from the tlog purging 
being a little confusing, this patch seems to fix CDCR?

On a relatively brief inspection of the code, the 10-tlog limit is unimportant. 
The loop in CdcrUpdateLog.addOldLog removes an old log if and only if nothing 
is pointing to it. In fact I don't really see the reason for even testing the 
count, assuming that the "if (!this.hasLogPointer(log)) {" line preserves the 
tlogs necessary for CDCR.
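
The pointer check described above can be illustrated with a small toy model. This is only a sketch, not the actual CdcrUpdateLog.addOldLog: the class OldLogPurgeSketch, its fields, and the methods registerPointer, releasePointer and retained are hypothetical names made up for illustration, and the cap is assumed to be a simple count.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model of pointer-aware tlog purging; NOT the real CdcrUpdateLog. */
class OldLogPurgeSketch {
  private final Deque<String> oldLogs = new ArrayDeque<>(); // oldest log first
  private final Set<String> pointedLogs = new HashSet<>();  // logs a reader still needs
  private final int maxOldLogs;                              // hypothetical cap, e.g. 10

  OldLogPurgeSketch(int maxOldLogs) {
    this.maxOldLogs = maxOldLogs;
  }

  void registerPointer(String log) { pointedLogs.add(log); }

  void releasePointer(String log) { pointedLogs.remove(log); }

  boolean hasLogPointer(String log) { return pointedLogs.contains(log); }

  /** Adds a log, then purges from the oldest end until the cap is met or a pointed log blocks. */
  void addOldLog(String log) {
    oldLogs.addLast(log);
    while (oldLogs.size() > maxOldLogs) {
      String oldest = oldLogs.peekFirst();
      if (hasLogPointer(oldest)) {
        break; // something still points at this log: keep it and everything newer
      }
      oldLogs.removeFirst();
    }
  }

  int retained() { return oldLogs.size(); }
}
```

In this model the number of retained logs can exceed the cap for as long as a pointer holds the oldest log, and it drops back to the cap on the next addOldLog call after the pointer is released.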

I'm not sure we need to fix the fact that tlogs aren't getting purged quite the 
way we'd expect on this ticket, perhaps raise another one? Especially if this 
behavior is also present on 6.1, which I believe it is. CDCR is pretty broken 
with the infinite bootstrapping, but just a little confusing with the tlog 
retention.




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-08-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125589#comment-16125589
 ] 

Amrit Sarkar commented on SOLR-11069:
-

Thank you Erick for clarifying the root cause. I see LPV may very well not be 
the issue we are facing here, pardon my limited testing on this.

Three things I tested, on a limited schedule, to see whether bootstrapping 
happens with Erick's patch on {{branch_6x}}:

1. Restart the source and target clusters at different intervals and see if a 
bootstrap happens.
2. On 2x2 source and target collections / clusters, shut down a leader node so 
that a follower becomes leader, and see if a bootstrap happens.
3. Observe the behaviour of source and target tlogs across all cores in both 
source and target collections.

bq. 1. Restart source and target clusters, see if bootstrap is happening.
 
No bootstrap except the obvious ones, when required. The combinations I tested:
1. CDCR stop, buffer enable, index X documents and then CDCR on, multiple 
restarts
2. CDCR stop, buffer disable, index X documents and then CDCR on, multiple 
restarts
3. CDCR stop, buffer enable,  index X documents and then CDCR on, buffer 
enable, multiple restarts
4. CDCR stop, buffer disable,  index X documents and then CDCR on, buffer 
disable, multiple restarts
5. Above 4 steps one after another on singly created source and target 
collections - clusters.

The expected behavior is observed: a bootstrap happens when CDCR is turned on.

bq. 2.  On 2x2 source and target collection - clusters, shut down one node / 
leader to get the other nodes / follower as leader, see if bootstrap is 
happening.

No bootstrap except the obvious ones, when required. The combinations I tested:
1. CDCR stop, buffer enable, index X documents and then CDCR on, shut down the 
leader node
2. CDCR stop, buffer disable, index X documents and then CDCR on, shut down the 
leader node
3. CDCR stop, buffer enable,  index X documents and then CDCR on, buffer 
enable, shut down the leader node
4. CDCR stop, buffer disable,  index X documents and then CDCR on, buffer 
disable, shut down the leader node
5. Above 4 steps one after another on singly created source and target 
collections - clusters.

The expected behavior is observed: a bootstrap happens when CDCR is turned on. 
{{COLLECTIONCHECKPOINT}} and {{LASTPROCESSEDVERSION}} are successfully 
transferred / referred to the corresponding newly elected leader. 

bq. 3. Observe behaviour of source and target tlogs across all cores in both 
source and target collections.

This was peculiar and, as stated by Erick in an offline discussion, I had the 
same observations:
a) When the buffer is enabled, all the tlogs are kept on disk forever.
b) Once we disable it, while no indexing is taking place, the tlogs remain as they are.
c) When a single document is indexed after that, the old tlogs get purged; *it 
doesn't maintain ONLY 10 tlogs as expected*, but more, and the count gradually 
decreases as we keep indexing.
d) At times only 1-2 tlogs are present in each core of the source 
collections, as observed by Erick too, when we stop indexing altogether or 
index slowly. *Not sure of the reason*; I didn't have a chance to look into it, 
but I speculate there is no need to maintain 10 or any fixed number N, only to 
keep track of the last processed tlog version, which I suppose could be the 
2nd, 10th or 30th, depending on the situation.





[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-08-12 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124804#comment-16124804
 ] 

Erick Erickson commented on SOLR-11069:
---

I'm dithering back and forth about this. I suspect that we're conflating a 
couple of issues. There's definitely a problem with bootstrapping (I'll attach 
a patch in a minute). It may well be that LASTPROCESSEDVERSION is not 
actually a problem. At least in some testing (with the attached patch), the 
fact that it is -1 when buffering is enabled seems to be OK.

I propose we use the patch as a starting point to see if this 
LASTPROCESSEDVERSION is a problem or not.

1> when buffering is enabled, tlogs will accrue forever according to the 
original intent. From Renaud:

The original goal of the buffer on cdcr is to indeed keep indefinitely the 
tlogs until the buffer is deactivated 
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462#CrossDataCenterReplication(CDCR)-TheBufferElement).
 This was useful, for example, during maintenance operations, to ensure that the 
source cluster will keep all the tlogs until the target cluster is properly 
initialised. In this scenario, one will activate the buffer on the source. The 
source will start to store all the tlogs (and does not purge them). Once the 
target cluster is initialised, and has registered a tlog pointer on the source, 
one can deactivate the buffer on the source and the tlogs will start to be 
purged once they are read by the target cluster.

But additionally he had this to say:
Regarding the issue about LPV = -1, I am a bit surprised as this sentinel value 
should be used only when the source cluster does not have any log pointers, 
i.e., no target clusters were configured and initialised with this source 
cluster. In this case it indicates that there is no registered log reader, and 
that we should not remove any tlogs if the buffer is enabled (as we have to wait 
for the target to register a log reader and log pointer). 

And enabling buffering definitely causes LASTPROCESSEDVERSION to return -1. 
However, with the patch LPV immediately goes back to a reasonable value as soon 
as buffering is disabled, the tlogs get cleaned up etc. without bootstrapping. 
So I do wonder if the -1 value is just overloaded in this case to also mean 
"don't purge tlogs".

We need to untangle a couple of things. I'll attach a patch in a few minutes 
that might help.
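
The buffer semantics discussed above (keep every tlog while the buffer is enabled or while no log reader has registered, purge what has been read otherwise) can be modelled roughly as follows. This is a sketch under stated assumptions, not Solr code: the class BufferRetentionSketch and its purgeable method are hypothetical, and each tlog is represented only by the highest version it contains.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tlog retention with a buffer flag; NOT Solr code. */
class BufferRetentionSketch {
  /** Sentinel reported when no log reader has registered (as described in the thread). */
  static final long NO_VERSION = -1L;

  /**
   * Returns the tlogs that could be purged. Each tlog is represented by the
   * highest version it contains (an assumption made for this sketch).
   */
  static List<Long> purgeable(List<Long> tlogMaxVersions,
                              boolean bufferEnabled,
                              long lastProcessedVersion) {
    List<Long> result = new ArrayList<>();
    // While buffering, or while nobody has read anything, keep every tlog.
    if (bufferEnabled || lastProcessedVersion == NO_VERSION) {
      return result;
    }
    for (long maxVersion : tlogMaxVersions) {
      // A tlog is only safe to drop once everything in it has been processed.
      if (maxVersion <= lastProcessedVersion) {
        result.add(maxVersion);
      }
    }
    return result;
  }
}
```

In this model an LPV of -1 and an enabled buffer lead to the same outcome, which is one way to read the suggestion that -1 is overloaded to also mean "don't purge tlogs".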




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088095#comment-16088095
 ] 

Amrit Sarkar commented on SOLR-11069:
-

{quote}
I just saw a case where restarting the source cluster triggered bootstrap.
{quote}
When the leader of a shard of the source collection goes down and a non-leader 
is elected, it triggers a {{bootstrap}} for the reason stated above: LPV is set to {{-1}}.




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-14 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088075#comment-16088075
 ] 

Varun Thacker commented on SOLR-11069:
--

I just saw a case where restarting the source cluster triggered bootstrap. 
Since LASTPROCESSEDVERSION was -1 the source ended up bootstrapping the target. 
Disabling buffer on source makes the pointer move correctly.




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087665#comment-16087665
 ] 

Amrit Sarkar commented on SOLR-11069:
-

Continuing with how LPV was never robustly tested:

The only place where LPV is mentioned in the tests is in 
{{CdcrRequestHandlerTest}}
{code}
// replication never started, lastProcessedVersion should be -1 for both shards
rsp = invokeCdcrAction(shardToLeaderJetty.get(SOURCE_COLLECTION).get(SHARD1),
    CdcrParams.CdcrAction.LASTPROCESSEDVERSION);
long lastVersion = (Long) rsp.get(CdcrParams.LAST_PROCESSED_VERSION);
assertEquals(-1L, lastVersion);

rsp = invokeCdcrAction(shardToLeaderJetty.get(SOURCE_COLLECTION).get(SHARD2),
    CdcrParams.CdcrAction.LASTPROCESSEDVERSION);
lastVersion = (Long) rsp.get(CdcrParams.LAST_PROCESSED_VERSION);
assertEquals(-1L, lastVersion);
{code}

The case LPV > -1, i.e. what value LPV takes (it should be at least > 1) once 
the leader has read some entries from the tlogs, is never tested anywhere, or 
at least I cannot find such a test.




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087100#comment-16087100
 ] 

Amrit Sarkar commented on SOLR-11069:
-

Regarding {{updateLogSynchronizer}}:

Every time we call {{DISABLEBUFFER}} or {{ENABLEBUFFER}}, 
CdcrBufferManager::stateUpdate gets invoked:
{code}
@Override
public synchronized void stateUpdate() {
  CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();
  // If I am not the leader, I should always buffer my updates
  if (!leaderStateManager.amILeader()) {
    ulog.enableBuffer();
    return;
  }
  // If I am the leader, I should buffer my updates only if buffer is enabled
  else if (bufferStateManager.getState().equals(CdcrParams.BufferState.ENABLED)) {
    ulog.enableBuffer();
    return;
  }
  // otherwise, disable the buffer
  ulog.disableBuffer();
}
{code}

The non-leader nodes always have the buffer enabled by default:
{code}
if (!leaderStateManager.amILeader()) {
  ulog.enableBuffer();
  return;
}
{code}
Though LPV is always calculated on the leader, this has serious drawbacks, 
explained below.

In CdcrUpdateLogSynchronizer::run, if buffering is {{enabled}}:
{code}
// if we received -1, it means that the log reader on the leader has not yet started to read log entries
// do nothing
if (lastVersion == -1) {
  return;
}
try {
  CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();
  if (ulog.isBuffering()) {
    log.debug("Advancing replica buffering tlog reader to {} @ {}:{}", lastVersion, collection, shardId);
    ulog.getBufferToggle().seek(lastVersion);
  }
}
{code}
It always returns when {{lastVersion == -1}}, and the comment {{if we 
received -1, it means that the log reader on the leader has not yet started to 
read log entries}} is misleading.

As {{lastVersion}} is not positive, the seek position for the corresponding 
non-leader nodes is never set to the appropriate LPV. 

Now if the leader goes down and a non-leader becomes the leader, the LPV is 
not set properly, resulting in improper sync, and I have no idea what the 
impact will be in that case. 

Also, since the buffer is always on for non-leader nodes, if one of them later 
becomes the leader, then even if we have disabled the buffer for the source 
collection / cluster, its status and behaviour will be {{buffer enabled}}. 
Again, I am not sure of the impact and need to look closely.
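
The two decisions quoted in this comment can be condensed into a small stand-alone model. It is only a sketch: BufferStateSketch, shouldBuffer and advance are hypothetical names, and the follower's seek position is reduced to a single long for illustration.

```java
/** Toy model of the buffer and synchronizer decisions quoted above; NOT Solr code. */
class BufferStateSketch {

  /** Mirrors the quoted stateUpdate branching: followers always buffer, leaders only when enabled. */
  static boolean shouldBuffer(boolean amILeader, boolean bufferEnabled) {
    if (!amILeader) {
      return true;
    }
    return bufferEnabled;
  }

  /**
   * Mirrors the quoted synchronizer guard: returns the follower's seek position
   * after one sync round. A lastVersion of -1 from the leader leaves it unchanged.
   */
  static long advance(long currentSeek, long lastVersionFromLeader, boolean buffering) {
    if (lastVersionFromLeader == -1L) {
      return currentSeek; // early return: nothing is advanced
    }
    if (buffering) {
      return lastVersionFromLeader; // corresponds to seek(lastVersion)
    }
    return currentSeek;
  }
}
```

In this model, as long as the leader reports -1 the follower's position never moves, however many sync rounds run, which is the situation described for a follower that is later elected leader.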




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087076#comment-16087076
 ] 

Amrit Sarkar commented on SOLR-11069:
-

So when we enable buffering in CDCR, {{buffertoggle}} gets initialised via 
{{newLogReader()}} where ::
{code}
 return new CdcrLogReader(new ArrayList(logs), tlog);
{code}
{code}
private CdcrLogReader(List tlogs, TransactionLog tlog) {
  this.tlogs = new LinkedBlockingDeque<>();
  this.tlogs.addAll(tlogs);
  if (tlog != null) this.tlogs.push(tlog); // ensure that the tlog being written is pushed

  // Register the pointer in the parent UpdateLog
  pointer = new CdcrLogPointer();
  logPointers.put(this, pointer);

  // If the reader is initialised while the updates log is empty, do nothing
  if ((currentTlog = this.tlogs.peekLast()) != null) {
    tlogReader = currentTlog.getReader(0);
    pointer.set(currentTlog.tlogFile);
    numRecordsReadInCurrentTlog = 0;
    log.debug("Init new tlog reader for {} - tlogReader = {}", currentTlog.tlogFile, tlogReader);
  }
}
{code}
{{lastVersion}} and {{nextToLastVersion}} are initialised as {{-1}} and never 
updated afterwards. The recent logs are added to {{tlogs}} and the current 
tlog is maintained, though.

Now LPV is calculated in CdcrRequestHandler::handleLastProcessedVersionAction:
{code}
for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) {
  long version = Long.MAX_VALUE;
  if (state.getLogReader() != null) {
    version = state.getLogReader().getLastVersion();
  }
  lastProcessedVersion = Math.min(lastProcessedVersion, version);
}

// next check the log reader of the buffer
CdcrUpdateLog.CdcrLogReader bufferLogReader =
    ((CdcrUpdateLog) core.getUpdateHandler().getUpdateLog()).getBufferToggle();
if (bufferLogReader != null) {
  lastProcessedVersion = Math.min(lastProcessedVersion, bufferLogReader.getLastVersion());
}
{code}
{{bufferLogReader.getLastVersion()}} returns {{-1}}, so LPV comes out as {{-1}}.
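
The effect of taking the minimum can be reproduced in isolation. The following is a minimal sketch, not the request handler itself: LpvSketch and its method signature are hypothetical, a null entry stands for a replicator state without a log reader, and the starting value Long.MAX_VALUE is an assumption since the excerpt does not show how lastProcessedVersion is initialised.

```java
import java.util.List;

/** Toy model of the LPV minimum computation quoted above; NOT Solr code. */
class LpvSketch {

  /**
   * @param replicatorReaderVersions last version per replicator log reader;
   *                                 null means the state has no log reader
   * @param bufferReaderVersion      last version of the buffer log reader,
   *                                 or null when there is no buffer reader
   */
  static long lastProcessedVersion(List<Long> replicatorReaderVersions,
                                   Long bufferReaderVersion) {
    long lastProcessedVersion = Long.MAX_VALUE; // assumed starting value
    for (Long readerVersion : replicatorReaderVersions) {
      long version = (readerVersion != null) ? readerVersion : Long.MAX_VALUE;
      lastProcessedVersion = Math.min(lastProcessedVersion, version);
    }
    // next check the log reader of the buffer
    if (bufferReaderVersion != null) {
      lastProcessedVersion = Math.min(lastProcessedVersion, bufferReaderVersion);
    }
    return lastProcessedVersion;
  }
}
```

With replicator readers at versions 1500 and 1700 and no buffer reader, this model yields 1500. Adding a buffer reader whose last version is still -1 pulls the result down to -1, regardless of how far the replicator readers have progressed.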




[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

2017-07-13 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085994#comment-16085994
 ] 

Erick Erickson commented on SOLR-11069:
---

[~shalinmangar] [~rendel] Any comments?
