[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150879#comment-16150879
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Another example: why are two simultaneous threads being created to invoke 
BOOTSTRAP?

{code}
  [beaster]   2> 38858 INFO  
(updateExecutor-39-thread-1-processing-n:127.0.0.1:42155_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:42155_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler what' the lock this 
time :: true :: thread :: org.apache.solr.handler.CdcrRequestHandler@64d1ccf3
  [beaster]   2> 38858 INFO  (qtp1415866434-168) [n:127.0.0.1:42155_solr 
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] 
o.a.s.c.S.Request [cdcr-target_shard1_replica_n1]  webapp=/solr path=/cdcr 
params={qt=/cdcr=BOOTSTRAP_STATUS=javabin=2} status=0 QTime=0
  [beaster]   2> 38859 WARN  
(cdcr-bootstrap-status-66-thread-1-processing-n:127.0.0.1:36193_solr 
x:cdcr-source_shard1_replica_n1 s:shard1 c:cdcr-source r:core_node2) 
[n:127.0.0.1:36193_solr c:cdcr-source s:shard1 r:core_node2 
x:cdcr-source_shard1_replica_n1] o.a.s.h.CdcrReplicatorManager Bootstrap 
process was not found on target collection: cdcr-target shard: shard1, leader: 
http://127.0.0.1:42155/solr/cdcr-target_shard1_replica_n1/
  [beaster]   2> 38860 INFO  
(cdcr-bootstrap-status-66-thread-1-processing-n:127.0.0.1:36193_solr 
x:cdcr-source_shard1_replica_n1 s:shard1 c:cdcr-source r:core_node2) 
[n:127.0.0.1:36193_solr c:cdcr-source s:shard1 r:core_node2 
x:cdcr-source_shard1_replica_n1] o.a.s.h.CdcrReplicatorManager Attempting to 
bootstrap target collection: cdcr-target shard: shard1 leader: 
http://127.0.0.1:42155/solr/cdcr-target_shard1_replica_n1/
  [beaster]   2> 38860 INFO  (qtp1415866434-173) [n:127.0.0.1:42155_solr 
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] 
o.a.s.h.CdcrRequestHandler Boostrap is issued now. Request :: 
{action=BOOTSTRAP=/cdcr=http://127.0.0.1:36193/solr/cdcr-source_shard1_replica_n1/=javabin=2}
 : collection : cdcr-target
  [beaster]   2> 38865 INFO  
(recoveryExecutor-40-thread-1-processing-n:127.0.0.1:42155_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:42155_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.u.UpdateLog Starting to buffer updates. 
FSUpdateLog{state=ACTIVE, tlog=null}
  [beaster]   2> 38865 INFO  (qtp1415866434-173) [n:127.0.0.1:42155_solr 
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] 
o.a.s.c.S.Request [cdcr-target_shard1_replica_n1]  webapp=/solr path=/cdcr 
params={qt=/cdcr=http://127.0.0.1:36193/solr/cdcr-source_shard1_replica_n1/=BOOTSTRAP=javabin=2}
 status=0 QTime=4
  [beaster]   2> 38865 INFO  
(updateExecutor-39-thread-2-processing-n:127.0.0.1:42155_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:42155_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler what' the lock this 
time :: false :: thread :: org.apache.solr.handler.CdcrRequestHandler@64d1ccf3
{code}

The code looks like this:

{code}
private void handleBootstrapAction(SolrQueryRequest req, SolrQueryResponse rsp)
    throws IOException, SolrServerException {
  ...
  Runnable runnable = () -> {
    Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
    boolean locked = recoveryLock.tryLock();
    log.info("what' the lock this time :: " + locked + " :: thread :: " + this);
    SolrCoreState coreState = core.getSolrCoreState();
{code}
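To make the interleaving in the log above concrete, here is a minimal, self-contained sketch, assuming two tasks execute the same bootstrap runnable against one shared lock. A plain `ReentrantLock` stands in for the core's recovery lock, and `BootstrapLockRace` and its event strings are hypothetical (not Solr code). Latches force the order seen in the log: thread-1 holds the lock while thread-2 calls `tryLock()`, so thread-2 gets `false`, which in the handler leads to the cancel branch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model (not Solr code): two copies of the bootstrap runnable
// racing for one shared recovery lock.
class BootstrapLockRace {

  static List<String> run() {
    Lock recoveryLock = new ReentrantLock(); // stands in for the core's recovery lock
    List<String> events = new CopyOnWriteArrayList<>();
    CountDownLatch firstHoldsLock = new CountDownLatch(1);
    CountDownLatch secondTried = new CountDownLatch(1);

    Runnable first = () -> {
      boolean locked = recoveryLock.tryLock();
      events.add("thread-1 locked=" + locked);
      firstHoldsLock.countDown();
      try {
        secondTried.await(); // keep holding the lock until thread-2 has tried
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        if (locked) {
          recoveryLock.unlock();
        }
      }
    };

    Runnable second = () -> {
      try {
        firstHoldsLock.await(); // only try once thread-1 owns the lock
        boolean locked = recoveryLock.tryLock();
        try {
          events.add("thread-2 locked=" + locked);
          if (!locked) {
            events.add("thread-2 -> cancel bootstrap"); // mirrors the !locked branch
          }
        } finally {
          if (locked) {
            recoveryLock.unlock();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        secondTried.countDown(); // always release thread-1
      }
    };

    Thread t1 = new Thread(first, "updateExecutor-thread-1");
    Thread t2 = new Thread(second, "updateExecutor-thread-2");
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the race", e);
    }
    return events;
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

Because `tryLock()` never waits, whichever task arrives second takes the `!locked` path immediately instead of queueing behind the running bootstrap.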

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
> Attachments: SOLR-11278-cancel-bootstrap-on-stop.patch, 
> SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and seeing following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | 
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on 
> target after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150771#comment-16150771
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Confirmed: there are two separate bootstrap threads initiated; one acquires 
the lock and the other fails:
{code}
[beaster]   2> 34430 INFO  
(updateExecutor-39-thread-1-processing-n:127.0.0.1:35690_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:35690_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler what' the lock this 
time :: true
  [beaster]   2> 34431 INFO  (qtp510024884-173) [n:127.0.0.1:35690_solr 
c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] 
o.a.s.c.S.Request [cdcr-target_shard1_replica_n1]  webapp=/solr path=/cdcr 
params={qt=/cdcr=http://127.0.0.1:38721/solr/cdcr-source_shard1_replica_n1/=BOOTSTRAP=javabin=2}
 status=0 QTime=10
  [beaster]   2> 34432 INFO  (qtp600983226-216) [n:127.0.0.1:38721_solr 
c:cdcr-source s:shard1 r:core_node2 x:cdcr-source_shard1_replica_n1] 
o.a.s.c.S.Request [cdcr-source_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=cdcr-source:4=javabin=2} status=0 QTime=13
  [beaster]   2> 34433 INFO  
(updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:35690_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler what' the lock this 
time :: false
{code}

{{updateExecutor-39-thread-1-processing}}.
{{updateExecutor-39-thread-2-processing}}.

Only three lines are printed for {{updateExecutor-39-thread-2-processing}} 
(two of them from the extra logging I added). Why? And why is this thread 
introduced in the first place?

{code}
  [beaster]   2> 34433 INFO  
(updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:35690_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler what' the lock this 
time :: false
  [beaster]   2> 34433 INFO  
(updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:35690_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler we reached this 
point :: CANCEL BOOTSTRAP, locked :: false
  [beaster]   2> 34433 INFO  
(updateExecutor-39-thread-2-processing-n:127.0.0.1:35690_solr 
x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) 
[n:127.0.0.1:35690_solr c:cdcr-target s:shard1 r:core_node2 
x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler someone called me, 
yes they did
{code}

My best guess:

After {{updateExecutor-39-thread-1-processing}} acquires the lock and invokes 
the BOOTSTRAP API, an update is received on the {{source}} collection. Right 
after that, another thread, *"updateExecutor-39-thread-2-processing"*, is 
invoked and tries to acquire the lock; when it fails, it invokes 
CANCEL_BOOTSTRAP, creating chaos.

*The narrow time window between INVOKING bootstrap, CHANGING the bootstrap 
status to running, and RECEIVING a new update on the source is causing the 
confusion/chaos: since handleBootstrapStatus runs as a Runnable thread, more 
than one can get invoked.*
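One way to read the window described above is as a check-then-act race: the target only reports a bootstrap once the submitted runnable has started, so a status poll that lands earlier sees nothing and the source issues BOOTSTRAP again. The deterministic sketch below models that under stated assumptions: `Target`, `pollOnce`, the "running"/"notfound" strings and the hand-driven executor are all hypothetical stand-ins, not Solr classes or the real status values.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model (not Solr code) of the status/bootstrap check-then-act window.
class BootstrapStatusWindow {

  /** Executor that runs tasks only when drained, so the interleaving is deterministic. */
  static final class ManualExecutor implements Executor {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    @Override
    public void execute(Runnable task) {
      pending.add(task);
    }

    void drain() {
      Runnable task;
      while ((task = pending.poll()) != null) {
        task.run();
      }
    }
  }

  /** Target side: the "running" flag is only set once the submitted task starts. */
  static final class Target {
    private final Executor executor;
    private volatile boolean running = false;
    final AtomicInteger bootstrapsStarted = new AtomicInteger();

    Target(Executor executor) {
      this.executor = executor;
    }

    void bootstrap() {
      executor.execute(() -> {
        running = true;
        bootstrapsStarted.incrementAndGet();
      });
    }

    String status() {
      return running ? "running" : "notfound";
    }
  }

  /** Source side: issue BOOTSTRAP whenever the target reports none. */
  static void pollOnce(Target target) {
    if ("notfound".equals(target.status())) {
      target.bootstrap();
    }
  }

  /** The status poll lands between accepting BOOTSTRAP and starting it. */
  static int simulate() {
    ManualExecutor executor = new ManualExecutor();
    Target target = new Target(executor);
    target.bootstrap(); // accepted, not started yet
    pollOnce(target);   // sees "notfound" and issues a second BOOTSTRAP
    executor.drain();   // both bootstrap tasks now run
    return target.bootstrapsStarted.get();
  }

  /** Same calls, but the poll arrives after the first task has started. */
  static int simulatePollAfterStart() {
    ManualExecutor executor = new ManualExecutor();
    Target target = new Target(executor);
    target.bootstrap();
    executor.drain();   // status is now "running"
    pollOnce(target);   // no second BOOTSTRAP
    executor.drain();
    return target.bootstrapsStarted.get();
  }

  public static void main(String[] args) {
    System.out.println("poll inside the window: " + simulate() + " bootstrap task(s)");
    System.out.println("poll after start: " + simulatePollAfterStart() + " bootstrap task(s)");
  }
}
```

`simulate()` yields two started bootstrap tasks and `simulatePollAfterStart()` yields one; the only difference is whether the status poll arrives before or after the first task flips the flag.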

Got the cause, need the solution.




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150469#comment-16150469
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Alright!

There are two issues in the troubled CDCR test:

1. Why is bootstrap called two separate times in a single life cycle?

-> The first time, bootstrap is called when we enable CDCR, and 1000 documents 
are bootstrapped to the target.

-> The second time is while bootstrapping is happening in the background and 
new updates are being pushed to the source. Once the first bootstrap completes 
successfully, the target (I speculate) sees that it is far behind and issues 
bootstrap again. I don't really understand why, and adding extra logging hasn't 
helped either; I am looking into it further. OR all of this is wrong, and 
bootstrap happens only once, in a weird manner.

-> As listed above, the first bootstrap and the indexing into the source 
cluster happen simultaneously at one point. How is it supposed to behave at 
that time?

According to the code in CdcrRequestHandler.BootstrapCallable.call():

{code}
@Override
public Boolean call() throws Exception {
  boolean success = false;
  Throwable exception = null;
  UpdateLog ulog = core.getUpdateHandler().getUpdateLog();
  // we start buffering updates as a safeguard however we do not expect
  // to receive any updates from the source during bootstrap
  ulog.bufferUpdates();
  try {
    commitOnLeader(masterUrl);
    ...
{code}

2. Why does the second bootstrap (if there really is one) fail so badly?

-> When bootstrap is issued, it calls bootstrapCallable to apply the buffered 
updates, and if "fetchIndex is not successful OR bootstrapCallable is closed", 
it drops those updates from the UpdateLog in the {{finally}} block.

-> In this case, when bootstrap is issued, it first checks whether it can 
acquire the LOCK:

{code}
private void handleBootstrapAction(SolrQueryRequest req, SolrQueryResponse rsp)
    throws IOException, SolrServerException {
  ...
  Runnable runnable = () -> {
    Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
    boolean locked = recoveryLock.tryLock();
    SolrCoreState coreState = core.getSolrCoreState();
    try {
      if (!locked) {
        log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
        handleCancelBootstrap(req, rsp);
      }
      ...
{code}

If it cannot, it issues CANCEL BOOTSTRAP, which is exactly what happened in 
this case: CANCEL_BOOTSTRAP is issued and applied QUIETLY.

-> Meanwhile, inside the bootstrap running in the background, when 
applyBufferedUpdates is called on the ulog, {{tlog}} is mysteriously NULL, 
suggesting no updates were received (why?). So it returns a null Future and 
sets the UpdateLog STATE to ACTIVE.

According to the code in UpdateLog.java:

{code}
/** Returns the Future to wait on, or null if no replay was needed */
public Future applyBufferedUpdates() {
  // recovery trips this assert under some race - even when
  // it checks the state first
  // assert state == State.BUFFERING;

  // block all updates to eliminate race conditions
  // reading state and acting on it in the update processor
  versionInfo.blockUpdates();
  try {
    cancelApplyBufferUpdate = false;
    log.info("applyBufferedUpdates :: state :: start :: " + state);
    if (state != State.BUFFERING) {
      return null;
    }
    operationFlags &= ~FLAG_GAP;

    // handle case when no log was even created because no updates
    // were received.
    if (tlog == null) {
      log.info("applyBufferedUpdates :: state :: middle 1 :: " + state);
      state = State.ACTIVE;
      return null;
    }
    log.info("applyBufferedUpdates :: state :: middle 2 :: " + state);
    ...
{code}

-> Now, in the finally block, fetchIndex was successful but "closed" 
(initialized to false) is true, apparently DUE TO CANCEL BOOTSTRAP, so it 
tries to drop the buffered updates from the ulog, where:

{code}
/** Returns true if we were able to drop buffered updates and return to the ACTIVE state */
public boolean dropBufferedUpdates() {
  versionInfo.blockUpdates();
  try {
    log.info("dropBufferedUpdates :: state :: start " + state);
    if (state != State.BUFFERING) {
      return false;
    }
    ...
{code}

The STATE is ACTIVE, so it returns {{false}}, failing the "assert dropped" in 
BootstrapCallable and eventually emitting "Bootstrap operation failed".

Once bootstrap fails, CdcrBootstrapTest::waitTargetToSync waits for 120 
seconds and keeps receiving a constant numFound from the target (1000 / 1001 / 
1100, basically whatever the count was before bootstrap failed), and 
eventually the assertion in CdcrBootstrapTest fails.
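The chain described above (buffer, apply with no tlog, then drop because `closed` is true) can be reduced to a small state machine. This is a hedged sketch, assuming only the early returns quoted from UpdateLog matter here: `MiniUpdateLog` is a hypothetical stand-in that ignores version blocking, `FLAG_GAP` and the actual replay. It shows that once `applyBufferedUpdates()` has taken the `tlog == null` branch, a later `dropBufferedUpdates()` can only return `false`.

```java
// Hypothetical, heavily simplified stand-in for UpdateLog (not Solr code).
class MiniUpdateLog {
  enum State { ACTIVE, BUFFERING }

  State state = State.ACTIVE; // ACTIVE initially, as noted in this thread
  Object tlog = null;         // stays null if no update arrives while buffering

  void bufferUpdates() {
    state = State.BUFFERING;
  }

  /** Mirrors the two early returns quoted from applyBufferedUpdates(). */
  Object applyBufferedUpdates() {
    if (state != State.BUFFERING) {
      return null;
    }
    if (tlog == null) {
      state = State.ACTIVE; // nothing to replay: straight back to ACTIVE
      return null;
    }
    throw new UnsupportedOperationException("replay is not modelled");
  }

  /** Mirrors the guard quoted from dropBufferedUpdates(). */
  boolean dropBufferedUpdates() {
    if (state != State.BUFFERING) {
      return false;
    }
    state = State.ACTIVE;
    return true;
  }

  /** buffer -> apply -> (closed || !success) -> drop, as in the failing run. */
  static boolean replayFailingSequence(boolean closed, boolean success) {
    MiniUpdateLog ulog = new MiniUpdateLog();
    ulog.bufferUpdates();
    ulog.applyBufferedUpdates(); // tlog == null, so state is ACTIVE again
    if (closed || !success) {
      return ulog.dropBufferedUpdates(); // false here: what trips `assert dropped`
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("dropped = " + replayFailingSequence(true, true));
  }
}
```

By contrast, calling `dropBufferedUpdates()` while the log is still BUFFERING succeeds, which is why the order of apply and drop matters here.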

There are still some unanswered questions, such as:

1. Why are two bootstrap commands issued in the first place? OR do we only 
think two are issued, when it is really one wrapped up in an awkward life 
cycle? Why is the conventional forwarding not happening?

[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150251#comment-16150251
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Because of this assertion error, {{waitForTargetSync}} sees the same numFound 
for the target (1000 / 1001 / 1100) for 120 seconds, as bootstrap has failed 
with "UNABLE TO DROP BUFFER UPDATES". I am not sure what the significance of 
dropping updates in the {{finally}} block of the bootstrap process is, and 
would appreciate some advice on this.
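The symptom in the test is a wait loop that keeps observing the same numFound until its deadline and then asserts. A hedged sketch of that pattern, assuming the count comes from a supplier; `waitForCount` is a hypothetical helper, not the test's actual wait method.

```java
import java.util.function.LongSupplier;

// Hypothetical helper (not the CdcrBootstrapTest implementation).
class WaitForTargetSync {

  /**
   * Polls countSupplier until it reports the expected count or the timeout
   * elapses, and returns the last count seen so the caller can assert on it.
   */
  static long waitForCount(LongSupplier countSupplier, long expected, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    long last = countSupplier.getAsLong();
    while (last != expected && System.nanoTime() - deadline < 0) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
      last = countSupplier.getAsLong();
    }
    return last;
  }

  public static void main(String[] args) {
    // A target whose bootstrap failed keeps reporting the same numFound.
    long seen = waitForCount(() -> 1000L, 2000L, 200, 50);
    System.out.println("expected:<2000> but was:<" + seen + ">");
  }
}
```

Returning the last observed count rather than a boolean is what lets the caller report a message like `expected:<2000> but was:<1000>`; a stuck target simply burns the whole timeout first.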




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150247#comment-16150247
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Root cause of the assertion error:

In the bootstrap process, inside BootstrapCallable:

{code}
Future future = ulog.applyBufferedUpdates();
...
    }
    return success;
  } finally {
    if (closed || !success) {
      ...
      boolean dropped = ulog.dropBufferedUpdates();
      log.info("boostrap callable dropped :: " + dropped);
      assert dropped;
    }
  }
{code}

{{ulog.dropBufferedUpdates()}} evaluates to {{false}} at:

UpdateLog::dropBufferedUpdates::

{code}
/** Returns true if we were able to drop buffered updates and return to the ACTIVE state */
public boolean dropBufferedUpdates() {
  versionInfo.blockUpdates();
  try {
    log.info("dropBufferedUpdates :: state :: start " + state);
    if (state != State.BUFFERING) {
      return false;
    }
    ...
{code}

Here {{state}} is *{{ACTIVE}}* instead of BUFFERING, so it returns 
*{{false}}*.

In UpdateLog, I cannot figure out where exactly {{state}} is being set to 
*ACTIVE* between {{applyBufferedUpdates}} and {{dropBufferedUpdates}}; I am 
running the test to find out. Note that the initial value of {{state}} is 
{{ACTIVE}}.




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150114#comment-16150114
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Rambling again:

What is bootstrapFuture essentially used for? To get the status of the current 
operation, right?
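If bootstrapFuture is indeed kept so that a later status request can inspect the operation, the mapping could look roughly like this sketch. `BootstrapStatus`, `statusOf` and the status strings are hypothetical; they are not taken from CdcrRequestHandler, whose constants may differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical mapping from a stored bootstrap Future to a status string.
// The status strings are stand-ins; the real handler may use different values.
class BootstrapStatus {

  static String statusOf(Future<Boolean> bootstrapFuture) {
    if (bootstrapFuture == null) {
      return "notfound"; // nothing was submitted, or it was not recorded yet
    }
    if (!bootstrapFuture.isDone()) {
      return "running";
    }
    if (bootstrapFuture.isCancelled()) {
      return "cancelled";
    }
    try {
      return Boolean.TRUE.equals(bootstrapFuture.get()) ? "completed" : "failed";
    } catch (ExecutionException e) {
      return "failed"; // an exception or Error thrown inside call() lands here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "failed";
    }
  }

  public static void main(String[] args) {
    CompletableFuture<Boolean> pending = new CompletableFuture<>();
    System.out.println(statusOf(null));                                     // notfound
    System.out.println(statusOf(pending));                                  // running
    System.out.println(statusOf(CompletableFuture.completedFuture(true)));  // completed
    pending.completeExceptionally(new AssertionError("dropped == false"));
    System.out.println(statusOf(pending));                                  // failed
  }
}
```

Note that a Future which is only stored after it completes (for example, after a blocking `get()`) can never be observed as "running" by such a check.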

In CdcrRequestHandler.java (there are some custom log lines; ignore them):

{code}
Runnable runnable = () -> {
  Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
  boolean locked = recoveryLock.tryLock();
  SolrCoreState coreState = core.getSolrCoreState();
  try {
    if (!locked) {
      log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
      handleCancelBootstrap(req, rsp);
    } else if (leaderStateManager.amILeader()) {
      coreState.setCdcrBootstrapRunning(true);
      //running.set(true);
      String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
      BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
      coreState.setCdcrBootstrapCallable(bootstrapCallable);
      Future bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
          .submit(bootstrapCallable);
      try {
        log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
      } catch (Exception e) {
        log.error("bootstrapFuture.get :: ", e);
      }
      coreState.setCdcrBootstrapFuture(bootstrapFuture);
      try {
        bootstrapFuture.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        log.warn("Bootstrap was interrupted", e);
      } catch (ExecutionException e) {
        log.error("Bootstrap operation failed", e);
      }
    } else {
      log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.",
          CdcrParams.CdcrAction.BOOTSTRAP, collectionName, shard);
    }
  } finally {
    if (locked) {
      coreState.setCdcrBootstrapRunning(false);
      recoveryLock.unlock();
    }
  }
};
{code}

*bootstrapFuture.get()* throws:

{code}
  [beaster]   2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
  [beaster]   2> java.util.concurrent.ExecutionException: java.lang.AssertionError
  [beaster]   2>    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  [beaster]   2>    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  [beaster]   2>    at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
  [beaster]   2>    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  [beaster]   2>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  [beaster]   2>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [beaster]   2>    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
  [beaster]   2>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [beaster]   2>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  [beaster]   2>    at java.lang.Thread.run(Thread.java:748)
  [beaster]   2> Caused by: java.lang.AssertionError
  [beaster]   2>    at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
  [beaster]   2>    at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
  [beaster]   2>    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [beaster]   2>    ... 5 more
{code}

and the bootstrap operation fails.

FutureTask.java:
{code}
/**
 * Returns result or throws exception for completed task.
 * @param s completed state value
 */
@SuppressWarnings("unchecked")
private V report(int s) throws ExecutionException {
    Object x = outcome;
    if (s == NORMAL)
        return (V)x;
    if (s >= CANCELLED)
        throw new CancellationException();
    throw new ExecutionException((Throwable)x);
}
{code}
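`report()` explains why the failure surfaces from `bootstrapFuture.get()`: when `call()` throws, even an `Error` such as `AssertionError`, `FutureTask.run()` stores it as the task's outcome and `get()` rethrows it wrapped in an `ExecutionException`. A small self-contained demo of that behaviour; `AssertionThroughFuture` is a hypothetical name, and the task only imitates the failing `assert dropped`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: an Error thrown inside a submitted Callable is captured by
// FutureTask and rethrown from get() as the cause of an ExecutionException.
class AssertionThroughFuture {

  static Throwable causeSeenByGet() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stands in for `assert dropped;` failing inside BootstrapCallable.call().
      Callable<Boolean> task = () -> {
        throw new AssertionError("dropped == false");
      };
      Future<Boolean> future = executor.submit(task);
      try {
        future.get();
        return null; // not reached: the task always fails
      } catch (ExecutionException e) {
        return e.getCause(); // the AssertionError stored as the task's outcome
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return e;
      }
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    Throwable cause = causeSeenByGet();
    System.out.println(cause.getClass().getName() + ": " + cause.getMessage());
  }
}
```

The worker thread itself survives, because the `Error` never escapes `FutureTask.run()`; it is only visible to whoever calls `get()`.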

and the assertion failure is in the {{finally}} block of the same function:
{code}
if (closed || !success) {
  // we cannot apply the buffer in this case because it will introduce newer versions in the
  // update log and then the source cluster will get those versions via
  ...
{code}

[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-31 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148943#comment-16148943
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Ok.

I had three successful beast runs without any assertion failure:

{{branch_6_6}}, without applying Shalin's bootstrap status fix patch:
1. ant beast -Dbeast.iters=50 -Dtestcase=CdcrBootstrapTest
2. ant beast -Dtests.dups=2 -Dtests.iters=2 -Dbeast.iters=35 -Dtestcase=CdcrBootstrapTest

{{master}}, where the patch was committed yesterday:
1. ant beast -Dbeast.iters=50 -Dtestcase=CdcrBootstrapTest -Dtests.dups=2 -Dtests.iters=2

Suddenly life's good. Will keep testing more.




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148468#comment-16148468
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Shalin, I am on this. 
CdcrBootstrapTest.testBootstrapWithContinousIndexingOnSourceCluster fails 
intermittently when bootstrap fails due to an unannounced SolrCore shutdown. 
The SolrCore shutdown may not be the root issue; the fetchIndex from the 
source may be. Please see below:

{code}
  [beaster]   1> Adding 10 docs with commit=true, numDocs=1100
  [beaster]   2> 62998 INFO  (qtp358530981-503) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
  [beaster]   2> 62998 INFO  (qtp358530981-503) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.u.SolrIndexWriter Calling setCommitData with 
IW:org.apache.solr.update.SolrIndexWriter@10111fe6
  [beaster]   2> 63003 INFO  (qtp358530981-502) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.c.S.Request [cdcr-source_shard1_replica1]  webapp=/solr path=/update 
params={_stateVer_=cdcr-source:3=javabin=2} status=0 QTime=10
  [beaster]   2> 63010 INFO  (qtp1081974557-457) [n:127.0.0.1:36154_solr 
c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] 
o.a.s.c.S.Request [cdcr-target_shard1_replica1]  webapp=/solr path=/cdcr 
params={qt=/cdcr=http://127.0.0.1:36872/solr/cdcr-source_shard1_replica1/=BOOTSTRAP=javabin=2}
 status=0 QTime=32
  [beaster]   2> 63012 INFO  (qtp1081974557-459) [n:127.0.0.1:36154_solr 
c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] 
o.a.s.c.S.Request [cdcr-target_shard1_replica1]  webapp=/solr path=/cdcr 
params={qt=/cdcr=BOOTSTRAP_STATUS=javabin=2} status=0 QTime=0
  [beaster]   2> 63013 INFO  
(cdcr-bootstrap-status-136-thread-1-processing-n:127.0.0.1:36872_solr 
x:cdcr-source_shard1_replica1 s:shard1 c:cdcr-source r:core_node1) 
[n:127.0.0.1:36872_solr c:cdcr-source s:shard1 r:core_node1 
x:cdcr-source_shard1_replica1] o.a.s.h.CdcrReplicatorManager CDCR bootstrap 
running for 1 seconds, sleeping for 2000 ms
  [beaster]   2> 63171 INFO  (qtp358530981-503) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.s.SolrIndexSearcher Opening 
[Searcher@7cd86816[cdcr-source_shard1_replica1] realtime]
  [beaster]   2> 63171 INFO  (qtp358530981-503) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 end_commit_flush
  [beaster]   2> 63171 INFO  (qtp358530981-504) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.u.DirectUpdateHandler2 start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
  [beaster]   2> 63171 INFO  (qtp358530981-504) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.u.SolrIndexWriter Calling setCommitData with 
IW:org.apache.solr.update.SolrIndexWriter@10111fe6
  [beaster]   2> 63172 INFO  (qtp358530981-503) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.c.S.Request [cdcr-source_shard1_replica1]  webapp=/solr path=/update 
params={waitSearcher=true=false=true=false_end_point=true=javabin=2}
 status=0 QTime=174
  [beaster]   2> 63175 INFO  (qtp358530981-497) [n:127.0.0.1:36872_solr 
c:cdcr-source s:shard1 r:core_node1 x:cdcr-source_shard1_replica1] 
o.a.s.c.S.Request [cdcr-source_shard1_replica1]  webapp=/solr path=/replication 
params={qt=/replication=javabin=2=indexversion} status=0 
QTime=0
  [beaster]   2> 63176 INFO  
(recoveryExecutor-110-thread-1-processing-n:127.0.0.1:36154_solr 
x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) 
[n:127.0.0.1:36154_solr c:cdcr-target s:shard1 r:core_node1 
x:cdcr-target_shard1_replica1] o.a.s.h.IndexFetcher Master's generation: 12
  [beaster]   2> 63176 INFO  
(recoveryExecutor-110-thread-1-processing-n:127.0.0.1:36154_solr 
x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) 
[n:127.0.0.1:36154_solr c:cdcr-target s:shard1 r:core_node1 
x:cdcr-target_shard1_replica1] o.a.s.h.IndexFetcher Master's version: 
1504153213504
  [beaster]   2> 63176 INFO  
(recoveryExecutor-110-thread-1-processing-n:127.0.0.1:36154_solr 
x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) 
[n:127.0.0.1:36154_solr c:cdcr-target s:shard1 r:core_node1 
x:cdcr-target_shard1_replica1] o.a.s.h.IndexFetcher Slave's generation: 1
  [beaster]   2> 63176 INFO  
(recoveryExecutor-110-thread-1-processing-n:127.0.0.1:36154_solr 
x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) 
...
{code}

[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148445#comment-16148445
 ] 

Shalin Shekhar Mangar commented on SOLR-11278:
--

There are still failures in 
CdcrBootstrapTest.testBootstrapWithContinousIndexingOnSourceCluster after this 
fix. I don't have the time right now to dig but if someone is willing to do the 
investigation, I can help with reviews and fixes.




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147654#comment-16147654
 ] 

ASF subversion and git services commented on SOLR-11278:


Commit 56ddf473bcf80d9a8f7c3c17d6869cdd3c674223 in lucene-solr's branch 
refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=56ddf47 ]

SOLR-11278: Stopping CDCR should cancel a running bootstrap operation

(cherry picked from commit b4c6bfa)
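The commit above makes stopping CDCR cancel an in-flight bootstrap. The patch itself isn't reproduced in this thread, so what follows is only a minimal, generic sketch of the idea using plain {{java.util.concurrent}}. {{CancelOnStopSketch}}, {{startBootstrap}}, {{stop}} and the status strings are hypothetical names, not Solr's CdcrRequestHandler API. The point it illustrates is {{Future.cancel(true)}}, which interrupts the worker thread so a slow fetch does not outlive the stop request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical handler used only to illustrate cancel-on-stop; not Solr code.
class CancelOnStopSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final AtomicReference<String> status = new AtomicReference<>("NOT_STARTED");
  private final CountDownLatch started = new CountDownLatch(1);
  private Future<?> bootstrapFuture;

  // Submits a simulated bootstrap that only ends early if interrupted.
  synchronized void startBootstrap() {
    bootstrapFuture = executor.submit(() -> {
      status.set("RUNNING");
      started.countDown();
      try {
        Thread.sleep(60_000); // stands in for a long index fetch
        status.set("FINISHED");
      } catch (InterruptedException e) {
        status.set("CANCELLED");
        Thread.currentThread().interrupt(); // preserve the interrupt flag
      }
    });
  }

  // Stop cancels any running bootstrap instead of leaving it behind.
  synchronized void stop() throws InterruptedException {
    if (bootstrapFuture != null) {
      bootstrapFuture.cancel(true); // true: interrupt the worker thread
    }
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }

  String currentStatus() {
    return status.get();
  }

  static String demo() throws InterruptedException {
    CancelOnStopSketch handler = new CancelOnStopSketch();
    handler.startBootstrap();
    handler.started.await(); // wait until the task is really running
    handler.stop();
    return handler.currentStatus();
  }
}
```

Here {{demo()}} returns {{CANCELLED}}: the sleeping task is interrupted by the cancel, records that, and the executor then terminates.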



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147647#comment-16147647
 ] 

ASF subversion and git services commented on SOLR-11278:


Commit b4c6bfafdb32127316b334ffdd195270e1077fbe in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b4c6bfa ]

SOLR-11278: Stopping CDCR should cancel a running bootstrap operation



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147168#comment-16147168
 ] 

Amrit Sarkar commented on SOLR-11278:
-

When I commented out the indexing of the second batch of documents on the 
source, I didn't get the assertion failure; instead, an NPE was thrown:

{code}
   [junit4]   2> 71670 ERROR (recoveryExecutor-6-thread-1-processing-n:127.0.0.1:62762_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:62762_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed : 
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:598)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:301)
   [junit4]   2>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:400)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:759)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:714)
   [junit4]   2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
   [junit4]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit4]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2>   at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> Caused by: java.lang.NullPointerException
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:823)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:575)
   [junit4]   2>   ... 10 more
{code}

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
> Attachments: SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and saw the following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | 
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on 
> target after sync expected:<2000> but was:<1000>
> {code}



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147149#comment-16147149
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Still rambling:

{code}
  /**
   * This test starts cdcr source, adds data, starts target cluster, verifies replication,
   * stops cdcr replication and buffering, adds more data, re-enables cdcr and verifies replication
   */
  public void testBootstrapWithSourceCluster() throws Exception {
    // start the target first so that we know its zkhost
    MiniSolrCloudCluster target = new MiniSolrCloudCluster(1, createTempDir("cdcr-target"), buildJettyConfig("/solr"));
    try {
      target.waitForAllNodes(30);
      System.out.println("Target zkHost = " + target.getZkServer().getZkAddress());
      System.setProperty("cdcr.target.zkHost", target.getZkServer().getZkAddress());

      MiniSolrCloudCluster source = new MiniSolrCloudCluster(1, createTempDir("cdcr-source"), buildJettyConfig("/solr"));
      try {
        source.waitForAllNodes(30);
        source.uploadConfigSet(configset("cdcr-source"), "cdcr-source");

        CollectionAdminRequest.createCollection("cdcr-source", "cdcr-source", 1, 1)
            .withProperty("solr.directoryFactory", "solr.StandardDirectoryFactory")
            .process(source.getSolrClient());

        CloudSolrClient sourceSolrClient = source.getSolrClient();
        sourceSolrClient.setDefaultCollection("cdcr-source");
        int docs = (TEST_NIGHTLY ? 100 : 10);
        int numDocs = 0;
        for (int k = 0; k < docs; k++) {
          UpdateRequest req = new UpdateRequest();
          for (; numDocs < (k + 1) * 100; numDocs++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "source_" + numDocs);
            doc.addField("xyz", numDocs);
            req.add(doc);
          }
          req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
          System.out.println("Adding " + docs + " docs with commit=true, numDocs=" + numDocs);
          req.process(sourceSolrClient);
        }

        QueryResponse response = sourceSolrClient.query(new SolrQuery("*:*"));
        assertEquals("", numDocs, response.getResults().getNumFound());

        // setup the target cluster
        target.uploadConfigSet(configset("cdcr-target"), "cdcr-target");
        CollectionAdminRequest.createCollection("cdcr-target", "cdcr-target", 1, 1)
            .process(target.getSolrClient());
        CloudSolrClient targetSolrClient = target.getSolrClient();
        targetSolrClient.setDefaultCollection("cdcr-target");

        cdcrStart(targetSolrClient);
        cdcrStart(sourceSolrClient);

        System.out.println("bs status TX :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SX :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        response = getCdcrQueue(sourceSolrClient);
        System.out.println("Cdcr queue response: " + response.getResponse());
        long foundDocs = waitForTargetToSync(numDocs, targetSolrClient);

        System.out.println("bs status TY :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SY :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        assertEquals("Document mismatch on target after sync", numDocs, foundDocs);

        System.out.println("bs status TZ :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SZ :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        cdcrStop(sourceSolrClient);
        cdcrDisableBuffer(sourceSolrClient);

        System.out.println("bs status TA :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SA :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        int c = 0;
        for (int k = 0; k < 10; k++) {
          UpdateRequest req = new UpdateRequest();
          for (; c < (k + 1) * 100; c++, numDocs++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "source_" + numDocs);
            doc.addField("xyz", numDocs);
            req.add(doc);
          }
          req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
          System.out.println("Adding 100 docs with commit=true, numDocs=" + numDocs);
          req.process(sourceSolrClient);
        }

        response = sourceSolrClient.query(new SolrQuery("*:*"));
        assertEquals("", numDocs, response.getResults().getNumFound());

        cdcrStart(sourceSolrClient);
        cdcrEnableBuffer(sourceSolrClient);

        System.out.println("bs status T1 :: " + invokeCdcrAction(targetSolrClient, 

[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146821#comment-16146821
 ] 

Amrit Sarkar commented on SOLR-11278:
-

OK, here's what's happening:

*BOOTSTRAP_STATUS* never reports *FINISHED* on the target cluster for any of 
the three tests in the test class.

I need to confirm what happens if the bootstrap has not completed when the 
SolrCore is asked to close.



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145829#comment-16145829
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Yeah! *I verified that the bootstrap status is RUNNING instead of FINISHED even 
after all the operations have completed, the assertions have passed, and we are 
about to shut down both source and target.* 

{code}
...
...
        foundDocs = waitForTargetToSync(numDocs, targetSolrClient);
        assertEquals("Document mismatch on target after sync", numDocs, foundDocs);

        System.out.println(" bs status 1 :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        cdcrStop(targetSolrClient);
        cdcrStop(sourceSolrClient);

        System.out.println(" bs status 2 :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

      } finally {
        source.shutdown();
        System.out.println("source shutdown");
      }
    } finally {
      target.shutdown();
      System.out.println("target shutdown");
    }
  }
{code}

Something's off with BOOTSTRAP or BOOTSTRAP_STATUS; they are not in sync.



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145564#comment-16145564
 ] 

Erick Erickson commented on SOLR-11278:
---

Now we're getting somewhere! I'm re-running some tests and will see if I get 
the same error in failing cases.

This starts to hang together with the close-to-packet-size difference in doc 
counts: my bet is there's a segment missing on the target. I'll think about how 
to test that.



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145553#comment-16145553
 ] 

Amrit Sarkar commented on SOLR-11278:
-

[~varunthacker]

Yes, you are right. This solution is not foolproof.

I was able to narrow down one problem, but I still have no solution:

{code}
   [junit4]   2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed : 
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
   [junit4]   2>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:758)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:713)
   [junit4]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit4]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2>   at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> Caused by: java.lang.NullPointerException
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:753)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:517)
   [junit4]   2>   ... 9 more
   [junit4]   2> 
{code}

The problem is with *{{target.shutdown}}*: the target does not shut down 
cleanly.

{code}
   [junit4]   2> 67512 INFO  (coreCloseExecutor-49-thread-1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ cdcr-target:shard1
   [junit4]   2> 67515 WARN  (updateExecutor-4-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Bootstrap was interrupted
   [junit4]   2> java.lang.InterruptedException
   [junit4]   2>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
   [junit4]   2>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:644)
   [junit4]   2>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   [junit4]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit4]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2>   at java.lang.Thread.run(Thread.java:745)
{code}

After that, the core gets reloaded and throws an NPE. I am digging into this 
properly, but I have a feeling this is a machine-specific issue, since, as Varun 
mentioned, we haven't seen many of these in the Jenkins failures.
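The WARN trace above shows a thread blocked in {{FutureTask.get}} (inside {{handleBootstrapAction}}) being interrupted while the core closes. Below is a minimal sketch of that mechanism with plain JDK executors, assuming for illustration that the core close boils down to {{shutdownNow()}} on the pool running the waiting thread; {{InterruptedWaitSketch}} and the pool names are hypothetical, not Solr's.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical reproduction of "Bootstrap was interrupted"; not Solr code.
class InterruptedWaitSketch {
  static String run() throws InterruptedException {
    // Pool doing the slow work, like the bootstrap/index-fetch executor.
    ExecutorService fetchPool = Executors.newSingleThreadExecutor();
    // Pool whose thread only waits on the result, like the update executor.
    ExecutorService waiterPool = Executors.newSingleThreadExecutor();
    AtomicReference<String> outcome = new AtomicReference<>("UNKNOWN");
    CountDownLatch waiting = new CountDownLatch(1);

    Future<String> fetch = fetchPool.submit(() -> {
      Thread.sleep(60_000); // simulated long index fetch
      return "FETCHED";
    });

    waiterPool.execute(() -> {
      waiting.countDown();
      try {
        outcome.set(fetch.get()); // blocks, as in FutureTask.get(...)
      } catch (InterruptedException e) {
        outcome.set("INTERRUPTED"); // what closing the "core" triggers
        Thread.currentThread().interrupt();
      } catch (Exception e) {
        outcome.set("FAILED: " + e);
      }
    });

    waiting.await();
    // "Closing the core": shutdownNow() interrupts the waiting thread.
    waiterPool.shutdownNow();
    waiterPool.awaitTermination(5, TimeUnit.SECONDS);
    // Also stop the simulated fetch so no thread is left behind.
    fetchPool.shutdownNow();
    fetchPool.awaitTermination(5, TimeUnit.SECONDS);
    return outcome.get();
  }
}
```

The waiter ends with {{INTERRUPTED}} even though the fetch itself never failed, which is the same shape as the log: the interruption comes from the close, not from the fetch.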


[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145551#comment-16145551
 ] 

Erick Erickson commented on SOLR-11278:
---

bq. I'm guessing this will help, but don't you think the bigger problem is why 
we can't replicate 10k documents (2 fields) in 120 seconds? Perhaps we can add 
some more logging to the test case, e.g. when the bootstrapping starts.

The evidence from a test I ran the other day (results in SOLR-11003) is that 
these delays won't help, or, if they do let the test pass, they just mask a more 
fundamental problem. I changed one of the tests so that if waitForTargetToSync 
didn't return the proper number of docs, I'd add one more document to the source 
and call waitForTargetToSync again. Here are the important bits:

At the initial mismatch in waitForTargetToSync
target: 1901
source: 2000

After my new loop that adds a single doc to source then calls 
waitForTargetToSync again I see counts of
target: 1902
source: 2001

Clearly my new doc is being indexed on the source and sent to the target, so 
this isn't a case of the docs reaching the target but somehow not being visible 
to the currently-open searcher. They're just not on the target at all.

I'll instrument the tests a little more. It's strange that the difference is 99 
and the packet size is 100; it kind of feels like an off-by-one error.
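The arithmetic behind ruling out a searcher-visibility problem can be written down as a tiny check. The helper names below are hypothetical; the numbers are the counts quoted above. The reasoning it encodes: if the single probe document shows up on the target (target grows by exactly one) while the gap stays at 99, the older documents are missing rather than merely uncommitted, since the commit issued on every {{waitForTargetToSync}} iteration would otherwise have exposed them along with the probe.

```java
// Hypothetical helpers illustrating the probe-document reasoning; not test code from Solr.
class GapCheckSketch {
  // Documents present on the source but not (visibly) on the target.
  static long gap(long sourceCount, long targetCount) {
    return sourceCount - targetCount;
  }

  // True when exactly one probe doc was added to the source and arrived on the
  // target, yet the gap did not shrink: the older docs never reached the target.
  static boolean probeSuggestsMissingDocs(long srcBefore, long tgtBefore,
                                          long srcAfter, long tgtAfter) {
    boolean probeArrived = srcAfter == srcBefore + 1 && tgtAfter == tgtBefore + 1;
    return probeArrived && gap(srcAfter, tgtAfter) == gap(srcBefore, tgtBefore);
  }
}
```

With the counts above, {{gap(2000, 1901)}} is 99 and {{probeSuggestsMissingDocs(2000, 1901, 2001, 1902)}} is true.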



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145107#comment-16145107
 ] 

Amrit Sarkar commented on SOLR-11278:
-

bq. I don't get the logic for this. Doesn't {{waitForTargetToSync}} already 
wait for 120 seconds? Adding another 7 seconds shouldn't help much, no?

That observation isn't quite right. It doesn't simply wait for 120 seconds. *It 
issues a commit every second for up to 120 seconds and checks whether the doc 
counts match*. 

Every explicit commit makes the SolrCore reload. If the SolrCore is closed 
while bootstrapping / index fetching is taking place, we get an 
{{InterruptedException}} for the bootstrap. The SolrCore is reloaded in the 
middle of copying, the reload fails too, and the SolrCore becomes null.

See:
{code}
  private long waitForTargetToSync(int numDocs, CloudSolrClient targetSolrClient) throws SolrServerException, IOException, InterruptedException {
    long start = System.nanoTime();
    QueryResponse response = null;
    while (System.nanoTime() - start <= TimeUnit.NANOSECONDS.convert(120, TimeUnit.SECONDS)) {
      try {
        targetSolrClient.commit();
        response = targetSolrClient.query(new SolrQuery("*:*"));
        if (response.getResults().getNumFound() == numDocs) {
          break;
        }
      } catch (Exception e) {
        log.warn("Exception trying to commit on target. This is expected and safe to ignore.", e);
      }
      Thread.sleep(1000);
    }
    return response != null ? response.getResults().getNumFound() : 0;
  }
{code}
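To make the loop's semantics explicit, here is a generic, Solr-free sketch of the same poll-until-deadline pattern. {{pollUntil}} is a hypothetical helper: a {{Runnable}} stands in for {{targetSolrClient.commit()}} and a {{LongSupplier}} stands in for the {{*:*}} count. As in the original, one "commit" is issued per iteration, failures are swallowed (here only unchecked exceptions, since these functional interfaces cannot throw checked ones), and the last observed count is returned, or 0 if nothing was observed.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical generalisation of waitForTargetToSync; not part of the Solr test framework.
class PollUntilSketch {
  // Polls until countSupplier returns `expected` or `timeout` elapses.
  static long pollUntil(long expected, Runnable beforeEachPoll, LongSupplier countSupplier,
                        Duration timeout, Duration interval) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    long last = 0; // like the original, 0 means "no successful query"
    while (System.nanoTime() - deadline <= 0) {
      try {
        beforeEachPoll.run();             // one "commit" per iteration
        last = countSupplier.getAsLong(); // one "*:*" count per iteration
        if (last == expected) {
          break;
        }
      } catch (RuntimeException e) {
        // the original logs a warning and keeps polling
      }
      Thread.sleep(interval.toMillis());
    }
    return last;
  }
}
```

Note that with a one-second interval and a 120-second budget this issues on the order of 120 commits, which is the detail the comment above is pointing at.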



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145066#comment-16145066
 ] 

Varun Thacker commented on SOLR-11278:
--

> 2. I bumped up the time / sleep time b/w issuing CDCR and checking target 
> collection for numDocs to 7 seconds, 

I don't get the logic for this. Doesn't {{waitForTargetToSync}} already wait 
for 120 seconds? Adding another 7 seconds shouldn't help much, no?

> and lowered documents from 10K to 2K

I'm guessing this will help, but don't you think the bigger problem is why we 
can't replicate 10k documents (2 fields) in 120 seconds? Perhaps we can add some 
more logging to the test case, e.g. when the bootstrapping starts.


Before we wait for the target cluster to catch up, we print the queue size 
from the source cluster in the test case.

{code}
[junit4]   1 Cdcr queue response: {responseHeader={status=0,QTime=2},queues={127.0.0.1:64653/solr={cdcr-target={queueSize=2020,lastTimestamp=}}},tlogTotalSize=96279,tlogTotalCount=20,updateLogSynchronizer=stopped}
{code}

So this doesn't seem like a lot of documents to take 120 seconds to catch up?



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145012#comment-16145012
 ] 

Varun Thacker commented on SOLR-11278:
--

Hi Amrit,

I removed 6.6 from the Affects Version/s as this test fails regularly on 
other branches too.

In general, this has failed quite a bit while building an RC for 6.6.1, so it 
would be nice if we could figure out a fix for it.



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145011#comment-16145011
 ] 

Amrit Sarkar commented on SOLR-11278:
-

I am running beasts of 100 rounds; if they pass, we can be reasonably 
comfortable with these changes.

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 6.6.1
>Reporter: Amrit Sarkar
> Attachments: SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and saw the following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | 
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on 
> target after sync expected:<2000> but was:<1000>
> {code}



[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-28 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143949#comment-16143949
 ] 

Amrit Sarkar commented on SOLR-11278:
-

Posting SOLR-11003 discussion here:

bq. All the tests in this class that stop CDCR, index docs in the source, then
turn CDCR on again and expect BOOTSTRAP to happen, fail. If I debug in the IDE,
all tests pass (as the steps slow down), suggesting the time we wait for the
target to sync is too short. But even increasing it to 5 minutes, instead of
the default 2 minutes, doesn't work. Increasing the interval between the
explicit commits issued while waiting from 1 to 3 seconds doesn't work either.
Let me know if other CDCR-related tests are failing too.

bq. Note, though, that the constantly failing tests I mentioned above reload the
Solr core every second while they wait for the target to sync. This may very
well be a variant of SOLR-11034 and/or SOLR-11035, as we occasionally see an NPE
at IndexFetcher.java:753.




[jira] [Commented] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143931#comment-16143931
 ] 

Erick Erickson commented on SOLR-11278:
---

My bet, and this would be for verification purposes only:

If, on failure, we added a single document to the source and tried again, we
wouldn't fail, which would implicate SOLR-11034 and SOLR-11035. Hmmm, let me
give that a whirl. Those two JIRAs are holding up several other JIRAs...
