[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-09-01 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150114#comment-16150114
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 9/1/17 6:41 AM:
-

Rambling again:

What is bootstrapFuture essentially used for? To get the status of the current 
operation, right?

In CdcrRequestHandler.java (there are some custom log lines; ignore them):

{code}
Runnable runnable = () -> {
  Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
  boolean locked = recoveryLock.tryLock();
  SolrCoreState coreState = core.getSolrCoreState();
  try {
    if (!locked) {
      log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
      handleCancelBootstrap(req, rsp);
    } else if (leaderStateManager.amILeader()) {
      coreState.setCdcrBootstrapRunning(true);
      //running.set(true);
      String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
      BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
      coreState.setCdcrBootstrapCallable(bootstrapCallable);
      Future bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
          .submit(bootstrapCallable);
      try {
        log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
      } catch (Exception e) {
        log.error("bootstrapFuture.get :: ", e);
      }
      coreState.setCdcrBootstrapFuture(bootstrapFuture);
      try {
        bootstrapFuture.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        log.warn("Bootstrap was interrupted", e);
      } catch (ExecutionException e) {
        log.error("Bootstrap operation failed", e);
      }
    } else {
      log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.", CdcrParams.CdcrAction.BOOTSTRAP, collectionName, shard);
    }
  } finally {
    if (locked) {
      coreState.setCdcrBootstrapRunning(false);
      recoveryLock.unlock();
    }
  }
};
{code}

*bootstrapFuture.get()* throws:

{quote}
  [beaster]   2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
  [beaster]   2> java.util.concurrent.ExecutionException: java.lang.AssertionError
  [beaster]   2>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  [beaster]   2>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  [beaster]   2>   at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
  [beaster]   2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  [beaster]   2>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  [beaster]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [beaster]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
  [beaster]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [beaster]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  [beaster]   2>   at java.lang.Thread.run(Thread.java:748)
  [beaster]   2> Caused by: java.lang.AssertionError
  [beaster]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
  [beaster]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
  [beaster]   2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [beaster]   2>   ... 5 more
{quote}

and the bootstrap operation fails.
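
To illustrate why the AssertionError from the trace above shows up wrapped in an ExecutionException, here is a minimal standalone sketch. It is not Solr code: the class name FutureGetSketch and the task FAILING_TASK are hypothetical, and the task only stands in for BootstrapCallable by always throwing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureGetSketch {

  // Hypothetical stand-in for BootstrapCallable: always fails with an AssertionError.
  static final Callable<Boolean> FAILING_TASK = () -> {
    throw new AssertionError("simulated assertion failure inside call()");
  };

  // Submits the task, blocks on the Future, and returns the class name of the
  // underlying cause when get() throws ExecutionException.
  static String causeOfFailure() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> future = executor.submit(FAILING_TASK);
      future.get();
      return "no failure";
    } catch (ExecutionException e) {
      // get() wraps whatever the task threw; the original Throwable is e.getCause().
      return e.getCause().getClass().getName();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("cause = " + causeOfFailure());
  }
}
```

So the Future does report the status of the operation, but only to whoever calls get(); the failure itself originates inside the callable.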

FutureTask.java ::
{code}
/**
 * Returns result or throws exception for completed task.
 * @param s completed state value
 */
@SuppressWarnings("unchecked")
private V report(int s) throws ExecutionException {
    Object x = outcome;
    if (s == NORMAL)
        return (V)x;
    if (s >= CANCELLED)
        throw new CancellationException();
    throw new ExecutionException((Throwable)x);
}
{code}

and the assertion failure is in the {{finally}} block of the BootstrapCallable 
call function:
{code}
if (closed || !success) {
  // we cannot apply the buffer in this case because it will introduce 
newer versions in the
  // 

[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147149#comment-16147149
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 8/30/17 12:31 PM:
---

Still rambling:

{code}
  /**
   * This test start cdcr source, adds data,starts target cluster, verifies replication,
   * stops cdcr replication and buffering, adds more data, re-enables cdcr and verify replication
   */
  public void testBootstrapWithSourceCluster() throws Exception {
    // start the target first so that we know its zkhost
    MiniSolrCloudCluster target = new MiniSolrCloudCluster(1, createTempDir("cdcr-target"), buildJettyConfig("/solr"));
    try {
      target.waitForAllNodes(30);
      System.out.println("Target zkHost = " + target.getZkServer().getZkAddress());
      System.setProperty("cdcr.target.zkHost", target.getZkServer().getZkAddress());

      MiniSolrCloudCluster source = new MiniSolrCloudCluster(1, createTempDir("cdcr-source"), buildJettyConfig("/solr"));
      try {
        source.waitForAllNodes(30);
        source.uploadConfigSet(configset("cdcr-source"), "cdcr-source");

        CollectionAdminRequest.createCollection("cdcr-source", "cdcr-source", 1, 1)
            .withProperty("solr.directoryFactory", "solr.StandardDirectoryFactory")
            .process(source.getSolrClient());

        CloudSolrClient sourceSolrClient = source.getSolrClient();
        sourceSolrClient.setDefaultCollection("cdcr-source");
        int docs = (TEST_NIGHTLY ? 100 : 10);
        int numDocs = 0;
        for (int k = 0; k < docs; k++) {
          UpdateRequest req = new UpdateRequest();
          for (; numDocs < (k + 1) * 100; numDocs++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "source_" + numDocs);
            doc.addField("xyz", numDocs);
            req.add(doc);
          }
          req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
          System.out.println("Adding " + docs + " docs with commit=true, numDocs=" + numDocs);
          req.process(sourceSolrClient);
        }

        QueryResponse response = sourceSolrClient.query(new SolrQuery("*:*"));
        assertEquals("", numDocs, response.getResults().getNumFound());

        // setup the target cluster
        target.uploadConfigSet(configset("cdcr-target"), "cdcr-target");
        CollectionAdminRequest.createCollection("cdcr-target", "cdcr-target", 1, 1)
            .process(target.getSolrClient());
        CloudSolrClient targetSolrClient = target.getSolrClient();
        targetSolrClient.setDefaultCollection("cdcr-target");

        cdcrStart(targetSolrClient);
        cdcrStart(sourceSolrClient);

        System.out.println("bs status TX :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SX :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        response = getCdcrQueue(sourceSolrClient);
        System.out.println("Cdcr queue response: " + response.getResponse());
        long foundDocs = waitForTargetToSync(numDocs, targetSolrClient);

        System.out.println("bs status TY :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SY :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        assertEquals("Document mismatch on target after sync", numDocs, foundDocs);

        System.out.println("bs status TZ :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SZ :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        cdcrStop(sourceSolrClient);
        cdcrDisableBuffer(sourceSolrClient);

        System.out.println("bs status TA :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
        System.out.println("bs status SA :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

        int c = 0;
        for (int k = 0; k < 10; k++) {
          UpdateRequest req = new UpdateRequest();
          for (; c < (k + 1) * 100; c++, numDocs++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "source_" + numDocs);
            doc.addField("xyz", numDocs);
            req.add(doc);
          }
          req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
          System.out.println("Adding 100 docs with commit=true, numDocs=" + numDocs);
          req.process(sourceSolrClient);
        }

        response = sourceSolrClient.query(new SolrQuery("*:*"));
        assertEquals("", numDocs, response.getResults().getNumFound());

        cdcrStart(sourceSolrClient);
        cdcrEnableBuffer(sourceSolrClient);

        System.out.println("bs 

[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145553#comment-16145553
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 8/30/17 12:00 PM:
---

[~varunthacker] [~erickerickson] :: This is on Solr Version: 6.3

Yes, you are right. This solution is not foolproof.

I have been able to narrow down one problem, but there is no solution yet:

{code}
   [junit4]   2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed : 
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
   [junit4]   2>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:758)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:713)
   [junit4]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit4]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2>   at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> Caused by: java.lang.NullPointerException
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:753)
   [junit4]   2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:517)
   [junit4]   2>   ... 9 more
   [junit4]   2> 
{code}

The problem is with *{{target.shutdown}}*: the target does not shut down 
properly.

{code}
   [junit4]   2> 67512 INFO  (coreCloseExecutor-49-thread-1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ cdcr-target:shard1
   [junit4]   2> 67515 WARN  (updateExecutor-4-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Bootstrap was interrupted
   [junit4]   2> java.lang.InterruptedException
   [junit4]   2>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
   [junit4]   2>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
   [junit4]   2>   at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:644)
   [junit4]   2>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   [junit4]   2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [junit4]   2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   [junit4]   2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2>   at java.lang.Thread.run(Thread.java:745)
{code}

and the core gets reloaded and throws an NPE. I am digging into this properly, 
but I have a feeling this is a machine-specific issue, as we didn't see many of 
these in the Jenkins failures, as mentioned by Varun.
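
For reference, a minimal standalone sketch of what an interrupt does to a thread blocked in Future.get(). It is not Solr code, and it does not claim to identify where the interrupt in the log above comes from. InterruptedWaitSketch and all names in it are hypothetical; the latch-blocked task only stands in for a long-running index fetch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InterruptedWaitSketch {

  // Interrupts a thread that is waiting on a Future and reports what the waiter
  // saw, plus whether the underlying task had finished at that point.
  static String waitAndInterrupt() {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    CountDownLatch neverReleased = new CountDownLatch(1);
    String[] observed = {"not set"};
    try {
      // Hypothetical long-running task: blocks until the worker is shut down.
      Future<?> slow = worker.submit(() -> {
        neverReleased.await();
        return null;
      });
      Thread waiter = new Thread(() -> {
        try {
          slow.get();
          observed[0] = "completed";
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          observed[0] = "interrupted";
        } catch (ExecutionException e) {
          observed[0] = "failed";
        }
      });
      waiter.start();
      waiter.interrupt();   // interrupt the waiting thread, not the task
      waiter.join();
      // The interrupt only ended the wait; the task itself is still running.
      return observed[0] + ", taskDone=" + slow.isDone();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "caller interrupted";
    } finally {
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(waitAndInterrupt());
  }
}
```

In this sketch the waiter gets an InterruptedException while the submitted task keeps running. That would be consistent with the fetch failing later (93761) than the "Bootstrap was interrupted" warning (67515), but it is an assumption, not something the log proves.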



[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-29 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145107#comment-16145107
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 8/29/17 10:59 AM:
---

bq. I don't get the logic for this. Doesn't {{waitForTargetToSync}} already 
wait for 120 seconds? Adding another 7 seconds shouldn't help much, no?

This is a wrong observation. It doesn't wait 120 seconds. *It issues a commit 
every second for a period of up to 120 seconds and checks whether the doc 
counts match*. 

Every explicit commit causes a new Searcher to be opened / the Solr core to be 
reloaded with a new Searcher. If the SolrCore is closed while bootstrapping / 
index fetch is taking place, we receive an {{InterruptedException}} for the 
bootstrap. The SolrCore is reloaded in the middle of copying, the reload fails 
too, and the SolrCore becomes NULL.

See:
{code}
  private long waitForTargetToSync(int numDocs, CloudSolrClient targetSolrClient) throws SolrServerException, IOException, InterruptedException {
    long start = System.nanoTime();
    QueryResponse response = null;
    while (System.nanoTime() - start <= TimeUnit.NANOSECONDS.convert(120, TimeUnit.SECONDS)) {
      try {
        targetSolrClient.commit();
        response = targetSolrClient.query(new SolrQuery("*:*"));
        if (response.getResults().getNumFound() == numDocs) {
          break;
        }
      } catch (Exception e) {
        log.warn("Exception trying to commit on target. This is expected and safe to ignore.", e);
      }
      Thread.sleep(1000);
    }
    return response != null ? response.getResults().getNumFound() : 0;
  }
{code}
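
The shape of that loop, stripped of Solr, can be sketched as below. This is only an illustration: PollUntilSyncedSketch, pollUntil and the fake count source are hypothetical, and unlike the real method the sketch does not issue a commit on each iteration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class PollUntilSyncedSketch {

  // Polls countSource until it reports `expected` or the timeout elapses,
  // sleeping intervalMillis between polls. Returns the last observed count.
  static long pollUntil(long expected, LongSupplier countSource,
                        long timeoutMillis, long intervalMillis) {
    long start = System.nanoTime();
    long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    long last = 0;
    while (System.nanoTime() - start <= timeoutNanos) {
      last = countSource.getAsLong();
      if (last == expected) {
        break; // stop as soon as the counts match, no need to use up the timeout
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return last;
  }

  public static void main(String[] args) {
    long[] fakeCount = {0};
    // Hypothetical target that gains 500 docs between polls.
    long found = pollUntil(2000, () -> fakeCount[0] += 500, 5_000, 10);
    System.out.println("found = " + found);
  }
}
```

The return value is whatever the last poll saw, so a result like "expected:<2000> but was:<1000>" means the loop ran out of time with the count still at 1000.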


> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
> Attachments: SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and seeing following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | 
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on 
> target after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6

2017-08-28 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143926#comment-16143926
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 8/28/17 4:05 PM:
--

{code} 
 [beaster] Tests with failures [seed: 8D740119BA9589F1]:
  [beaster]   - 
org.apache.solr.cloud.CdcrBootstrapTest.testConvertClusterToCdcrAndBootstrap
  [beaster]   - 
org.apache.solr.cloud.CdcrBootstrapTest.testBootstrapWithSourceCluster
{code}

Safe to say, the three tests do NOT all pass on every seed.


was (Author: sarkaramr...@gmail.com):
{code} 
 [beaster] Tests with failures [seed: 8D740119BA9589F1]:
  [beaster]   - 
org.apache.solr.cloud.CdcrBootstrapTest.testConvertClusterToCdcrAndBootstrap
  [beaster]   - 
org.apache.solr.cloud.CdcrBootstrapTest.testBootstrapWithSourceCluster
{code}

Safe to say, all the three tests are passable at all the seeds.

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 6.6.1
>Reporter: Amrit Sarkar
> Attachments: test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and seeing following failure:
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | 
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on 
> target after sync expected:<2000> but was:<1000>
> {code}


