[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150114#comment-16150114 ]

Amrit Sarkar edited comment on SOLR-11278 at 9/1/17 6:41 AM:
-

Rambling again:

What is the use of bootstrapFuture essentially? To get the status of the current operation, right?

In CdcrRequestHandler.java (there are some custom log lines; ignore them):

{code}
Runnable runnable = () -> {
  Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
  boolean locked = recoveryLock.tryLock();
  SolrCoreState coreState = core.getSolrCoreState();
  try {
    if (!locked) {
      log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
      handleCancelBootstrap(req, rsp);
    } else if (leaderStateManager.amILeader()) {
      coreState.setCdcrBootstrapRunning(true);
      //running.set(true);
      String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
      BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
      coreState.setCdcrBootstrapCallable(bootstrapCallable);
      Future bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
          .submit(bootstrapCallable);
      try {
        log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
      } catch (Exception e) {
        log.error("bootstrapFuture.get :: ",e);
      }
      coreState.setCdcrBootstrapFuture(bootstrapFuture);
      try {
        bootstrapFuture.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        log.warn("Bootstrap was interrupted", e);
      } catch (ExecutionException e) {
        log.error("Bootstrap operation failed", e);
      }
    } else {
      log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.", CdcrParams.CdcrAction.BOOTSTRAP, collectionName, shard);
    }
  } finally {
    if (locked) {
      coreState.setCdcrBootstrapRunning(false);
      recoveryLock.unlock();
    }
  }
};
{code}

*bootstrapFuture.get()* throws:

{quote}
[beaster] 2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
[beaster] 2> java.util.concurrent.ExecutionException: java.lang.AssertionError
[beaster] 2>at java.util.concurrent.FutureTask.report(FutureTask.java:122)
[beaster] 2>at java.util.concurrent.FutureTask.get(FutureTask.java:192)
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
[beaster] 2>at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
[beaster] 2>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[beaster] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[beaster] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
[beaster] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[beaster] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[beaster] 2>at java.lang.Thread.run(Thread.java:748)
[beaster] 2> Caused by: java.lang.AssertionError
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
[beaster] 2>at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
[beaster] 2>... 5 more
{quote}

and the bootstrap operation fails.

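To make the trace above easier to read, here is a minimal standalone sketch (plain java.util.concurrent, not Solr code) of how an {{AssertionError}} thrown inside a {{Callable}} surfaces from {{Future.get()}} wrapped in an {{ExecutionException}}, and how a second {{get()}} on the same future reports the same cause again. {{FutureGetDemo}}, {{FAILING_TASK}} and {{causeSeenByGet}} are hypothetical names used only for illustration; the failing task merely stands in for BootstrapCallable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureGetDemo {

  /** Hypothetical stand-in for BootstrapCallable: always fails with an AssertionError. */
  static final Callable<Boolean> FAILING_TASK = () -> {
    throw new AssertionError("simulated assert failure inside call()");
  };

  /** Submits the task, calls get() twice, and describes the cause that get() reports. */
  static String causeSeenByGet() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> future = executor.submit(FAILING_TASK);
      try {
        future.get();
        return "no failure";
      } catch (ExecutionException first) {
        // The task is already complete, so a second get() returns immediately
        // and wraps the very same Throwable in a new ExecutionException.
        try {
          future.get();
          return "no failure on second get";
        } catch (ExecutionException second) {
          boolean sameCause = first.getCause() == second.getCause();
          return second.getCause().getClass().getName() + " sameCause=" + sameCause;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(causeSeenByGet());
  }
}
```

So the two {{bootstrapFuture.get()}} calls in the snippet above (the custom log line and the original one) would both be expected to report the same underlying failure; the sketch says nothing about why the assertion itself trips.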
FutureTask.java ::

{code}
/**
 * Returns result or throws exception for completed task.
 *
 * @param s completed state value
 */
@SuppressWarnings("unchecked")
private V report(int s) throws ExecutionException {
  Object x = outcome;
  if (s == NORMAL)
    return (V)x;
  if (s >= CANCELLED)
    throw new CancellationException();
  throw new ExecutionException((Throwable)x);
}
{code}

and the assertion failure is in the {{finally}} block of the BootstrapCallable {{call}} function ::

{code}
if (closed || !success) {
  // we cannot apply the buffer in this case because it will introduce newer versions in the
  //
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150114#comment-16150114 ]

Amrit Sarkar edited comment on SOLR-11278 at 9/1/17 6:37 AM:
-

Rambling again:

What is the use of bootstrapFuture essentially? To get the status of the current operation, right?

In CdcrRequestHandler.java (there are some custom log lines; ignore them):

{code}
Runnable runnable = () -> {
  Lock recoveryLock = req.getCore().getSolrCoreState().getRecoveryLock();
  boolean locked = recoveryLock.tryLock();
  SolrCoreState coreState = core.getSolrCoreState();
  try {
    if (!locked) {
      log.info("we reached this point :: CANCEL BOOTSTRAP, locked :: " + locked);
      handleCancelBootstrap(req, rsp);
    } else if (leaderStateManager.amILeader()) {
      coreState.setCdcrBootstrapRunning(true);
      //running.set(true);
      String masterUrl = req.getParams().get(ReplicationHandler.MASTER_URL);
      BootstrapCallable bootstrapCallable = new BootstrapCallable(masterUrl, core);
      coreState.setCdcrBootstrapCallable(bootstrapCallable);
      Future bootstrapFuture = core.getCoreContainer().getUpdateShardHandler().getRecoveryExecutor()
          .submit(bootstrapCallable);
      try {
        log.info("we reached this point :: all good, bootstrapFuture.get :: " + bootstrapFuture.get());
      } catch (Exception e) {
        log.error("bootstrapFuture.get :: ",e);
      }
      coreState.setCdcrBootstrapFuture(bootstrapFuture);
      try {
        bootstrapFuture.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        log.warn("Bootstrap was interrupted", e);
      } catch (ExecutionException e) {
        log.error("Bootstrap operation failed", e);
      }
    } else {
      log.error("Action {} sent to non-leader replica @ {}:{}. Aborting bootstrap.", CdcrParams.CdcrAction.BOOTSTRAP, collectionName, shard);
    }
  } finally {
    if (locked) {
      coreState.setCdcrBootstrapRunning(false);
      recoveryLock.unlock();
    }
  }
};
{code}

*bootstrapFuture.get()* throws:

{quote}
[beaster] 2> 43072 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:41488_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:41488_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
[beaster] 2> java.util.concurrent.ExecutionException: java.lang.AssertionError
[beaster] 2>at java.util.concurrent.FutureTask.report(FutureTask.java:122)
[beaster] 2>at java.util.concurrent.FutureTask.get(FutureTask.java:192)
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:653)
[beaster] 2>at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
[beaster] 2>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[beaster] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[beaster] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
[beaster] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[beaster] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[beaster] 2>at java.lang.Thread.run(Thread.java:748)
[beaster] 2> Caused by: java.lang.AssertionError
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:804)
[beaster] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:723)
[beaster] 2>at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
[beaster] 2>... 5 more
{quote}

and the bootstrap operation fails.

FutureTask.java ::

{code}
/**
 * Returns result or throws exception for completed task.
 *
 * @param s completed state value
 */
@SuppressWarnings("unchecked")
private V report(int s) throws ExecutionException {
  Object x = outcome;
  if (s == NORMAL)
    return (V)x;
  if (s >= CANCELLED)
    throw new CancellationException();
  throw new ExecutionException((Throwable)x);
}
{code}

and the assertion failure is in the same function's {{finally}} block ::

{code}
if (closed || !success) {
  // we cannot apply the buffer in this case because it will introduce newer versions in the
  // update log and then
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147149#comment-16147149 ]

Amrit Sarkar edited comment on SOLR-11278 at 8/30/17 12:31 PM:
---

Still rambling:

{code}
/**
 * This test start cdcr source, adds data,starts target cluster, verifies replication,
 * stops cdcr replication and buffering, adds more data, re-enables cdcr and verify replication
 */
public void testBootstrapWithSourceCluster() throws Exception {
  // start the target first so that we know its zkhost
  MiniSolrCloudCluster target = new MiniSolrCloudCluster(1, createTempDir("cdcr-target"), buildJettyConfig("/solr"));
  try {
    target.waitForAllNodes(30);
    System.out.println("Target zkHost = " + target.getZkServer().getZkAddress());
    System.setProperty("cdcr.target.zkHost", target.getZkServer().getZkAddress());

    MiniSolrCloudCluster source = new MiniSolrCloudCluster(1, createTempDir("cdcr-source"), buildJettyConfig("/solr"));
    try {
      source.waitForAllNodes(30);
      source.uploadConfigSet(configset("cdcr-source"), "cdcr-source");
      CollectionAdminRequest.createCollection("cdcr-source", "cdcr-source", 1, 1)
          .withProperty("solr.directoryFactory", "solr.StandardDirectoryFactory")
          .process(source.getSolrClient());

      CloudSolrClient sourceSolrClient = source.getSolrClient();
      sourceSolrClient.setDefaultCollection("cdcr-source");
      int docs = (TEST_NIGHTLY ? 100 : 10);
      int numDocs = 0;
      for (int k = 0; k < docs; k++) {
        UpdateRequest req = new UpdateRequest();
        for (; numDocs < (k + 1) * 100; numDocs++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "source_" + numDocs);
          doc.addField("xyz", numDocs);
          req.add(doc);
        }
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        System.out.println("Adding " + docs + " docs with commit=true, numDocs=" + numDocs);
        req.process(sourceSolrClient);
      }

      QueryResponse response = sourceSolrClient.query(new SolrQuery("*:*"));
      assertEquals("", numDocs, response.getResults().getNumFound());

      // setup the target cluster
      target.uploadConfigSet(configset("cdcr-target"), "cdcr-target");
      CollectionAdminRequest.createCollection("cdcr-target", "cdcr-target", 1, 1)
          .process(target.getSolrClient());
      CloudSolrClient targetSolrClient = target.getSolrClient();
      targetSolrClient.setDefaultCollection("cdcr-target");

      cdcrStart(targetSolrClient);
      cdcrStart(sourceSolrClient);

      System.out.println("bs status TX :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
      System.out.println("bs status SX :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

      response = getCdcrQueue(sourceSolrClient);
      System.out.println("Cdcr queue response: " + response.getResponse());
      long foundDocs = waitForTargetToSync(numDocs, targetSolrClient);

      System.out.println("bs status TY :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
      System.out.println("bs status SY :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

      assertEquals("Document mismatch on target after sync", numDocs, foundDocs);

      System.out.println("bs status TZ :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
      System.out.println("bs status SZ :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

      cdcrStop(sourceSolrClient);
      cdcrDisableBuffer(sourceSolrClient);

      System.out.println("bs status TA :: " + invokeCdcrAction(targetSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));
      System.out.println("bs status SA :: " + invokeCdcrAction(sourceSolrClient, CdcrParams.CdcrAction.BOOTSTRAP_STATUS));

      int c = 0;
      for (int k = 0; k < 10; k++) {
        UpdateRequest req = new UpdateRequest();
        for (; c < (k + 1) * 100; c++, numDocs++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "source_" + numDocs);
          doc.addField("xyz", numDocs);
          req.add(doc);
        }
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        System.out.println("Adding 100 docs with commit=true, numDocs=" + numDocs);
        req.process(sourceSolrClient);
      }

      response = sourceSolrClient.query(new SolrQuery("*:*"));
      assertEquals("", numDocs, response.getResults().getNumFound());

      cdcrStart(sourceSolrClient);
      cdcrEnableBuffer(sourceSolrClient);

      System.out.println("bs
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145553#comment-16145553 ]

Amrit Sarkar edited comment on SOLR-11278 at 8/30/17 12:00 PM:
---

[~varunthacker] [~erickerickson] :: This is on Solr version 6.3.

Yes, you are right. This solution is not foolproof. I was able to narrow down one problem, but there is still no solution in sight:

{code}
[junit4] 2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed :
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
[junit4] 2>at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:758)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:713)
[junit4] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit4] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit4] 2>at java.lang.Thread.run(Thread.java:745)
[junit4] 2> Caused by: java.lang.NullPointerException
[junit4] 2>at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:753)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:517)
[junit4] 2>... 9 more
[junit4] 2>
{code}

The problem is with *{{target.shutdown}}*: the target doesn't shut down properly.

{code}
[junit4] 2> 67512 INFO (coreCloseExecutor-49-thread-1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ cdcr-target:shard1
[junit4] 2> 67515 WARN (updateExecutor-4-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Bootstrap was interrupted
[junit4] 2> java.lang.InterruptedException
[junit4] 2>at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
[junit4] 2>at java.util.concurrent.FutureTask.get(FutureTask.java:191)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:644)
[junit4] 2>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[junit4] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit4] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit4] 2>at java.lang.Thread.run(Thread.java:745)
{code}

and the core gets reloaded and throws an NPE. I am digging into this properly, but I have a feeling this is a machine-specific issue, as we didn't see many of these in the Jenkins failures, as mentioned by Varun.

was (Author: sarkaramr...@gmail.com):
[~varunthacker] [~erickerickson]

Yes, you are right. This solution is not foolproof. I was able to narrow down one problem, but there is still no solution in sight:

{code}
[junit4] 2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed :
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
[junit4] 2>at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
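For what it's worth, the "Bootstrap was interrupted" WARN above comes from the thread that is blocked in {{FutureTask.get()}}, not from the thread doing the index fetch. The following is a minimal standalone sketch (plain java.util.concurrent, not Solr code) of that behaviour: interrupting the waiting thread makes {{get()}} throw {{InterruptedException}}, while the submitted task itself is not cancelled and keeps running. {{InterruptedWaiterDemo}} and {{run}} are hypothetical names, and the latch-based task only stands in for a long-running fetch; whether this is what actually happens inside Solr during shutdown is an assumption, not something the logs prove.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedWaiterDemo {

  static String run() {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    AtomicReference<String> waiterOutcome = new AtomicReference<>("not set");
    try {
      // Hypothetical long-running task: blocks until the latch is released.
      Future<String> future = worker.submit(() -> {
        release.await();
        return "task finished";
      });

      // A second thread waits on the future, like the handler's Runnable does.
      Thread waiter = new Thread(() -> {
        try {
          future.get();
          waiterOutcome.set("got result");
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          waiterOutcome.set("waiter interrupted");
        } catch (ExecutionException e) {
          waiterOutcome.set("task failed");
        }
      });
      waiter.start();
      waiter.interrupt(); // simulate the executor being shut down
      waiter.join();

      // Interrupting the waiter did not cancel or complete the task.
      boolean taskDoneAtInterrupt = future.isDone();
      release.countDown();
      String result = future.get(5, TimeUnit.SECONDS);
      return waiterOutcome.get() + "; taskDoneAtInterrupt=" + taskDoneAtInterrupt + "; " + result;
    } catch (Exception e) {
      return "unexpected: " + e;
    } finally {
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

If the real code path behaves the same way, the fetch on the recoveryExecutor thread could still be in flight after the waiter has given up, which would at least be consistent with the later "Index fetch failed" NPE.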
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145553#comment-16145553 ]

Amrit Sarkar edited comment on SOLR-11278 at 8/29/17 4:11 PM:
--

[~varunthacker] [~erickerickson]

Yes, you are right. This solution is not foolproof. I was able to narrow down one problem, but there is still no solution in sight:

{code}
[junit4] 2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed :
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
[junit4] 2>at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:758)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:713)
[junit4] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit4] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit4] 2>at java.lang.Thread.run(Thread.java:745)
[junit4] 2> Caused by: java.lang.NullPointerException
[junit4] 2>at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:753)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:517)
[junit4] 2>... 9 more
[junit4] 2>
{code}

The problem is with *{{target.shutdown}}*: the target doesn't shut down properly.

{code}
[junit4] 2> 67512 INFO (coreCloseExecutor-49-thread-1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ cdcr-target:shard1
[junit4] 2> 67515 WARN (updateExecutor-4-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.CdcrRequestHandler Bootstrap was interrupted
[junit4] 2> java.lang.InterruptedException
[junit4] 2>at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
[junit4] 2>at java.util.concurrent.FutureTask.get(FutureTask.java:191)
[junit4] 2>at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:644)
[junit4] 2>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[junit4] 2>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit4] 2>at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit4] 2>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit4] 2>at java.lang.Thread.run(Thread.java:745)
{code}

and the core gets reloaded and throws an NPE. I am digging into this properly, but I have a feeling this is a machine-specific issue, as we didn't see many of these in the Jenkins failures, as mentioned by Varun.

was (Author: sarkaramr...@gmail.com):
[~varunthacker]

Yes, you are right. This solution is not foolproof. I was able to narrow down one problem, but there is still no solution in sight:

{code}
[junit4] 2> 93761 ERROR (recoveryExecutor-5-thread-1-processing-n:127.0.0.1:55637_solr x:cdcr-target_shard1_replica1 s:shard1 c:cdcr-target r:core_node1) [n:127.0.0.1:55637_solr c:cdcr-target s:shard1 r:core_node1 x:cdcr-target_shard1_replica1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed :
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:540)
[junit4] 2>at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
[junit4] 2>at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
[junit4] 2>at
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145107#comment-16145107 ]

Amrit Sarkar edited comment on SOLR-11278 at 8/29/17 10:59 AM:
---

bq. I don't get the logic for this. Doesn't {{waitForTargetToSync}} already wait for 120 seconds ? Adding another 7 seconds shouldn't help much no?

This is a wrong observation. It doesn't wait 120 seconds. *It issues a commit every second for up to 120 seconds and checks whether the doc counts match*. Every explicit commit opens a new Searcher / the Solr Core is reloaded with a new Searcher. If the SolrCore is closed while bootstrapping / index fetch is taking place, we receive: {{InterruptedException for Bootstrap}}. The SolrCore is reloaded in the middle of copying, the reload fails too, and the SolrCore becomes NULL.

See:

{code}
private long waitForTargetToSync(int numDocs, CloudSolrClient targetSolrClient) throws SolrServerException, IOException, InterruptedException {
  long start = System.nanoTime();
  QueryResponse response = null;
  while (System.nanoTime() - start <= TimeUnit.NANOSECONDS.convert(120, TimeUnit.SECONDS)) {
    try {
      targetSolrClient.commit();
      response = targetSolrClient.query(new SolrQuery("*:*"));
      if (response.getResults().getNumFound() == numDocs) {
        break;
      }
    } catch (Exception e) {
      log.warn("Exception trying to commit on target. This is expected and safe to ignore.", e);
    }
    Thread.sleep(1000);
  }
  return response != null ? response.getResults().getNumFound() : 0;
}
{code}

was (Author: sarkaramr...@gmail.com):
bq. I don't get the logic for this. Doesn't {{waitForTargetToSync}} already wait for 120 seconds ? Adding another 7 seconds shouldn't help much no?

This is a wrong observation. It doesn't wait 120 seconds. *It issues a commit every second for up to 120 seconds and checks whether the doc counts match*. Every explicit commit will make the Solr Core reload. If the SolrCore is closed while bootstrapping / index fetch is taking place, we receive: {{InterruptedException for Bootstrap}}. The SolrCore is reloaded in the middle of copying, the reload fails too, and the SolrCore becomes NULL.

See:

{code}
private long waitForTargetToSync(int numDocs, CloudSolrClient targetSolrClient) throws SolrServerException, IOException, InterruptedException {
  long start = System.nanoTime();
  QueryResponse response = null;
  while (System.nanoTime() - start <= TimeUnit.NANOSECONDS.convert(120, TimeUnit.SECONDS)) {
    try {
      targetSolrClient.commit();
      response = targetSolrClient.query(new SolrQuery("*:*"));
      if (response.getResults().getNumFound() == numDocs) {
        break;
      }
    } catch (Exception e) {
      log.warn("Exception trying to commit on target. This is expected and safe to ignore.", e);
    }
    Thread.sleep(1000);
  }
  return response != null ? response.getResults().getNumFound() : 0;
}
{code}

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Reporter: Amrit Sarkar
> Assignee: Varun Thacker
> Attachments: SOLR-11278.patch, test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and seeing following failure:
> {code}
> [beaster] [01:37:16.282] FAILURE 153s |
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
> [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on
> target after sync expected:<2000> but was:<1000>
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
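To illustrate the point about {{waitForTargetToSync}}, here is a minimal standalone sketch of the same loop shape with the Solr calls replaced by plain callbacks, so the number of side-effecting "commits" per wait can be counted. {{PollUntilDemo}}, {{pollUntil}}, {{PollResult}} and {{demo}} are hypothetical names, and the canned doc counts are made up for the illustration; this is not the test's actual code.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class PollUntilDemo {

  /** Last observed value and how many times the loop body (including the action) ran. */
  static final class PollResult {
    final long lastValue;
    final int attempts;

    PollResult(long lastValue, int attempts) {
      this.lastValue = lastValue;
      this.attempts = attempts;
    }
  }

  /**
   * Runs action, then probe, once per iteration until probe returns expected
   * or the timeout elapses. The action stands in for targetSolrClient.commit().
   */
  static PollResult pollUntil(Runnable action, LongSupplier probe, long expected,
                              long timeoutMs, long sleepMs) {
    long start = System.nanoTime();
    long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    long last = 0;
    int attempts = 0;
    while (System.nanoTime() - start <= timeoutNanos) {
      attempts++;
      action.run(); // the side effect happens on every iteration, not once
      last = probe.getAsLong();
      if (last == expected) {
        break;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return new PollResult(last, attempts);
  }

  /** Simulated target that reports 1000 docs twice and 2000 docs on the third query. */
  static String demo() {
    long[] counts = {1000, 1000, 2000};
    int[] queries = {0};
    int[] commits = {0};
    PollResult r = pollUntil(
        () -> commits[0]++,
        () -> counts[Math.min(queries[0]++, counts.length - 1)],
        2000, 5000, 1);
    return "commits=" + commits[0] + " attempts=" + r.attempts + " lastValue=" + r.lastValue;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The sketch only shows that the loop commits on every pass rather than waiting passively; in the real test each of those commits hits the target while the bootstrap may still be copying the index.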
[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing in branch_6_6
[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143926#comment-16143926 ]

Amrit Sarkar edited comment on SOLR-11278 at 8/28/17 4:05 PM:
--

{code}
[beaster] Tests with failures [seed: 8D740119BA9589F1]:
[beaster] - org.apache.solr.cloud.CdcrBootstrapTest.testConvertClusterToCdcrAndBootstrap
[beaster] - org.apache.solr.cloud.CdcrBootstrapTest.testBootstrapWithSourceCluster
{code}

Safe to say, the three tests do NOT pass at every seed.

was (Author: sarkaramr...@gmail.com):
{code}
[beaster] Tests with failures [seed: 8D740119BA9589F1]:
[beaster] - org.apache.solr.cloud.CdcrBootstrapTest.testConvertClusterToCdcrAndBootstrap
[beaster] - org.apache.solr.cloud.CdcrBootstrapTest.testBootstrapWithSourceCluster
{code}

Safe to say, all the three tests are passable at all the seeds.

> CdcrBootstrapTest failing in branch_6_6
> ---
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 6.6.1
> Reporter: Amrit Sarkar
> Attachments: test_results
>
>
> I ran beast for 10 rounds:
> ant beast -Dtestcase=CdcrBootstrapTest -Dtests.multiplier=2 -Dtests.slow=true
> -Dtests.locale=vi -Dtests.timezone=Asia/Yekaterinburg -Dtests.asserts=true
> -Dtests.file.encoding=US-ASCII -Dbeast.iters=10
> and seeing following failure:
> {code}
> [beaster] [01:37:16.282] FAILURE 153s |
> CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
> [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on
> target after sync expected:<2000> but was:<1000>
> {code}