[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing intermittently

2017-09-20 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174015#comment-16174015
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 9/20/17 11:49 PM:
---

Uploading another patch which ensures that a bootstrap status request made 
right after submitting a bootstrap returns the correct status: RUNNING. The 
patch uses a CountDownLatch internally in the function.
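
To illustrate the latch idea, here is a minimal sketch, not the code from the patch. The class, enum, and method names are hypothetical, and the 10-second timeout is an arbitrary assumption. The submitting thread waits on a CountDownLatch until the worker has published RUNNING, so a status request made right after submission cannot see a stale value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names are illustrative and not taken from CdcrRequestHandler.
class LatchedBootstrapSketch {
  enum Status { NOTFOUND, RUNNING, COMPLETED, FAILED }

  private final AtomicReference<Status> status = new AtomicReference<>(Status.NOTFOUND);
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Submits the work and returns only after the worker thread has set RUNNING.
  void submitBootstrap(Runnable work) {
    CountDownLatch started = new CountDownLatch(1);
    executor.submit(() -> {
      status.set(Status.RUNNING);
      started.countDown(); // release the submitting thread
      try {
        work.run();
        status.set(Status.COMPLETED);
      } catch (RuntimeException e) {
        status.set(Status.FAILED);
      }
    });
    try {
      // Assumed timeout; avoids blocking forever if the task never starts.
      if (!started.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("bootstrap task did not start in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for bootstrap to start", e);
    }
  }

  Status getStatus() {
    return status.get();
  }

  void shutdown() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    LatchedBootstrapSketch handler = new LatchedBootstrapSketch();
    CountDownLatch finish = new CountDownLatch(1);
    handler.submitBootstrap(() -> {
      try {
        finish.await(); // stand-in for long-running bootstrap work
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println(handler.getStatus()); // prints RUNNING: the work is still blocked
    finish.countDown();
    handler.shutdown();
  }
}
```

Without the latch, `getStatus()` could be called before the executor thread has run at all, which is the window in which a second bootstrap would be issued.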


was (Author: sarkaramr...@gmail.com):
Uploading another patch which blocks any other bootstrap call between the time 
one is submitted and the time it is executed. Used CountDownLatch internally 
in the function.

> CdcrBootstrapTest failing intermittently
> 
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.0, 6.6.1
> Reporter: Amrit Sarkar
> Assignee: Varun Thacker
> Priority: Critical
> Labels: test
> Attachments: master-bs.patch, SOLR-11278-awaits-fix.patch, 
> SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, 
> SOLR-11278.patch, SOLR-11278.patch, test_results
>
>
> {{CdcrBootstrapTest}} fails intermittently when beasted for a significant 
> number of iterations.
> The bootstrap fails in the test after the first batch is indexed for each 
> {{testmethod}}, which results in a document count mismatch:
> {code}
>   [beaster]   2> 39167 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:42155_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:42155_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
>   [beaster]   2> java.util.concurrent.ExecutionException: java.lang.AssertionError
>   [beaster]   2>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   [beaster]   2>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   [beaster]   2>  at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:654)
>   [beaster]   2>  at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   [beaster]   2>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   [beaster]   2>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   [beaster]   2>  at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
>   [beaster]   2>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   [beaster]   2>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   [beaster]   2>  at java.lang.Thread.run(Thread.java:748)
>   [beaster]   2> Caused by: java.lang.AssertionError
>   [beaster]   2>  at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:813)
>   [beaster]   2>  at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:724)
>   [beaster]   2>  at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
>   [beaster]   2>  ... 5 more
> {code}
> {code}
>   [beaster] [01:37:16.282] FAILURE  153s | CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
>   [beaster]> Throwable #1: java.lang.AssertionError: Document mismatch on target after sync expected:<2000> but was:<1000>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-11278) CdcrBootstrapTest failing intermittently

2017-09-18 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170958#comment-16170958
 ] 

Amrit Sarkar edited comment on SOLR-11278 at 9/19/17 12:58 AM:
---

I had an offline discussion with Shalin and Varun, and we were able to figure 
out what's wrong in the CDCR bootstrap.

* Since issuing a bootstrap is an asynchronous call, there is a probable race 
condition: right after a bootstrap is issued, the bootstrap status is checked 
and, if none is found, another bootstrap gets issued.
* This second bootstrap fails to acquire the lock and issues a bootstrap cancel.
* Since the bootstrap at the target is now "cancelled", the bootstrap status 
check in CdcrReplicatorManager goes into an infinite loop, as the "cancelled" 
condition is not handled.

The patch covers both the "*submitted*" and "*cancelled*" bootstrap status 
conditions and what to do next in each case, which eliminates the repeated 
bootstrap calls and should let the bootstrap complete successfully.
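
The status handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the `Status` and `Action` enums, the method names, and the choice to re-issue a bootstrap on "cancelled" are all hypothetical. The point is only that every status maps to an explicit next step and that the polling loop is bounded.

```java
import java.util.function.Supplier;

// Hypothetical sketch; enum values and method names are assumptions for illustration.
class BootstrapPollSketch {
  enum Status { SUBMITTED, RUNNING, COMPLETED, FAILED, CANCELLED, NOTFOUND }

  enum Action { WAIT, RESUBMIT, DONE, GIVE_UP }

  // Maps each status to an explicit next step, so no status is left unhandled.
  static Action nextAction(Status status) {
    switch (status) {
      case SUBMITTED: // accepted but not started yet: do not issue another bootstrap
      case RUNNING:
        return Action.WAIT;
      case COMPLETED:
        return Action.DONE;
      case CANCELLED: // assumption: a cancelled bootstrap is issued again
      case NOTFOUND:
        return Action.RESUBMIT;
      case FAILED:
      default:
        return Action.GIVE_UP;
    }
  }

  // Polls at most maxPolls times. Returns the number of bootstrap submissions
  // (counting the initial one), or -1 if the bootstrap failed or never completed.
  static int pollUntilDone(Supplier<Status> statusSource, int maxPolls) {
    int submissions = 1; // the caller has already issued the first bootstrap
    for (int i = 0; i < maxPolls; i++) {
      switch (nextAction(statusSource.get())) {
        case DONE:
          return submissions;
        case RESUBMIT:
          submissions++;
          break;
        case GIVE_UP:
          return -1;
        case WAIT:
        default:
          break; // a real loop would sleep before polling again
      }
    }
    return -1;
  }
}
```

With this mapping, a "submitted" status no longer triggers a second bootstrap, and a "cancelled" status leads to a defined step instead of an endless loop.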


was (Author: sarkaramr...@gmail.com):
I had an offline discussion with Shalin and Varun, and we were able to figure 
out what's wrong in the CDCR bootstrap.

* Since issuing a bootstrap is an asynchronous call, there is a probable race 
condition: right after a bootstrap is issued, the bootstrap status is checked 
and, if none is found, another bootstrap gets issued.
* This second bootstrap fails to acquire the lock and issues a bootstrap cancel.
* Since the bootstrap at the target is now "cancelled", the bootstrap status 
check in CdcrReplicatorManager goes into an infinite loop, as the "cancelled" 
condition is not handled.

The patch covers both the "*submitted*" and "*cancelled*" bootstrap status 
conditions and what to do next in each case, which eliminates the repeated 
bootstrap calls and should let the bootstrap complete successfully.

> CdcrBootstrapTest failing intermittently
> 
>
> Key: SOLR-11278
> URL: https://issues.apache.org/jira/browse/SOLR-11278
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.0, 6.6.1
> Reporter: Amrit Sarkar
> Assignee: Varun Thacker
> Priority: Critical
> Labels: test
> Attachments: master-bs.patch, SOLR-11278-awaits-fix.patch, 
> SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, 
> SOLR-11278.patch, test_results
>
>