[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-08-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918638#comment-16918638
 ] 

ASF subversion and git services commented on SOLR-12291:


Commit 8b6ca690acee929ceadd3ea7a8a504499cbfa012 in lucene-solr's branch 
refs/heads/branch_7_7 from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8b6ca69 ]

SOLR-12291: fixing premature completion of async tasks

* extract async tracking methods from OverseerCollectionMessageHandler into a
separate class
* replace HashMap with NamedList to avoid entry loss
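Here is a minimal sketch of the idea behind the second bullet. It assumes the tracked data is only (node name, async id) pairs. Solr's NamedList allows repeated names, so a second replica on the same node no longer overwrites the first replica's entry. To keep the example self-contained, it uses a plain List of entries instead of NamedList. The class AsyncTracker and its methods are hypothetical; this is not the class extracted in the commit.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical tracker, not Solr's actual extracted class.
class AsyncTracker {
    // Ordered (nodeName, asyncId) pairs. Duplicate node names are kept,
    // mirroring how NamedList keeps repeated names instead of overwriting them.
    private final List<Map.Entry<String, String>> requests = new ArrayList<>();

    void track(String nodeName, String asyncId) {
        requests.add(new SimpleImmutableEntry<>(nodeName, asyncId));
    }

    int size() {
        return requests.size();
    }

    // Every async id submitted to the given node, in submission order
    List<String> asyncIdsFor(String nodeName) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : requests) {
            if (e.getKey().equals(nodeName)) {
                ids.add(e.getValue());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AsyncTracker tracker = new AsyncTracker();
        tracker.track("node1:8983_solr", "restore-core_node1");
        tracker.track("node1:8983_solr", "restore-core_node2"); // same node, second replica
        // prints: 2 [restore-core_node1, restore-core_node2]
        System.out.println(tracker.size() + " " + tracker.asyncIdsFor("node1:8983_solr"));
    }
}
```

Both async ids survive, so the overseer would still have a REQUESTSTATUS target for each replica.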


> Async prematurely reports completed status that causes severe shard loss
> 
>
> Key: SOLR-12291
> URL: https://issues.apache.org/jira/browse/SOLR-12291
> Project: Solr
>  Issue Type: Bug
>  Components: Backup/Restore, SolrCloud
>Reporter: Varun Thacker
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.1, master (9.0)
>
> Attachments: SOLR-12291.patch, SOLR-12291.patch, SOLR-12291.patch, 
> SOLR-12291.patch, SOLR-12291.patch, SOLR-12291.patch, SOLR-12291.patch, 
> SOLR-122911.patch
>
>
> The OverseerCollectionMessageHandler sliceCmd assumes that only one replica
> exists per node.
> When multiple replicas of a slice are on the same node, we only track one
> replica's async request. This happens because the key of the async
> requestMap is "node_name".
> I discovered this when [~alabax] shared some logs of a restore issue, where
> the second replica got added before the first replica had completed its
> restorecore action.
> While looking at the logs, I noticed that the overseer never called
> REQUESTSTATUS for the restorecore action, almost as if it had missed
> tracking that particular async request.
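The following minimal plain-Java sketch (not Solr code) shows what happens when async request ids are tracked in a HashMap keyed by node name. The class name NodeKeyedRequestMap, the node name, and the async ids are all hypothetical. The only detail taken from the description is that the map's key is the node name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not taken from OverseerCollectionMessageHandler.
class NodeKeyedRequestMap {

    // Tracks async request ids keyed by node name, as described in the issue.
    // Each element of replicaRequests is {nodeName, asyncId}.
    static Map<String, String> track(String[][] replicaRequests) {
        Map<String, String> requestMap = new HashMap<>();
        for (String[] req : replicaRequests) {
            // put() silently replaces any id already stored for the same node
            requestMap.put(req[0], req[1]);
        }
        return requestMap;
    }

    public static void main(String[] args) {
        String[][] requests = {
            {"node1:8983_solr", "restore-core_node1"},
            {"node1:8983_solr", "restore-core_node2"} // second replica on the same node
        };
        Map<String, String> tracked = track(requests);
        // Only one of the two async requests is left to poll for status
        // prints: 1 {node1:8983_solr=restore-core_node2}
        System.out.println(tracked.size() + " " + tracked);
    }
}
```

The first replica's restore is no longer tracked, which matches the observation that REQUESTSTATUS was never issued for it.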



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-08-29 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918551#comment-16918551
 ] 

Ishan Chattopadhyaya commented on SOLR-12291:
-

Thanks for clarifying. I can take a stab at it (since I'm already backporting
SOLR-13718 there) and will ask for your help if I get stuck.




[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-08-29 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918455#comment-16918455
 ] 

Mikhail Khludnev commented on SOLR-12291:
-

It does. I haven't thought about porting to 7.x. Would you like me to do so?   




[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-08-28 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918281#comment-16918281
 ] 

Ishan Chattopadhyaya commented on SOLR-12291:
-

Does this affect the 7.x branch too? Should we port it to the 7.7 branch as well?




[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-30 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830016#comment-16830016
 ] 

ASF subversion and git services commented on SOLR-12291:


Commit 39ff3052c383f6275fb647f7f5b641ecaf46c639 in lucene-solr's branch 
refs/heads/branch_8x from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=39ff305 ]

SOLR-12291: fixing premature completion of async tasks

* extract async tracking methods from OverseerCollectionMessageHandler into a
separate class
* replace HashMap with NamedList to avoid entry loss





[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-30 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830012#comment-16830012
 ] 

ASF subversion and git services commented on SOLR-12291:


Commit 5ca0602d2802d6b64186972993157a1dbf4bc1e6 in lucene-solr's branch 
refs/heads/master from Mikhail Khludnev
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ca0602 ]

SOLR-12291: fixing premature completion of async tasks

* extract async tracking methods from OverseerCollectionMessageHandler into a
separate class
* replace HashMap with NamedList to avoid entry loss





[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-29 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829007#comment-16829007
 ] 

Mikhail Khludnev commented on SOLR-12291:
-

I think it's ready. Let me push it this week, before the 8.1 cut. [~anshumg],
[~varunthacker], [~ichattopadhyaya], [~tomasflobbe], what do you think?




[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-28 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828910#comment-16828910
 ] 

Lucene/Solr QA commented on SOLR-12291:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 3m 55s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 4m 9s | the patch passed |
| +1 | javac | 4m 9s | the patch passed |
| +1 | Release audit (RAT) | 4m 4s | the patch passed |
| +1 | Check forbidden APIs | 3m 59s | the patch passed |
| +1 | Validate source patterns | 3m 59s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 90m 20s | core in the patch passed. |
| +1 | unit | 1m 33s | test-framework in the patch passed. |
| | | 105m 28s | |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967330/SOLR-12291.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 8dd22bc |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/391/testReport/ |
| modules | C: solr/core solr/test-framework U: solr |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/391/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-28 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828831#comment-16828831
 ] 

Lucene/Solr QA commented on SOLR-12291:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 2m 3s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 1m 58s | the patch passed |
| +1 | javac | 1m 58s | the patch passed |
| +1 | Release audit (RAT) | 1m 58s | the patch passed |
| +1 | Check forbidden APIs | 1m 59s | the patch passed |
| +1 | Validate source patterns | 1m 58s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 45m 43s | core in the patch failed. |
| | | 52m 46s | |

|| Reason || Tests ||
| Failed junit tests | solr.cloud.ReindexCollectionTest |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967309/SOLR-12291.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 8dd22bc |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | LTS |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/390/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/390/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/390/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-28 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827852#comment-16827852
 ] 

Lucene/Solr QA commented on SOLR-12291:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 5s | SOLR-12291 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967296/SOLR-12291.patch |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/389/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-25 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826596#comment-16826596
 ] 

Lucene/Solr QA commented on SOLR-12291:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 3m 50s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 4m 36s | the patch passed |
| +1 | javac | 4m 36s | the patch passed |
| +1 | Release audit (RAT) | 4m 36s | the patch passed |
| +1 | Check forbidden APIs | 4m 36s | the patch passed |
| +1 | Validate source patterns | 4m 36s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 89m 8s | core in the patch failed. |
| | | 102m 39s | |

|| Reason || Tests ||
| Failed junit tests | solr.cloud.AsyncCallRequestStatusResponseTest |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967061/SOLR-12291.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / ef79dd5 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/387/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/387/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/387/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-24 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825366#comment-16825366
 ] 

Lucene/Solr QA commented on SOLR-12291:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | compile | 4m 1s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 3m 50s | the patch passed |
| +1 | javac | 3m 50s | the patch passed |
| +1 | Release audit (RAT) | 3m 50s | the patch passed |
| +1 | Check forbidden APIs | 3m 50s | the patch passed |
| +1 | Validate source patterns | 3m 51s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 88m 44s | core in the patch failed. |
| | | 102m 1s | |

|| Reason || Tests ||
| Failed junit tests | solr.security.AuditLoggerIntegrationTest |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12966845/SOLR-12291.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 33c9456 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/383/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/383/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/383/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (SOLR-12291) Async prematurely reports completed status that causes severe shard loss

2019-04-24 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824853#comment-16824853
 ] 

Mikhail Khludnev commented on SOLR-12291:
-

Let's start from scratch. Today's patch just adds a JUnit assumption that
statuses are received from all nodes. Currently this assumption mostly fails,
which causes the test to be reported as skipped. I believe that once the core
problem (overlapping keys in the async IDs map) is fixed, it should pass, since
every node will respond with its status. I'm going to commit just this test
amendment soon; shout if you want to veto.
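The invariant behind this test amendment can be sketched in plain Java. This is a hypothetical illustration, not the actual JUnit assumption in the patch. Completion should be reported only if every tracked async id has a status and that status is "completed". The class CompletionCheck, the status snapshot map, and the ids are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical completion check, not Solr's implementation.
class CompletionCheck {

    // Returns true only when every tracked async id has a status and it is "completed".
    // statuses is an assumed snapshot of REQUESTSTATUS results keyed by async id.
    static boolean allCompleted(List<String> trackedIds, Map<String, String> statuses) {
        for (String id : trackedIds) {
            // A missing entry (null) counts as not completed
            if (!"completed".equals(statuses.get(id))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> statuses = new HashMap<>();
        statuses.put("restore-core_node1", "running");
        statuses.put("restore-core_node2", "completed");

        // Node-keyed tracking kept only the second id, so the check passes too early
        System.out.println(allCompleted(Arrays.asList("restore-core_node2"), statuses)); // true
        // Tracking every id keeps waiting for the first restore
        System.out.println(allCompleted(Arrays.asList("restore-core_node1", "restore-core_node2"), statuses)); // false
    }
}
```

If a node-keyed map kept only the second id, the check passes while the first restore is still running. That is the premature completion described in the issue title.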
