[ 
https://issues.apache.org/jira/browse/SOLR-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved SOLR-12480.
--------------------------------------
    Resolution: Duplicate

> TriggerAction failures may cause inconsistent trigger behavior
> --------------------------------------------------------------
>
>                 Key: SOLR-12480
>                 URL: https://issues.apache.org/jira/browse/SOLR-12480
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>    Affects Versions: 7.4, master (8.0)
>            Reporter: Andrzej Bialecki 
>            Priority: Major
>
> The following issue occasionally appears when running 
> {{TestLargeCluster.testNodeLost}}.
> The test kills a large number of nodes, waiting for a certain time between 
> the kills. Depending on the sequence of kills and the length of {{waitFor}}, 
> the target node may have just been killed by the time {{ExecutePlanAction}} 
> processes MOVEREPLICA. This results in an exception and a FAILED status 
> of the action.
> However, this failure is not reported back to the trigger as an unprocessed 
> event because it happens asynchronously in the action executor (in 
> {{ScheduledTriggers}}), so the trigger resets its internal state and no 
> longer tracks the lost node. As a result, replicas remain lost, and even if 
> there’s a Policy violation the event will not be generated again, so the 
> number of replicas won’t go back to the original number.
> Also, {{ScheduledTriggers:311}} and 323 only log the exception but don’t 
> fire listeners with FAILED status, which is a bug.
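
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Solr code and not the fix for this issue: the class `TriggerSketch`, its methods and the `lostNodes` set are all hypothetical names, and the model assumes a single action run on a plain `ExecutorService`. It only contrasts an action failure that is logged inside the executor with one that is reported back to the caller.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified model; none of these names are Solr classes.
class TriggerSketch {
    // Nodes the trigger still considers lost and unhandled.
    private final Set<String> lostNodes = new HashSet<>();

    // Stand-in for an action whose MOVEREPLICA target has just been killed.
    private static void executePlan(String node) {
        throw new IllegalStateException("target node for " + node + " is down");
    }

    // Variant 1: the failure is only logged inside the executor, so the
    // trigger clears its state even though nothing was repaired.
    static Set<String> fireAndForget(String node) throws Exception {
        TriggerSketch trigger = new TriggerSketch();
        trigger.lostNodes.add(node);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = executor.submit(() -> {
                try {
                    executePlan(node);
                } catch (RuntimeException e) {
                    System.err.println("action failed: " + e.getMessage());
                }
            });
            trigger.lostNodes.remove(node); // event treated as processed
            done.get(); // wait only to keep the sketch deterministic
        } finally {
            executor.shutdown();
        }
        return trigger.lostNodes;
    }

    // Variant 2: the failure propagates through the Future, so the trigger
    // keeps tracking the node and could generate the event again.
    static Set<String> reportFailure(String node) throws InterruptedException {
        TriggerSketch trigger = new TriggerSketch();
        trigger.lostNodes.add(node);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = executor.submit(() -> executePlan(node));
            try {
                done.get();
                trigger.lostNodes.remove(node); // reset only after success
            } catch (ExecutionException e) {
                System.err.println("action failed, node stays tracked: "
                        + e.getCause().getMessage());
            }
        } finally {
            executor.shutdown();
        }
        return trigger.lostNodes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fire-and-forget still tracks: " + fireAndForget("node1"));
        System.out.println("with reporting still tracks:  " + reportFailure("node1"));
    }
}
```

In the first variant the set of tracked nodes ends up empty although the action failed, which mirrors the lost event described in the report; in the second the node remains tracked. A real trigger would not block on the result as this sketch does; the blocking call is only there to keep the example short.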



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
