[ https://issues.apache.org/jira/browse/SOLR-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki resolved SOLR-12480.
--------------------------------------
    Resolution: Duplicate

> TriggerAction failures may cause inconsistent trigger behavior
> --------------------------------------------------------------
>
>                 Key: SOLR-12480
>                 URL: https://issues.apache.org/jira/browse/SOLR-12480
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: AutoScaling
>    Affects Versions: 7.4, master (8.0)
>            Reporter: Andrzej Bialecki
>            Priority: Major
>
> The following issue occasionally appears when running
> {{TestLargeCluster.testNodeLost}}.
> The test kills a large number of nodes, waiting for a certain time between
> the kills. Depending on the sequence of kills and the length of {{waitFor}},
> the target node may have just been killed at the moment when
> {{ExecutePlanAction}} processes MOVEREPLICA. This results in an exception
> and a FAILED status of the action.
> However, this failure is not reported back to the trigger as an unprocessed
> event because it happens asynchronously in the action executor (in
> {{ScheduledTriggers}}), so the trigger resets its internal state and no
> longer tracks the lost node. As a result, replicas remain lost. Even if
> there is a Policy violation, the event will not be generated again, and the
> number of replicas won't return to the original number.
> Also, {{ScheduledTriggers:311}} and 323 only log the exception but don't
> fire listeners with FAILED status, which is a bug.
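
The failure mode described above can be illustrated with a small, self-contained sketch. It is not Solr code: `TriggerFailureSketch`, `LostNodeTracker`, `Action`, `fireAndForget` and `reportBack` are hypothetical names chosen for this example, and the status strings only mimic the FAILED status mentioned in the report. The first method resets the tracker as soon as the action is submitted and only logs a failure. The second resets the tracker only on success and records a FAILED status otherwise, which is one possible way to keep the lost node tracked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these types are actual Solr classes.
class TriggerFailureSketch {

    /** Stand-in for the trigger's internal record of lost nodes. */
    static class LostNodeTracker {
        private final Set<String> lost = ConcurrentHashMap.newKeySet();
        void track(String node) { lost.add(node); }
        void forget(String node) { lost.remove(node); }
        boolean isTracked(String node) { return lost.contains(node); }
    }

    /** Stand-in for an action (for example a replica move) that may fail. */
    interface Action {
        void process(String node) throws Exception;
    }

    /** Problematic pattern: state is reset on submit, failures are only logged. */
    static List<String> fireAndForget(LostNodeTracker tracker, String node,
                                      Action action, ExecutorService executor) throws Exception {
        List<String> statuses = Collections.synchronizedList(new ArrayList<>());
        Future<?> done = executor.submit(() -> {
            try {
                action.process(node);
                statuses.add("SUCCEEDED");
            } catch (Exception e) {
                // Only logged: no FAILED status reaches any listener.
                System.err.println("action failed: " + e.getMessage());
            }
        });
        tracker.forget(node); // reset happens regardless of the action's outcome
        done.get();           // wait only to make the demo deterministic
        return statuses;
    }

    /** Alternative: reset state only on success, record FAILED otherwise. */
    static List<String> reportBack(LostNodeTracker tracker, String node,
                                   Action action, ExecutorService executor) throws Exception {
        List<String> statuses = Collections.synchronizedList(new ArrayList<>());
        Future<?> done = executor.submit(() -> {
            try {
                action.process(node);
                statuses.add("SUCCEEDED");
                tracker.forget(node);
            } catch (Exception e) {
                statuses.add("FAILED"); // node stays tracked, so it can be retried
            }
        });
        done.get();
        return statuses;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Action failing = node -> { throw new IllegalStateException("target node is down"); };

        LostNodeTracker first = new LostNodeTracker();
        first.track("node1");
        System.out.println(fireAndForget(first, "node1", failing, executor)
                + " tracked=" + first.isTracked("node1"));

        LostNodeTracker second = new LostNodeTracker();
        second.track("node1");
        System.out.println(reportBack(second, "node1", failing, executor)
                + " tracked=" + second.isTracked("node1"));

        executor.shutdown();
    }
}
```

In the first variant the lost node is forgotten even though the action failed, which matches the symptom of replicas staying lost. The second variant is only one possible design; the actual resolution is tracked in the issue this one duplicates.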