[jira] [Commented] (SOLR-13072) Management of markers for nodeLost / nodeAdded events is broken
[ https://issues.apache.org/jira/browse/SOLR-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113004#comment-17113004 ]

Andrzej Bialecki commented on SOLR-13072:
-----------------------------------------

[~cjcowie] thanks for reporting this - I created a separate issue to track it: SOLR-14505.

> Management of markers for nodeLost / nodeAdded events is broken
> ---------------------------------------------------------------
>
>                 Key: SOLR-13072
>                 URL: https://issues.apache.org/jira/browse/SOLR-13072
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 7.5, 7.6, 8.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: 7.7, 8.0, master (9.0)
>
>
> To prevent {{nodeLost}} events from being lost when the node that was lost
> is the Overseer leader, a mechanism was added in
> {{ZkController.registerLiveNodesListener()}} that lets any other live node
> record markers for these events. A similar mechanism also exists for
> {{nodeAdded}} events.
> On Overseer leader restart, if the autoscaling configuration didn't contain
> any triggers that consume {{nodeLost}} events, then these markers are removed.
> If there are 1 or more trigger configs that consume {{nodeLost}} events, then
> these triggers would read the markers, remove them and generate appropriate
> events.
> However, as the {{NodeMarkersRegistrationTest}} shows, this mechanism is
> broken and susceptible to race conditions.
> It's not unusual to have more than 1 {{nodeLost}} trigger, because in addition
> to any user-defined triggers there's always one that is automatically defined
> if missing: {{.auto_add_replicas}}. However, if there's more than 1
> {{nodeLost}} trigger, then the process of consuming and removing the markers
> becomes non-deterministic: each trigger may pick up (and delete) all, none,
> or some of the markers.
> So as it is now, this mechanism is broken if more than 1 {{nodeLost}} or more
> than 1 {{nodeAdded}} trigger is defined.
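The consume-and-delete problem described in the issue can be illustrated with a small standalone sketch. It is not Solr code: the class `MarkerRaceSketch`, the in-memory `markers` set (standing in for the marker nodes kept in ZooKeeper), and the method `consumeMarkers` are all hypothetical names chosen for illustration. The sketch only shows why two triggers that share one set of markers cannot both see every event when each one deletes what it reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class MarkerRaceSketch {
    /** Hypothetical stand-in for the shared location that holds nodeLost markers. */
    public static final Set<String> markers = ConcurrentHashMap.newKeySet();

    /**
     * Hypothetical trigger behaviour: read the current markers, delete each one,
     * and generate an event only for the markers this caller actually deleted.
     */
    public static List<String> consumeMarkers(String triggerName) {
        List<String> events = new ArrayList<>();
        for (String node : new ArrayList<>(markers)) {
            // remove() returns true only for the caller that deleted the marker
            if (markers.remove(node)) {
                events.add(triggerName + ":nodeLost:" + node);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        markers.add("node1");
        markers.add("node2");

        // Two triggers consume from the same marker set, one after the other.
        List<String> first = consumeMarkers("user_trigger");
        List<String> second = consumeMarkers(".auto_add_replicas");

        // The first trigger gets both events; the second finds nothing left.
        System.out.println(first.size() + " " + second.size()); // prints "2 0"
    }
}
```

Run sequentially, whichever trigger goes first receives all events and the other receives none. If the two calls ran concurrently on separate threads, the split between them would depend on scheduling, which corresponds to the "all, none, or some" outcome the description mentions.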
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13072) Management of markers for nodeLost / nodeAdded events is broken
[ https://issues.apache.org/jira/browse/SOLR-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112254#comment-17112254 ]

Colvin Cowie commented on SOLR-13072:
-------------------------------------

Hi [~ab]

I've seen intermittent NullPointerExceptions in org.apache.solr.cloud.ZkController.registerLiveNodesListener(), which was added by this issue. I sent an email to the dev mailing list; could you take a look when you have a chance?

Thanks in advance