Yan Xu created MESOS-8630:
-----------------------------

             Summary: All subsequent registry operations fail after the registrar is aborted after a failed update
                 Key: MESOS-8630
                 URL: https://issues.apache.org/jira/browse/MESOS-8630
             Project: Mesos
          Issue Type: Bug
          Components: master
            Reporter: Yan Xu

A failure to update the registry always aborts the registrar but doesn't always abort the master process.

When the registrar fails to update the registry, it aborts the actor and fails all future operations. The rationale is explained here: [https://github.com/apache/mesos/commit/5eaf1eb346fc2f46c852c1246bdff12a89216b60]
{quote}In this event, the Master won't commit suicide until the initial failure is processed. However, in the interim, subsequent operations are potentially being performed against the Registrar. This could lead to fighting between masters if a "demoted" master re-attempts to acquire log-leadership!
{quote}
However, when the registry update is requested through an operator API (maintenance, quota update, etc.), the master process doesn't shut down (a 500 error is returned to the client instead), and all subsequent operations will fail!
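
To make the failure mode concrete, here is a minimal toy model of the behavior described above. It is only a sketch and not the actual Mesos code: `ToyRegistrar`, `apply`, `handleOperatorCall`, and the `store` callback are hypothetical names invented for illustration, and the real registrar is an asynchronous libprocess actor rather than a synchronous class.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical toy model of the registrar; none of these names are Mesos APIs.
class ToyRegistrar {
public:
  // Applies one registry update. `store` stands in for the write to the
  // registry and returns false on failure. The result is an error message
  // on failure and std::nullopt on success.
  std::optional<std::string> apply(const std::function<bool()>& store) {
    if (aborted_.has_value()) {
      // Once aborted, every later operation fails without touching storage.
      return "registrar aborted: " + *aborted_;
    }
    if (!store()) {
      // The first failed update aborts the registrar permanently.
      aborted_ = "failed to update registry";
      return *aborted_;
    }
    return std::nullopt;
  }

  bool aborted() const { return aborted_.has_value(); }

private:
  std::optional<std::string> aborted_;
};

// Hypothetical operator API handler: a failed update is reported to the
// client as HTTP 500, and the process keeps running with an aborted registrar.
int handleOperatorCall(ToyRegistrar& registrar,
                       const std::function<bool()>& store) {
  return registrar.apply(store).has_value() ? 500 : 200;
}
```

In this model, a call whose `store` succeeds returns 200, a call whose `store` fails returns 500 and aborts the registrar, and every call after that returns 500 even if its `store` would have succeeded. That corresponds to the operator API case in this report, where nothing terminates the process after the first failure.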