[jira] [Commented] (MESOS-9502) IOswitchboard cleanup could get stuck.
[ https://issues.apache.org/jira/browse/MESOS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730063#comment-16730063 ]

Meng Zhu commented on MESOS-9502:

[~jieyu] mentioned that one possibility is pid reuse. When the I/O switchboard terminates and the agent restarts, the old I/O switchboard pid is reaped by init. After failover, the agent reaps the old pid again; in the common case, this returns None() immediately. In the corner case where the pid has been reused by another process, however, the agent can get stuck waiting for that unrelated process to exit.

> IOswitchboard cleanup could get stuck.
>
> Key: MESOS-9502
> URL: https://issues.apache.org/jira/browse/MESOS-9502
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.0
> Reporter: Meng Zhu
> Priority: Critical
>
> Our check container got stuck during destroy, which in turn caused the parent container to get stuck. The destroy is blocked by the I/O switchboard cleanup:
> 1223 18:04:41.00 16269 switchboard.cpp:814] Sending SIGTERM to I/O switchboard server (pid: 62854) since container 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e is being destroyed
> 1227 04:45:38.00 5189 switchboard.cpp:916] I/O switchboard server process for container 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e has terminated (status=N/A)
> Note the timestamps: roughly three and a half days pass between the SIGTERM and the termination.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
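The pid-reuse failure mode in the comment can be illustrated with a minimal C++ sketch. It assumes that, after failover, the agent can only wait on a pid it did not fork by polling whether that pid still exists (our understanding of how a reaper must treat non-child processes, since waitpid() is unavailable for them and no exit status can be recovered, hence None()). `pidExists`, `waitForPidGone`, and `spawnAndReapChild` are hypothetical names, not Mesos or libprocess APIs, and the timeout exists only so the sketch cannot hang.

```cpp
#include <cerrno>
#include <chrono>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a Mesos/libprocess API): does any process
// currently hold this pid? Uses the kill(pid, 0) existence probe.
bool pidExists(pid_t pid)
{
  // In kill(2), 0 and negative pids address process groups, not one process.
  if (pid <= 0) {
    return false;
  }
  // Signal 0 only performs error checking. EPERM means the pid exists but
  // belongs to a process we are not allowed to signal.
  return ::kill(pid, 0) == 0 || errno == EPERM;
}

// Hypothetical polling "reaper" for a pid that is not our child, so
// waitpid() cannot be used and no exit status is available. Returns true
// once the pid disappears, or false if `timeout` expires first. The probe
// cannot tell the original process from an unrelated one that reused the pid.
bool waitForPidGone(
    pid_t pid,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (pidExists(pid)) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false; // Still present: possibly a reused pid.
    }
    std::this_thread::sleep_for(interval);
  }
  return true;
}

// Demo helper: fork a child that exits immediately and reap it ourselves,
// so the returned pid no longer names a live process. Returns -1 on failure.
pid_t spawnAndReapChild()
{
  const pid_t child = ::fork();
  if (child < 0) {
    return -1;
  }
  if (child == 0) {
    ::_exit(0);
  }
  if (::waitpid(child, nullptr, 0) != child) {
    return -1;
  }
  return child;
}
```

In the common case (a pid from `spawnAndReapChild()`), `waitForPidGone` returns true on the first probe. A pid that stays live, such as the caller's own `getpid()` standing in for a reused pid, is never observed to exit; without a timeout the loop would spin for as long as the unrelated process lives, which would be consistent with the multi-day gap in the logs.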
[jira] [Created] (MESOS-9502) IOswitchboard cleanup could get stuck.
Meng Zhu created MESOS-9502:

Summary: IOswitchboard cleanup could get stuck.
Key: MESOS-9502
URL: https://issues.apache.org/jira/browse/MESOS-9502
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 1.7.0
Reporter: Meng Zhu

Our check container got stuck during destroy, which in turn caused the parent container to get stuck. The destroy is blocked by the I/O switchboard cleanup:

1223 18:04:41.00 16269 switchboard.cpp:814] Sending SIGTERM to I/O switchboard server (pid: 62854) since container 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e is being destroyed
1227 04:45:38.00 5189 switchboard.cpp:916] I/O switchboard server process for container 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e has terminated (status=N/A)

Note the timestamps: roughly three and a half days pass between the SIGTERM and the termination.
[jira] [Assigned] (MESOS-7023) IOSwitchboardTest.RecoverThenKillSwitchboardContainerDestroyed is flaky
[ https://issues.apache.org/jira/browse/MESOS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov reassigned MESOS-7023:

Assignee: (was: Kevin Klues)

> IOSwitchboardTest.RecoverThenKillSwitchboardContainerDestroyed is flaky
>
> Key: MESOS-7023
> URL: https://issues.apache.org/jira/browse/MESOS-7023
> Project: Mesos
> Issue Type: Bug
> Components: agent, test
> Affects Versions: 1.2.2
> Environment: ASF CI, cmake, gcc, Ubuntu 14.04, without libevent/SSL
> Reporter: Greg Mann
> Priority: Major
> Labels: debugging, disabled-test, flaky
> Attachments: IOSwitchboardTest.RecoverThenKillSwitchboardContainerDestroyed.txt
>
> This was observed on ASF CI:
> {code}
> /mesos/src/tests/containerizer/io_switchboard_tests.cpp:1052: Failure
> Value of: statusFailed->reason()
>   Actual: 1
> Expected: TaskStatus::REASON_IO_SWITCHBOARD_EXITED
> Which is: 27
> {code}
> Find full log attached.
[jira] [Assigned] (MESOS-8252) MasterAuthorizationTest.SlaveRemovedLost is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov reassigned MESOS-8252:

Assignee: (was: Alexander Rojas)

> MasterAuthorizationTest.SlaveRemovedLost is flaky.
>
> Key: MESOS-8252
> URL: https://issues.apache.org/jira/browse/MESOS-8252
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test
> Attachments: SlaveRemovedLost-badrun.txt
>
> Observed it in the internal CI today. Most likely related to the recent introduction of the {{Abandoned}} future state. Full log attached.
[jira] [Commented] (MESOS-9491) There exists no way to statically configure a weight for a Mesos role
[ https://issues.apache.org/jira/browse/MESOS-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729570#comment-16729570 ]

Alexander Rukletsov commented on MESOS-9491:

[~bbannier] Why do you think static configuration would be useful? We wanted to move away from the concept of statically defining roles in a cluster.

> There exists no way to statically configure a weight for a Mesos role
>
> Key: MESOS-9491
> URL: https://issues.apache.org/jira/browse/MESOS-9491
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Reporter: Benjamin Bannier
> Priority: Major
>
> While it is possible to change the weight of any role at runtime over the operator API, it seems we currently have no supported way to configure this statically with configuration flags. Both the {{--weights}} and {{--roles}} flags would in principle allow this, but are deprecated.
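For context on what static weight configuration looked like, here is a minimal C++ sketch that parses a value for the deprecated `--weights` flag. It assumes the comma-separated `role=weight,role=weight` format from the master's flag help; `parseWeights` is a hypothetical helper, not the Mesos flag parser, and rejecting non-positive or malformed weights is an assumption of this sketch.

```cpp
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the Mesos flag parser): parses a value of the
// assumed form "role=weight,role=weight" into a role -> weight map.
// Returns std::nullopt on malformed pairs or non-positive weights.
std::optional<std::map<std::string, double>> parseWeights(const std::string& flag)
{
  std::map<std::string, double> weights;
  std::stringstream stream(flag);
  std::string pair;

  while (std::getline(stream, pair, ',')) {
    const auto eq = pair.find('=');
    // Require a non-empty role and a non-empty weight around '='.
    if (eq == std::string::npos || eq == 0 || eq + 1 == pair.size()) {
      return std::nullopt;
    }

    const std::string role = pair.substr(0, eq);
    const std::string value = pair.substr(eq + 1);

    std::size_t consumed = 0;
    double weight = 0.0;
    try {
      weight = std::stod(value, &consumed);
    } catch (const std::exception&) {
      return std::nullopt; // Not a number, or out of range.
    }

    // Reject trailing garbage such as "2.0x" and non-positive weights.
    if (consumed != value.size() || weight <= 0.0) {
      return std::nullopt;
    }
    weights[role] = weight;
  }

  return weights;
}
```

For example, `parseWeights("role1=2.0,role2=3.5")` yields a map with `role1 -> 2.0` and `role2 -> 3.5`, while `"role1"` or `"role1=-1"` is rejected.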