[jira] [Commented] (MESOS-9180) tasks get stuck in TASK_KILLING on the default executor
[ https://issues.apache.org/jira/browse/MESOS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762217#comment-16762217 ]

Gilbert Song commented on MESOS-9180:

[~Kirill P], could you add the agent logs for triaging? Also, this may be related to the recent fix for tasks getting stuck due to an FD leak (MESOS-9151 and MESOS-9501). Could you please upgrade and verify whether you still have this issue?

> tasks get stuck in TASK_KILLING on the default executor
> ---
>
> Key: MESOS-9180
> URL: https://issues.apache.org/jira/browse/MESOS-9180
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.6.1
> Environment: Ubuntu 18.04, Ubuntu 16.04
> Reporter: Kirill Plyashkevich
> Priority: Critical
> Labels: containerization
>
> during our load tests tasks get stuck in TASK_KILLING state
> {quote}{noformat}
> I0823 16:30:20.367563 21608 executor.cpp:192] Version: 1.6.1
> I0823 16:30:20.439478 21684 default_executor.cpp:202] Received SUBSCRIBED event
> I0823 16:30:20.441012 21684 default_executor.cpp:206] Subscribed executor on XX.XXX.XX.XXX
> I0823 16:30:20.916216 21665 default_executor.cpp:202] Received LAUNCH_GROUP event
> I0823 16:30:20.917373 21645 default_executor.cpp:426] Setting 'MESOS_CONTAINER_IP' to: 172.26.10.222
> I0823 16:30:22.573794 21658 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:22.575518 21637 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:22.577137 21665 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.091509 21642 default_executor.cpp:661] Finished launching tasks [ test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery ] in child containers [ 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8, 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7, 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b ]
> I0823 16:30:33.091567 21642 default_executor.cpp:685] Waiting on child containers of tasks [ test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery ]
> I0823 16:30:33.096014 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
> I0823 16:30:33.096310 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
> I0823 16:30:33.096470 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
> I0823 16:30:33.521510 21648 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.522073 21652 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.523569 21679 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:38.593736 21668 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' (stdout):
> 0
> PONG
> I0823 16:30:38.593777 21668 checker_process.cpp:817] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' (stderr):
> I0823 16:30:38.610167 21650 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' (stdout):
> I0823 16:30:38.610194 21650 checker_process.cpp:817] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' (stderr):
> I0823 16:30:38.700561 21681 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' (stdout):
> I0823 16:30:38.700598 21681
[jira] [Commented] (MESOS-9180) tasks get stuck in TASK_KILLING on the default executor
[ https://issues.apache.org/jira/browse/MESOS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590356#comment-16590356 ]

Kirill Plyashkevich commented on MESOS-9180:

somewhat related to MESOS-8679, but in this case killing is actually being retried.