Kirill Plyashkevich created MESOS-9180:
------------------------------------------

             Summary: tasks get stuck in TASK_KILLING on the default executor
                 Key: MESOS-9180
                 URL: https://issues.apache.org/jira/browse/MESOS-9180
             Project: Mesos
          Issue Type: Bug
          Components: executor
    Affects Versions: 1.6.1
         Environment: Ubuntu 18.04, Ubuntu 16.04
            Reporter: Kirill Plyashkevich


During our load tests, tasks get stuck in the TASK_KILLING state:
{quote}{noformat}
I0823 16:30:20.367563 21608 executor.cpp:192] Version: 1.6.1
I0823 16:30:20.439478 21684 default_executor.cpp:202] Received SUBSCRIBED event
I0823 16:30:20.441012 21684 default_executor.cpp:206] Subscribed executor on 
XX.XXX.XX.XXX
I0823 16:30:20.916216 21665 default_executor.cpp:202] Received LAUNCH_GROUP 
event
I0823 16:30:20.917373 21645 default_executor.cpp:426] Setting 
'MESOS_CONTAINER_IP' to: 172.26.10.222
I0823 16:30:22.573794 21658 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:22.575518 21637 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:22.577137 21665 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:33.091509 21642 default_executor.cpp:661] Finished launching tasks 
[ 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery
 ] in child containers [ 
3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8, 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7, 
3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b ]
I0823 16:30:33.091567 21642 default_executor.cpp:685] Waiting on child 
containers of tasks [ 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery
 ]
I0823 16:30:33.096014 21647 default_executor.cpp:746] Waiting for child 
container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
I0823 16:30:33.096310 21647 default_executor.cpp:746] Waiting for child 
container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 of 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
I0823 16:30:33.096470 21647 default_executor.cpp:746] Waiting for child 
container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
I0823 16:30:33.521510 21648 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:33.522073 21652 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:33.523569 21679 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:38.593736 21668 checker_process.cpp:814] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 (stdout):
0
PONG
I0823 16:30:38.593777 21668 checker_process.cpp:817] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 (stderr):
I0823 16:30:38.610167 21650 checker_process.cpp:814] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 (stdout):
I0823 16:30:38.610194 21650 checker_process.cpp:817] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 (stderr):
I0823 16:30:38.700561 21681 checker_process.cpp:814] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 (stdout):
I0823 16:30:38.700598 21681 checker_process.cpp:817] Output of the COMMAND 
health check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 (stderr):
I0823 16:30:42.786908 21649 checker_process.cpp:971] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 returned: 0
I0823 16:30:42.787267 21649 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis',
 task is healthy
I0823 16:30:45.156363 21658 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:48.454120 21653 checker_process.cpp:971] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 returned: 1
W0823 16:30:48.454218 21653 health_checker.cpp:283] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed: Command terminated with signal Hangup
W0823 16:30:48.454242 21653 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed 1 times consecutively
I0823 16:30:48.454370 21653 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka',
 task is not healthy
I0823 16:30:50.887114 21666 checker_process.cpp:971] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 returned: 1
W0823 16:30:50.887183 21666 health_checker.cpp:283] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed: Command terminated with signal Hangup
W0823 16:30:50.887198 21666 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed 1 times consecutively
I0823 16:30:50.887295 21657 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery',
 task is not healthy
I0823 16:30:51.289993 21689 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:30:51.607558 21659 default_executor.cpp:202] Received ACKNOWLEDGED 
event
W0823 16:31:23.851263 21657 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed: Command timed out after 5secs
W0823 16:31:23.851332 21657 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed 2 times consecutively
I0823 16:31:23.851519 21641 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery',
 task is not healthy
W0823 16:31:24.081169 21654 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed: Command timed out after 5secs
W0823 16:31:24.081220 21654 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed 2 times consecutively
I0823 16:31:24.081336 21654 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka',
 task is not healthy
I0823 16:31:24.487970 21659 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:31:26.176144 21682 default_executor.cpp:202] Received ACKNOWLEDGED 
event
W0823 16:31:48.187378 21659 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed: Command timed out after 5secs
W0823 16:31:48.187428 21659 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed 3 times consecutively
I0823 16:31:48.187537 21659 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery',
 task is not healthy
W0823 16:31:48.210490 21676 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed: Command timed out after 5secs
W0823 16:31:48.210537 21676 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed 3 times consecutively
I0823 16:31:48.210651 21676 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka',
 task is not healthy
I0823 16:31:48.426265 21660 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:31:48.427875 21640 default_executor.cpp:202] Received ACKNOWLEDGED 
event
W0823 16:32:24.028173 21638 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed: Command timed out after 5secs
W0823 16:32:24.028211 21638 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed 4 times consecutively
I0823 16:32:24.028343 21638 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery',
 task is not healthy
W0823 16:32:24.080215 21688 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed: Command timed out after 5secs
W0823 16:32:24.080267 21688 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed 4 times consecutively
I0823 16:32:24.080369 21688 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka',
 task is not healthy
I0823 16:32:24.994634 21672 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:25.002722 21683 default_executor.cpp:202] Received ACKNOWLEDGED 
event
W0823 16:32:49.181438 21671 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed: Command timed out after 5secs
W0823 16:32:49.181476 21671 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 failed 5 times consecutively
I0823 16:32:49.181608 21671 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery',
 task is not healthy
I0823 16:32:49.182938 21671 default_executor.cpp:1249] Received kill for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
I0823 16:32:49.183014 21671 checker_process.cpp:281] Stopped COMMAND health 
check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
I0823 16:32:49.183149 21671 default_executor.cpp:1124] Killing task 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b with 
SIGTERM signal
I0823 16:32:49.183159 21671 default_executor.cpp:1135] Scheduling escalation to 
SIGKILL in 90secs from now
I0823 16:32:50.426288 21665 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:50.430682 21682 default_executor.cpp:202] Received ACKNOWLEDGED 
event
W0823 16:32:50.917691 21689 health_checker.cpp:273] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed: Command timed out after 5secs
W0823 16:32:50.917750 21689 health_checker.cpp:305] COMMAND health check for 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 failed 5 times consecutively
I0823 16:32:50.917850 21680 default_executor.cpp:1375] Received task health 
update for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka',
 task is not healthy
I0823 16:32:50.919066 21680 default_executor.cpp:1249] Received kill for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
I0823 16:32:50.919119 21680 checker_process.cpp:281] Stopped COMMAND health 
check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
I0823 16:32:50.919231 21680 default_executor.cpp:1124] Killing task 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 with 
SIGTERM signal
I0823 16:32:50.919241 21680 default_executor.cpp:1135] Scheduling escalation to 
SIGKILL in 90secs from now
I0823 16:32:51.127272 21651 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:51.130367 21670 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:51.973668 21665 default_executor.cpp:953] Child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
 completed in state TASK_KILLED: Command terminated with signal Terminated
I0823 16:32:51.973721 21665 default_executor.cpp:974] Killing task group 
containing tasks [ 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis,
 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery
 ]
I0823 16:32:51.973819 21691 checker_process.cpp:281] Stopped COMMAND health 
check for task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
I0823 16:32:51.973997 21665 default_executor.cpp:1124] Killing task 
test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 with 
SIGTERM signal
I0823 16:32:51.974021 21665 default_executor.cpp:1135] Scheduling escalation to 
SIGKILL in 3secs from now
I0823 16:32:51.975106 21665 default_executor.cpp:953] Child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of 
task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
 completed in state TASK_KILLED: Command terminated with signal Terminated
I0823 16:32:51.995775 21671 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:51.997719 21644 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:52.003360 21676 default_executor.cpp:202] Received ACKNOWLEDGED 
event
I0823 16:32:54.974514 21646 default_executor.cpp:1213] Task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did 
not terminate after 3secs, sending SIGKILL to the container
W0823 16:32:54.982900 21650 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 
failed: The agent failed to send signal Killed (9) to the container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 16:32:55.983327 21639 default_executor.cpp:1213] Task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did 
not terminate after 3secs, sending SIGKILL to the container
W0823 16:32:55.990069 21644 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 
failed: The agent failed to send signal Killed (9) to the container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 16:32:56.991422 21670 default_executor.cpp:1213] Task 
'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
 running in child container 
3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 did 
not terminate after 3secs, sending SIGKILL to the container
{noformat}{quote}
After that, the executor loops forever, retrying to kill a process that no longer exists.

The other form of this bug that we observed:
{quote}{noformat}
I0823 11:19:44.460397 35632 default_executor.cpp:202] Received KILL event
I0823 11:19:44.460433 35632 default_executor.cpp:1249] Received kill for task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
W0823 11:19:44.460445 35632 default_executor.cpp:1259] Ignoring kill for task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 as it is in the process of getting killed
I0823 11:19:45.078868 35660 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:45.083555 35645 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:46.084547 35637 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:46.088583 35639 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:47.089757 35630 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:47.094741 35631 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:48.095813 35623 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:48.100821 35632 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:49.101478 35627 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:49.105983 35651 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:50.106503 35662 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:50.111423 35723 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:51.112059 35725 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:51.116915 35664 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:52.118046 35653 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:52.122288 35616 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:53.123337 35658 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:53.128535 35641 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:54.129462 35633 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:54.133767 35644 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:55.134635 35618 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:55.138553 35622 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:56.139037 35638 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:56.142948 35724 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:57.143637 35659 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:57.148473 35636 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:58.149035 35648 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:58.152792 35621 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:19:59.153236 35629 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:19:59.157325 35656 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:00.158377 35660 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:20:00.162392 35627 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:01.162860 35637 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:20:01.167155 35662 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:02.167553 35630 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:20:02.172479 35725 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:03.173439 35619 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:20:03.177597 35653 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:04.178180 35727 default_executor.cpp:1213] Task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 did 
not terminate after 1mins, sending SIGKILL to the container
W0823 11:20:04.182360 35658 default_executor.cpp:1222] Escalation to SIGKILL 
the task 
'test_11c6bfe0-a660-11e8-8861-4f65393a63f6.instance-687724c4-a660-11e8-ab64-c6905d8f8b70.redis'
 running in child container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53 
failed: The agent failed to send signal Killed (9) to the container 
ab8332fe-bd03-47b0-962d-cc1d724a9f13.aa1ffbe8-816c-400b-ad6d-0413b0c1ec53: 
Unable to send signal to container: No such process; Retrying in 1secs
I0823 11:20:04.460697 35662 default_executor.cpp:202] Received KILL event
{noformat}{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
