[jira] [Commented] (MESOS-9180) tasks get stuck in TASK_KILLING on the default executor

2019-02-06 Thread Gilbert Song (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762217#comment-16762217
 ] 

Gilbert Song commented on MESOS-9180:
-

[~Kirill P], could you add the agent logs for triaging? Also, this may be
related to the recent fixes for tasks getting stuck due to an FD leak
(MESOS-9151 and MESOS-9501). Could you please upgrade and verify whether you
still have this issue?

> tasks get stuck in TASK_KILLING on the default executor
> ---
>
> Key: MESOS-9180
> URL: https://issues.apache.org/jira/browse/MESOS-9180
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.6.1
> Environment: Ubuntu 18.04, Ubuntu 16.04
> Reporter: Kirill Plyashkevich
> Priority: Critical
> Labels: containerization
>
> During our load tests, tasks get stuck in the TASK_KILLING state:
> {quote}{noformat}
> I0823 16:30:20.367563 21608 executor.cpp:192] Version: 1.6.1
> I0823 16:30:20.439478 21684 default_executor.cpp:202] Received SUBSCRIBED event
> I0823 16:30:20.441012 21684 default_executor.cpp:206] Subscribed executor on XX.XXX.XX.XXX
> I0823 16:30:20.916216 21665 default_executor.cpp:202] Received LAUNCH_GROUP event
> I0823 16:30:20.917373 21645 default_executor.cpp:426] Setting 'MESOS_CONTAINER_IP' to: 172.26.10.222
> I0823 16:30:22.573794 21658 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:22.575518 21637 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:22.577137 21665 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.091509 21642 default_executor.cpp:661] Finished launching tasks [ test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery ] in child containers [ 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8, 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7, 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b ]
> I0823 16:30:33.091567 21642 default_executor.cpp:685] Waiting on child containers of tasks [ test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis, test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery ]
> I0823 16:30:33.096014 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.8e04f74f-cb8b-46b9-8758-340455a844c8 of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka'
> I0823 16:30:33.096310 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.fc60bf0f-5814-4ea9-a37f-89ebe3e2f5f7 of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis'
> I0823 16:30:33.096470 21647 default_executor.cpp:746] Waiting for child container 3680beff-96d2-4ebd-832c-9cbbddf8c507.ab481072-c8ab-4a76-be8b-7f4431220e7b of task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery'
> I0823 16:30:33.521510 21648 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.522073 21652 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:33.523569 21679 default_executor.cpp:202] Received ACKNOWLEDGED event
> I0823 16:30:38.593736 21668 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' (stdout):
> 0
> PONG
> I0823 16:30:38.593777 21668 checker_process.cpp:817] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.redis' (stderr):
> I0823 16:30:38.610167 21650 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' (stdout):
> I0823 16:30:38.610194 21650 checker_process.cpp:817] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.akka' (stderr):
> I0823 16:30:38.700561 21681 checker_process.cpp:814] Output of the COMMAND health check for task 'test_cb88dd0c-a6e0-11e8-888f-fb74b926ae8c.instance-08d37bd7-a6e1-11e8-9e12-0242e3789894.delivery' (stdout):
> I0823 16:30:38.700598 21681 

[jira] [Commented] (MESOS-9180) tasks get stuck in TASK_KILLING on the default executor

2018-08-23 Thread Kirill Plyashkevich (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590356#comment-16590356
 ] 

Kirill Plyashkevich commented on MESOS-9180:


This is somewhat related to MESOS-8679, but in this case the kill is actually
being retried.
