[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-26 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-7964:
--
Fix Version/s: 1.5.0

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.4.0
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1, 1.5.0
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-26 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-7964:
--
Affects Version/s: 1.4.0

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.4.0
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-19 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7964:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64  (was: Mesosphere Sprint 
63)

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Component/s: agent

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the worker threads, and 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Sprint: Mesosphere Sprint 63

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the worker threads, and 
> some of them took 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
 Story Points: 2
Fix Version/s: 1.4.1

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the worker threads, and 
> some 

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Shepherd: Yan Xu

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> An agent is observed to performe heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During the period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded as well. 
> When a disk-usage GC is trigged, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would used all of the worker threads, and 
> some of them took a long time to finish (possibly due to