[jira] [Updated] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
[ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haosdent updated MESOS-4869:
----------------------------
    Labels: health-check  (was: )

> /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
> -------------------------------------------------------------------
>
>                 Key: MESOS-4869
>                 URL: https://issues.apache.org/jira/browse/MESOS-4869
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.1
>            Reporter: Anthony Scalisi
>            Priority: Critical
>              Labels: health-check
>
> We switched our health checks in Marathon from HTTP to COMMAND:
> {noformat}
> "healthChecks": [
>   {
>     "protocol": "COMMAND",
>     "path": "/ops/ping",
>     "command": { "value": "curl --silent -f -X GET http://$HOST:$PORT0/ops/ping > /dev/null" },
>     "gracePeriodSeconds": 90,
>     "intervalSeconds": 2,
>     "portIndex": 0,
>     "timeoutSeconds": 5,
>     "maxConsecutiveFailures": 3
>   }
> ]
> {noformat}
> All our applications have the same health check (and /ops/ping endpoint).
> Even though we have the issue on all our Mesos slaves, I'm going to focus on a particular one: *mesos-slave-i-e3a9c724*.
> The slave has 16 GB of memory, with about 12 GB allocated to 8 tasks:
> !https://i.imgur.com/gbRf804.png!
> Here is a *docker ps* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker ps
> CONTAINER ID  IMAGE   COMMAND                 CREATED       STATUS       PORTS                    NAMES
> 4f7c0aa8d03a  java:8  "/bin/sh -c 'JAVA_OPT"  6 hours ago   Up 6 hours   0.0.0.0:31926->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
> 66f2fc8f8056  java:8  "/bin/sh -c 'JAVA_OPT"  6 hours ago   Up 6 hours   0.0.0.0:31939->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
> f7382f241fce  java:8  "/bin/sh -c 'JAVA_OPT"  6 hours ago   Up 6 hours   0.0.0.0:31656->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
> 880934c0049e  java:8  "/bin/sh -c 'JAVA_OPT"  24 hours ago  Up 24 hours  0.0.0.0:31371->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
> 5eab1f8dac4a  java:8  "/bin/sh -c 'JAVA_OPT"  46 hours ago  Up 46 hours  0.0.0.0:31500->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
> b63740fe56e7  java:8  "/bin/sh -c 'JAVA_OPT"  46 hours ago  Up 46 hours  0.0.0.0:31382->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
> 5c7a9ea77b0e  java:8  "/bin/sh -c 'JAVA_OPT"  2 days ago    Up 2 days    0.0.0.0:31186->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
> 53065e7a31ad  java:8  "/bin/sh -c 'JAVA_OPT"  2 days ago    Up 2 days    0.0.0.0:31839->8080/tcp  mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
> {noformat}
> Here is a *docker stats* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker stats
> CONTAINER     CPU %   MEM USAGE / LIMIT     MEM %   NET I/O              BLOCK I/O
> 4f7c0aa8d03a  2.93%   797.3 MB / 1.611 GB   49.50%  1.277 GB / 1.189 GB  155.6 kB / 151.6 kB
> 53065e7a31ad  8.30%   738.9 MB / 1.611 GB   45.88%  419.6 MB / 554.3 MB  98.3 kB / 61.44 kB
> 5c7a9ea77b0e  4.91%   1.081 GB / 1.611 GB   67.10%  423 MB / 526.5 MB    3.219 MB / 61.44 kB
> 5eab1f8dac4a  3.13%   1.007 GB / 1.611 GB   62.53%  2.737 GB / 2.564 GB  6.566 MB / 118.8 kB
> 66f2fc8f8056  3.15%   768.1 MB / 1.611 GB   47.69%  258.5 MB / 252.8 MB  1.86 MB / 151.6 kB
> 880934c0049e  10.07%  735.1 MB / 1.611 GB   45.64%  1.451 GB / 1.399 GB  573.4 kB / 94.21 kB
> b63740fe56e7  12.04%  629 MB / 1.611 GB     39.06%  10.29 GB / 9.344 GB  8.102 MB / 61.44 kB
> f7382f241fce  6.21%   505 MB / 1.611 GB     31.36%  153.4 MB / 151.9 MB  5.837 MB / 94.21 kB
> {noformat}
> Not much else is running on the slave, yet the used memory doesn't match the tasks' memory:
> {noformat}
> Mem:16047M used:13340M buffers:1139M cache:776M
> {noformat}
> If I exec into a container (*java:8* image), I can see the shell calls that run the curl from the health check execute as expected and exit correctly.
> The only change that coincided with the memory usage woes was moving the health checks from Marathon to Mesos, so I decided to take a look at the mesos-health-check processes:
> {noformat}
> root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
> 2504 ?  Sl  47:33  /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping >
> {noformat}
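A way to confirm that the mesos-health-check helpers shown above are what grows is to sample their resident set size over time; a rising RSS for the same PID across samples points at the leak. A minimal sketch, assuming the binary path from the ps output above (the 60-second interval is an arbitrary choice):

{noformat}
# Print PID, RSS (KiB), and elapsed time for every mesos-health-check
# process once a minute; rising RSS for a stable PID indicates a leak.
while true; do
    date '+%F %T'
    for pid in $(pgrep -f /usr/libexec/mesos/mesos-health-check); do
        ps -o pid=,rss=,etime= -p "$pid"
    done
    sleep 60
done
{noformat}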
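The gap between what the containers use and what the host reports can also be quantified by summing the MEM USAGE column of a single docker stats snapshot and comparing it with free. A rough sketch, assuming the column layout shown above (value, unit, "/", limit) and that usage values are in MB or GB:

{noformat}
# Sum container memory usage, normalising GB to MB.
docker stats --no-stream | awk '
    NR > 1 {                      # skip the header row
        usage = $3; unit = $4     # e.g. "797.3" "MB"
        if (unit == "GB") usage *= 1024
        total += usage
    }
    END { printf "containers use: %.0f MB\n", total }
'
# "used" from free includes buffers/cache on older procps; subtract them
# (as the Mem: line above suggests) for a fair comparison.
free -m | awk '/^Mem:/ { printf "host used:      %d MB\n", $3 }'
{noformat}

With the numbers above, the containers sum to about 6.3 GB, while the host shows 13.3 GB used (about 11.4 GB after subtracting buffers and cache), leaving roughly 5 GB held outside the task containers.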
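For reference, the COMMAND check itself hinges on curl's exit status: -f/--fail makes curl exit non-zero (22) on an HTTP error instead of succeeding silently, and that exit code is what marks the task healthy or unhealthy. A quick sanity check from the slave; the host and port here are placeholders taken from one of the port mappings above:

{noformat}
# Exit status 0 means the check passes; non-zero marks the task unhealthy.
curl --silent -f -X GET http://10.92.32.63:31926/ops/ping > /dev/null
echo "exit status: $?"
{noformat}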