Hi Tim,
Things have gotten slightly odder (if that's possible). When I now start
the application's 5 or so containers, only one, "ecxconfigdb", gets started -
and even that one took a few tries. That is, I see it failing, moving to
deploying, then starting again. But I have no evidence (no STDOUT, no
docker container logs) that shows why.
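For what it's worth, this is roughly how I'm hunting for evidence; the
container ID below is just a placeholder for whichever attempt failed:

  docker ps -a | grep mesos-        # find the exited/failed attempts
  docker logs <container-id>        # its stdout/stderr (nothing useful so far)
  docker inspect --format '{{.State.ExitCode}} {{.State.Error}}' <container-id>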
In any event, ecxconfigdb does start. Happily, when I try to stop the
application I am seeing the phenomenon I posted before: the "Killing docker
task" / "Shutting down" messages repeated many times. The un-stopped
container is now running at 100% CPU.
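(I'm confirming the CPU usage per container and per host process with
roughly the following; <container-id> stands for the stuck container:)

  docker stats <container-id>
  top -p $(docker inspect --format '{{.State.Pid}}' <container-id>)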
I will try modifying docker_stop_timeout. Back shortly.
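(Concretely, that means restarting mesos-slave with the timeout flag set to
0, something like the line below; the remaining flags stay as in my earlier
ps output:)

  /usr/sbin/mesos-slave --master=zk://71.100.202.99:2181/mesos \
      --docker_stop_timeout=0secs    # other flags unchanged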
Thanks again.
-Paul
PS: What do you make of the "broken pipe" error in the docker.log?
*from /var/log/upstart/docker.log*
INFO[3054] GET /v1.15/images/mongo:2.6.8/json
INFO[3054] GET /v1.21/images/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json
ERRO[3054] Handler for GET /v1.21/images/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json returned error: No such image: mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b
ERRO[3054] HTTP Error err=No such image: mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b statusCode=404
INFO[3054] GET /v1.15/containers/weave/json
INFO[3054] POST /v1.21/containers/create?name=mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b
INFO[3054] POST /v1.21/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/attach?stderr=1&stdout=1&stream=1
INFO[3054] POST /v1.21/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/start
INFO[3054] GET /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3054] GET /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3054] GET /v1.15/containers/weave/json
INFO[3054] GET /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3054] GET /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3054] GET /v1.15/containers/weave/json
INFO[3054] GET /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3054] GET /v1.21/containers/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json
INFO[3111] GET /v1.21/containers/json
INFO[3120] GET /v1.21/containers/cf7/json
INFO[3120] GET /v1.21/containers/cf7/logs?stderr=1&stdout=1&tail=all
INFO[3153] GET /containers/json
INFO[3153] GET /containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3153] GET /containers/56111722ef83134f6c73c5e3aa27de3f34f1fa73efdec3257c3cc9b283e40729/json
INFO[3153] GET /containers/b9e9b79a8d431455bfcaafca59223017b2470a47a294075d656eeffdaaefad33/json
INFO[3175] GET /containers/json
INFO[3175] GET /containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
INFO[3175] GET /containers/56111722ef83134f6c73c5e3aa27de3f34f1fa73efdec3257c3cc9b283e40729/json
INFO[3175] GET /containers/b9e9b79a8d431455bfcaafca59223017b2470a47a294075d656eeffdaaefad33/json
*INFO[3175] POST /v1.21/containers/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/stop?t=15*
*ERRO[3175] attach: stdout: write unix @: broken pipe*
*INFO[3190] Container cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47 failed to exit within 15 seconds of SIGTERM - using the force*
*INFO[3200] Container cf7fc7c48324 failed to exit within 10 seconds of kill - trying direct SIGKILL*
*STDOUT from Mesos:*
*--container="mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b"
*--docker="/usr/local/ecxmcc/weaveShim" --help="false"
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
--mapped_directory="/mnt/mesos/sandbox" --quiet="false"
--sandbox_directory="/tmp/mesos/slaves/20160114-153418-1674208327-5050-3798-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285/runs/c5c35d59-1318-4a96-b850-b0b788815f1b"
--stop_timeout="15secs"
--container="mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b"
--docker="/usr/local/ecxmcc/weaveShim" --help="false"
--initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
--mapped_directory="/mnt/mesos/sandbox" --quiet="false"
--sandbox_directory="/tmp/mesos/slaves/20160114-153418-1674208327-5050-3798-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285/runs/c5c35d59-1318-4a96-b850-b0b788815f1b"
--stop_timeout="15secs"
Registered docker executor on 71.100.202.99
Starting task ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285
2016-01-14T20:45:38.613+0000 [initandlisten] MongoDB starting : pid=1
port=27017 dbpath=/data/db 64-bit host=ecxconfigdb
2016-01-14T20:45:38.614+0000 [initandlisten] db version v2.6.8
2016-01-14T20:45:38.614+0000 [initandlisten] git version:
3abc04d6d4f71de00b57378e3277def8fd7a6700
2016-01-14T20:45:38.614+0000 [initandlisten] build info: Linux
build5.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC
2014 x86_64 BOOST_LIB_VERSION=1_49
2016-01-14T20:45:38.614+0000 [initandlisten] allocator: tcmalloc
2016-01-14T20:45:38.614+0000 [initandlisten] options: { storage: { journal:
{ enabled: true } } }
2016-01-14T20:45:38.616+0000 [initandlisten] journal dir=/data/db/journal
2016-01-14T20:45:38.616+0000 [initandlisten] recover : no journal files
present, no recovery needed
2016-01-14T20:45:39.006+0000 [initandlisten] waiting for connections on
port 27017
2016-01-14T20:46:38.975+0000 [clientcursormon] mem (MB) res:77 virt:12942
2016-01-14T20:46:38.975+0000 [clientcursormon] mapped (incl journal
view):12762
2016-01-14T20:46:38.975+0000 [clientcursormon] connections:0
Killing docker task
Shutting down
Killing docker task
Shutting down
Killing docker task
Shutting down
On Thu, Jan 14, 2016 at 3:38 PM, Paul Bell <[email protected]> wrote:
> Hey Tim,
>
> Thank you very much for your reply.
>
> Yes, I am in the midst of trying to reproduce the problem. If successful
> (so to speak), I will do as you ask.
>
> Cordially,
>
> Paul
>
> On Thu, Jan 14, 2016 at 3:19 PM, Tim Chen <[email protected]> wrote:
>
>> Hi Paul,
>>
>> Looks like we've already issued the docker stop, as you've seen in the ps
>> output, but the containers are still running. Can you look at the Docker
>> daemon logs and see what's going on there?
>>
>> And can you also try to modify docker_stop_timeout to 0 so that we
>> SIGKILL the containers right away, and see if this still happens?
>>
>> Tim
>>
>>
>>
>> On Thu, Jan 14, 2016 at 11:52 AM, Paul Bell <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> It's been quite some time since I've posted here and that's chiefly
>>> because up until a day or two ago, things were working really well.
>>>
>>> I actually may have posted about this some time back. But then the
>>> problem seemed more intermittent.
>>>
>>> In summa, several "docker stops" don't work, i.e., the containers are
>>> not stopped.
>>>
>>> Deployment:
>>>
>>> one Ubuntu 14.04 LTS VM (VMware) with kernel 3.19
>>> Zookeeper
>>> Mesos-master (0.23.0)
>>> Mesos-slave (0.23.0)
>>> Marathon (0.10.0)
>>> Docker 1.9.1
>>> Weave 1.1.0
>>> Our application's containers, which include
>>> MongoDB (4)
>>> PostGres
>>> ECX (our product)
>>>
>>> The only thing that's changed at all in the config above is the version
>>> of Docker: it used to be 1.6.2, but I upgraded it today hoping to solve the
>>> problem.
>>>
>>>
>>> My automater program stops the application by sending Marathon an "http
>>> delete" for each running app. Every now & then (reliably reproducible today)
>>> not all containers get stopped. Most recently, 3 containers failed to stop.
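>>>
>>> (For reference, the delete it sends is essentially Marathon's
>>> DELETE /v2/apps/{appId} call; host, port, and app ID below are
>>> placeholders:)
>>>
>>> curl -X DELETE http://<marathon-host>:8080/v2/apps/<app-id>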
>>>
>>> Here are the attendant phenomena:
>>>
>>> Marathon shows the 3 applications in deployment mode (presumably
>>> "deployment" in the sense of "stopping")
>>>
>>> *ps output:*
>>>
>>> root@71:~# ps -ef | grep docker
>>> root 3823 1 0 13:55 ? 00:00:02 /usr/bin/docker daemon
>>> -H unix:///var/run/docker.sock -H tcp://0.0.0.0:4243
>>> root 4967 1 0 13:57 ? 00:00:01 /usr/sbin/mesos-slave
>>> --master=zk://71.100.202.99:2181/mesos --log_dir=/var/log/mesos
>>> --containerizers=docker,mesos --docker=/usr/local/ecxmcc/weaveShim
>>> --docker_stop_timeout=15secs --executor_registration_timeout=5mins
>>> --hostname=71.100.202.99 --ip=71.100.202.99
>>> --attributes=hostType:ecx,shard1 --resources=ports:[31000-31999,8443-8443]
>>> root 5263 3823 0 13:57 ? 00:00:00 docker-proxy -proto tcp
>>> -host-ip 0.0.0.0 -host-port 6783 -container-ip 172.17.0.2 -container-port
>>> 6783
>>> root 5271 3823 0 13:57 ? 00:00:00 docker-proxy -proto udp
>>> -host-ip 0.0.0.0 -host-port 6783 -container-ip 172.17.0.2 -container-port
>>> 6783
>>> root 5279 3823 0 13:57 ? 00:00:00 docker-proxy -proto tcp
>>> -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port
>>> 53
>>> root 5287 3823 0 13:57 ? 00:00:00 docker-proxy -proto udp
>>> -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port
>>> 53
>>> root 7119 4967 0 14:00 ? 00:00:01 mesos-docker-executor
>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2
>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>> --mapped_directory=/mnt/mesos/sandbox
>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9/runs/bfc5a419-30f8-43f7-af2f-5582394532f2
>>> --stop_timeout=15secs
>>> root 7378 4967 0 14:00 ? 00:00:01 mesos-docker-executor
>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>> --mapped_directory=/mnt/mesos/sandbox
>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9/runs/9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>> --stop_timeout=15secs
>>> root 7640 4967 0 14:01 ? 00:00:01 mesos-docker-executor
>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298
>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>> --mapped_directory=/mnt/mesos/sandbox
>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9/runs/d7d861d3-cfc9-424d-b341-0631edea4298
>>> --stop_timeout=15secs
>>> *root 9696 9695 0 14:06 ? 00:00:00 /usr/bin/docker stop -t
>>> 15
>>> mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298*
>>> *root 9709 9708 0 14:06 ? 00:00:00 /usr/bin/docker stop -t
>>> 15
>>> mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89*
>>> *root 9720 9719 0 14:06 ? 00:00:00 /usr/bin/docker stop -t
>>> 15
>>> mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2*
>>>
>>> *docker ps output:*
>>>
>>> root@71:~# docker ps
>>> CONTAINER ID IMAGE COMMAND
>>> CREATED STATUS PORTS
>>> NAMES
>>> 5abafbfe7de2 mongo:2.6.8 "/w/w /entrypoint.sh "
>>> 11 minutes ago Up 11 minutes 27017/tcp
>>>
>>>
>>> mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298
>>> a8449682ca2e mongo:2.6.8 "/w/w /entrypoint.sh "
>>> 11 minutes ago Up 11 minutes 27017/tcp
>>>
>>>
>>> mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>> 3b956457374b mongo:2.6.8 "/w/w /entrypoint.sh "
>>> 11 minutes ago Up 11 minutes 27017/tcp
>>>
>>>
>>> mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2
>>> 4c1588bb3d4b weaveworks/weaveexec:v1.1.0 "/home/weave/weavepro"
>>> 15 minutes ago Up 15 minutes
>>> weaveproxy
>>> a26a0363584b weaveworks/weave:v1.1.0 "/home/weave/weaver -"
>>> 15 minutes ago Up 15 minutes 172.17.0.1:53->53/tcp,
>>> 172.17.0.1:53->53/udp, 0.0.0.0:6783->6783/tcp, 0.0.0.0:6783->6783/udp
>>> weave
>>>
>>> *from /var/log/syslog:*
>>>
>>>
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.356405 5002
>>> master.cpp:2944] Asked to kill task
>>> mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000
>>> *Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.356459 5002
>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>> slave(1)@71.100.202.99:5051 (71.100.202.99) to
>>> kill task mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>> [email protected]:46167*
>>> *Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.356729 5042
>>> slave.cpp:1755] Asked to kill task
>>> mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000*
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.378295 5004
>>> http.cpp:283] HTTP GET for /master/state.json from 172.19.15.61:65038
>>> with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36
>>> (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.425904 5001
>>> master.cpp:2944] Asked to kill task
>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.425935 5001
>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>> slave(1)@71.100.202.99:5051 (71.100.202.99) to kill task
>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>> [email protected]:46167
>>> Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.426136 5041
>>> slave.cpp:1755] Asked to kill task
>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.435932 4998
>>> master.cpp:2944] Asked to kill task
>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000
>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.435958 4998
>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>> slave(1)@71.100.202.99:5051 (71.100.202.99) to kill task
>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>> [email protected]:46167
>>> Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.436151 5038
>>> slave.cpp:1755] Asked to kill task
>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>> 20160114-103414-1674208327-5050-3293-0000
>>> Jan 14 14:10:03 71 mesos-master[4917]: I0114 14:10:03.759009 5001
>>> master.cpp:4290] Sending 1 offers to framework
>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>> [email protected]:46167
>>> Jan 14 14:10:03 71 marathon[4937]: [2016-01-14 14:10:03,765] INFO
>>> started processing 1 offers, launching at most 1 tasks per offer and 1000
>>> tasks in total (mesosphere.marathon.tasks.IterativeOfferMatcher$:132)
>>> Jan 14 14:10:03 71 marathon[4937]: [2016-01-14 14:10:03,766] INFO Offer
>>> [20160114-135722-1674208327-5050-4917-O128]. Decline with default filter
>>> refuseSeconds (use --decline_offer_duration to configure)
>>> (mesosphere.marathon.tasks.IterativeOfferMatcher$:231)
>>>
>>>
>>> *from Mesos STDOUT of unstopped container:*
>>>
>>> Starting task mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9
>>> 2016-01-14T19:01:10.997+0000 [initandlisten] MongoDB starting : pid=1
>>> port=27019 dbpath=/data/db/config master=1 64-bit host=mongoconfig
>>> 2016-01-14T19:01:10.998+0000 [initandlisten] db version v2.6.8
>>> 2016-01-14T19:01:10.998+0000 [initandlisten] git version:
>>> 3abc04d6d4f71de00b57378e3277def8fd7a6700
>>> 2016-01-14T19:01:10.998+0000 [initandlisten] build info: Linux
>>> build5.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27
>>> UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
>>> 2016-01-14T19:01:10.998+0000 [initandlisten] allocator: tcmalloc
>>> 2016-01-14T19:01:10.998+0000 [initandlisten] options: { sharding: {
>>> clusterRole: "configsvr" }, storage: { dbPath: "/data/db/config", journal:
>>> { enabled: true } } }
>>> 2016-01-14T19:01:10.999+0000 [initandlisten] journal
>>> dir=/data/db/config/journal
>>> 2016-01-14T19:01:11.000+0000 [initandlisten] recover : no journal files
>>> present, no recovery needed
>>> 2016-01-14T19:01:11.429+0000 [initandlisten] warning:
>>> ClientCursor::staticYield can't unlock b/c of recursive lock ns: top: {
>>> opid: 11, active: true, secs_running: 0, microsecs_running: 36, op:
>>> "query", ns: "local.oplog.$main", query: { query: {}, orderby: { $natural:
>>> -1 } }, client: "0.0.0.0:0", desc: "initandlisten", threadId:
>>> "0x7f8f73075b40", locks: { ^: "W" }, waitingForLock: false, numYields: 0,
>>> lockStats: { timeLockedMicros: {}, timeAcquiringMicros: {} } }
>>> 2016-01-14T19:01:11.429+0000 [initandlisten] waiting for connections on
>>> port 27019
>>> 2016-01-14T19:01:17.405+0000 [initandlisten] connection accepted from
>>> 10.2.0.3:51189 #1 (1 connection now open)
>>> 2016-01-14T19:01:17.413+0000 [initandlisten] connection accepted from
>>> 10.2.0.3:51190 #2 (2 connections now open)
>>> 2016-01-14T19:01:17.413+0000 [initandlisten] connection accepted from
>>> 10.2.0.3:51191 #3 (3 connections now open)
>>> 2016-01-14T19:01:17.414+0000 [conn3] first cluster operation detected,
>>> adding sharding hook to enable versioning and authentication to remote
>>> servers
>>> 2016-01-14T19:01:17.414+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>> 2016-01-14T19:01:17.419+0000 [initandlisten] connection accepted from
>>> 10.2.0.3:51193 #4 (4 connections now open)
>>> 2016-01-14T19:01:17.420+0000 [initandlisten] connection accepted from
>>> 10.2.0.3:51194 #5 (5 connections now open)
>>> 2016-01-14T19:01:17.442+0000 [conn1] end connection 10.2.0.3:51189 (4
>>> connections now open)
>>> 2016-01-14T19:02:11.285+0000 [clientcursormon] mem (MB) res:59 virt:385
>>> 2016-01-14T19:02:11.285+0000 [clientcursormon] mapped (incl journal
>>> view):192
>>> 2016-01-14T19:02:11.285+0000 [clientcursormon] connections:4
>>> 2016-01-14T19:03:11.293+0000 [clientcursormon] mem (MB) res:72 virt:385
>>> 2016-01-14T19:03:11.294+0000 [clientcursormon] mapped (incl journal
>>> view):192
>>> 2016-01-14T19:03:11.294+0000 [clientcursormon] connections:4
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>> Shutting down
>>> Killing docker task
>>>
>>> Most disturbing in all of this is that while I can stop the deployments
>>> in Marathon (which properly ends the "docker stop" commands visible in ps
>>> output), I cannot bounce Docker, neither via Upstart nor via the kill
>>> command. Ultimately, I have to reboot the VM.
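>>>
>>> (By "bounce" I mean the usual Upstart restart and, failing that, killing
>>> the daemon process directly, roughly:)
>>>
>>> stop docker && start docker      # the Upstart job
>>> kill <docker-daemon-pid>         # then kill -9 as a last resort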
>>>
>>> FWIW, the 3 mongod containers (apparently stuck in their "Killing docker
>>> task" / "Shutting down" loop) are running at 100% CPU, as evinced by both
>>> "docker stats" and "top".
>>>
>>> I would truly be grateful for some guidance on this - even a mere
>>> work-around would be appreciated.
>>>
>>> Thank you.
>>>
>>> -Paul
>>>
>>
>>
>