Hi Tim,

I set docker_stop_timeout to zero as you asked. I am pleased to report
(though a bit fearful about being pleased) that this change seems to have
shut everyone down pretty much instantly.

Can you explain what's happening, e.g., does docker_stop_timeout=0 cause
the immediate use of "kill -9" as opposed to "kill -2"?

I will keep testing the behavior.

Thank you.

-Paul

On Thu, Jan 14, 2016 at 3:59 PM, Paul Bell <[email protected]> wrote:

> Hi Tim,
>
> Things have gotten slightly odder (if that's possible). When I now start
> the application 5 or so containers, only one "ecxconfigdb" gets started -
> and even he took a few tries. That is, I see him failing, moving to
> deploying, then starting again. But I've no evidence (no STDOUT, and no
> docker ctr logs) that show why.
>
> In any event, ecxconfigdb does start. Happily, when I try to stop the
> application I am seeing the phenomena I posted before: killing docker task,
> shutting down repeated many times. The UN-stopped container is now running
> at 100% CPU.
>
> I will try modifying docker_stop_timeout. Back shortly
>
> Thanks again.
>
> -Paul
>
> PS: what do you make of the "broken pipe" error in the docker.log?
>
> *from /var/log/upstart/docker.log*
>
> [34mINFO [0m[3054] GET /v1.15/images/mongo:2.6.8/json
> [34mINFO [0m[3054] GET
> /v1.21/images/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json
> [31mERRO [0m[3054] Handler for GET
> /v1.21/images/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json
> returned error: No such image:
> mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b
> [31mERRO [0m[3054] HTTP Error                                     [31merr
> [0m=No such image:
> mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b
> [31mstatusCode [0m=404
> [34mINFO [0m[3054] GET /v1.15/containers/weave/json
> [34mINFO [0m[3054] POST
> /v1.21/containers/create?name=mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b
> [34mINFO [0m[3054] POST
> /v1.21/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/attach?stderr=1&stdout=1&stream=1
> [34mINFO [0m[3054] POST
> /v1.21/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/start
> [34mINFO [0m[3054] GET
> /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3054] GET
> /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3054] GET /v1.15/containers/weave/json
> [34mINFO [0m[3054] GET
> /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3054] GET
> /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3054] GET /v1.15/containers/weave/json
> [34mINFO [0m[3054] GET
> /v1.15/containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3054] GET
> /v1.21/containers/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/json
> [34mINFO [0m[3111] GET /v1.21/containers/json
> [34mINFO [0m[3120] GET /v1.21/containers/cf7/json
> [34mINFO [0m[3120] GET
> /v1.21/containers/cf7/logs?stderr=1&stdout=1&tail=all
> [34mINFO [0m[3153] GET /containers/json
> [34mINFO [0m[3153] GET
> /containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3153] GET
> /containers/56111722ef83134f6c73c5e3aa27de3f34f1fa73efdec3257c3cc9b283e40729/json
> [34mINFO [0m[3153] GET
> /containers/b9e9b79a8d431455bfcaafca59223017b2470a47a294075d656eeffdaaefad33/json
> [34mINFO [0m[3175] GET /containers/json
> [34mINFO [0m[3175] GET
> /containers/cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47/json
> [34mINFO [0m[3175] GET
> /containers/56111722ef83134f6c73c5e3aa27de3f34f1fa73efdec3257c3cc9b283e40729/json
> [34mINFO [0m[3175] GET
> /containers/b9e9b79a8d431455bfcaafca59223017b2470a47a294075d656eeffdaaefad33/json
> * [34mINFO [0m[3175] POST
> /v1.21/containers/mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b/stop*
> ?t=15
> * [31mERRO [0m[3175] attach: stdout: write unix @: broken pipe    *
> * [34mINFO [0m[3190] Container
> cf7fc7c483248e30f1dbb5990ce8874f2bfbe936c74eed1fc9af6f70653a1d47 failed to
> exit within 15 seconds of SIGTERM - using the force *
> * [34mINFO [0m[3200] Container cf7fc7c48324 failed to exit within 10
> seconds of kill - trying direct SIGKILL *
>
> *STDOUT from Mesos:*
>
> *--container="mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b"
> *--docker="/usr/local/ecxmcc/weaveShim" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20160114-153418-1674208327-5050-3798-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285/runs/c5c35d59-1318-4a96-b850-b0b788815f1b"
> --stop_timeout="15secs"
> --container="mesos-20160114-153418-1674208327-5050-3798-S0.c5c35d59-1318-4a96-b850-b0b788815f1b"
> --docker="/usr/local/ecxmcc/weaveShim" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20160114-153418-1674208327-5050-3798-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285/runs/c5c35d59-1318-4a96-b850-b0b788815f1b"
> --stop_timeout="15secs"
> Registered docker executor on 71.100.202.99
> Starting task ecxconfigdb.c3cae92e-baff-11e5-8afe-82f779ac6285
> 2016-01-14T20:45:38.613+0000 [initandlisten] MongoDB starting : pid=1
> port=27017 dbpath=/data/db 64-bit host=ecxconfigdb
> 2016-01-14T20:45:38.614+0000 [initandlisten] db version v2.6.8
> 2016-01-14T20:45:38.614+0000 [initandlisten] git version:
> 3abc04d6d4f71de00b57378e3277def8fd7a6700
> 2016-01-14T20:45:38.614+0000 [initandlisten] build info: Linux
> build5.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27
> UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
> 2016-01-14T20:45:38.614+0000 [initandlisten] allocator: tcmalloc
> 2016-01-14T20:45:38.614+0000 [initandlisten] options: { storage: {
> journal: { enabled: true } } }
> 2016-01-14T20:45:38.616+0000 [initandlisten] journal dir=/data/db/journal
> 2016-01-14T20:45:38.616+0000 [initandlisten] recover : no journal files
> present, no recovery needed
> 2016-01-14T20:45:39.006+0000 [initandlisten] waiting for connections on
> port 27017
> 2016-01-14T20:46:38.975+0000 [clientcursormon] mem (MB) res:77 virt:12942
> 2016-01-14T20:46:38.975+0000 [clientcursormon]  mapped (incl journal
> view):12762
> 2016-01-14T20:46:38.975+0000 [clientcursormon]  connections:0
> Killing docker task
> Shutting down
> Killing docker task
> Shutting down
> Killing docker task
> Shutting down
>
> On Thu, Jan 14, 2016 at 3:38 PM, Paul Bell <[email protected]> wrote:
>
>> Hey Tim,
>>
>> Thank you very much for your reply.
>>
>> Yes, I am in the midst of trying to reproduce the problem. If successful
>> (so to speak), I will do as you ask.
>>
>> Cordially,
>>
>> Paul
>>
>> On Thu, Jan 14, 2016 at 3:19 PM, Tim Chen <[email protected]> wrote:
>>
>>> Hi Paul,
>>>
>>> Looks like we've already issued the docker stop as you seen in the ps
>>> output, but the containers are still running. Can you look at the Docker
>>> daemon logs and see what's going on there?
>>>
>>> And also can you also try to modify docker_stop_timeout to 0 so that we
>>> SIGKILL the containers right away, and see if this still happens?
>>>
>>> Tim
>>>
>>>
>>>
>>> On Thu, Jan 14, 2016 at 11:52 AM, Paul Bell <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> It's been quite some time since I've posted here and that's chiefly
>>>> because up until a day or two ago, things were working really well.
>>>>
>>>> I actually may have posted about this some time back. But then the
>>>> problem seemed more intermittent.
>>>>
>>>> In summa, several "docker stops" don't work, i.e., the containers are
>>>> not stopped.
>>>>
>>>> Deployment:
>>>>
>>>> one Ubuntu VM (vmWare) LTS 14.04 with kernel 3.19
>>>> Zookeeper
>>>> Mesos-master (0.23.0)
>>>> Mesos-slave (0.23.0)
>>>> Marathon (0.10.0)
>>>> Docker 1.9.1
>>>> Weave 1.1.0
>>>> Our application contains which include
>>>>     MongoDB (4)
>>>>     PostGres
>>>>     ECX (our product)
>>>>
>>>> The only thing that's changed at all in the config above is the version
>>>> of Docker. Used to be 1.6.2 but I today upgraded it hoping to solve the
>>>> problem.
>>>>
>>>>
>>>> My automater program stops the application by sending Marathon an "http
>>>> delete" for each running up. Every now & then (reliably reproducible today)
>>>> not all containers get stopped. Most recently, 3 containers failed to stop.
>>>>
>>>> Here are the attendant phenomena:
>>>>
>>>> Marathon shows the 3 applications in deployment mode (presumably
>>>> "deployment" in the sense of "stopping")
>>>>
>>>> *ps output:*
>>>>
>>>> root@71:~# ps -ef | grep docker
>>>> root      3823     1  0 13:55 ?        00:00:02 /usr/bin/docker daemon
>>>> -H unix:///var/run/docker.sock -H tcp://0.0.0.0:4243
>>>> root      4967     1  0 13:57 ?        00:00:01 /usr/sbin/mesos-slave
>>>> --master=zk://71.100.202.99:2181/mesos --log_dir=/var/log/mesos
>>>> --containerizers=docker,mesos --docker=/usr/local/ecxmcc/weaveShim
>>>> --docker_stop_timeout=15secs --executor_registration_timeout=5mins
>>>> --hostname=71.100.202.99 --ip=71.100.202.99
>>>> --attributes=hostType:ecx,shard1 --resources=ports:[31000-31999,8443-8443]
>>>> root      5263  3823  0 13:57 ?        00:00:00 docker-proxy -proto tcp
>>>> -host-ip 0.0.0.0 -host-port 6783 -container-ip 172.17.0.2 -container-port
>>>> 6783
>>>> root      5271  3823  0 13:57 ?        00:00:00 docker-proxy -proto udp
>>>> -host-ip 0.0.0.0 -host-port 6783 -container-ip 172.17.0.2 -container-port
>>>> 6783
>>>> root      5279  3823  0 13:57 ?        00:00:00 docker-proxy -proto tcp
>>>> -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port
>>>> 53
>>>> root      5287  3823  0 13:57 ?        00:00:00 docker-proxy -proto udp
>>>> -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port
>>>> 53
>>>> root      7119  4967  0 14:00 ?        00:00:01 mesos-docker-executor
>>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2
>>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>>> --mapped_directory=/mnt/mesos/sandbox
>>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9/runs/bfc5a419-30f8-43f7-af2f-5582394532f2
>>>> --stop_timeout=15secs
>>>> root      7378  4967  0 14:00 ?        00:00:01 mesos-docker-executor
>>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>>> --mapped_directory=/mnt/mesos/sandbox
>>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9/runs/9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>>> --stop_timeout=15secs
>>>> root      7640  4967  0 14:01 ?        00:00:01 mesos-docker-executor
>>>> --container=mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298
>>>> --docker=/usr/local/ecxmcc/weaveShim --help=false
>>>> --mapped_directory=/mnt/mesos/sandbox
>>>> --sandbox_directory=/tmp/mesos/slaves/20160114-135722-1674208327-5050-4917-S0/frameworks/20160114-103414-1674208327-5050-3293-0000/executors/mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9/runs/d7d861d3-cfc9-424d-b341-0631edea4298
>>>> --stop_timeout=15secs
>>>> *root      9696  9695  0 14:06 ?        00:00:00 /usr/bin/docker stop
>>>> -t 15
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298*
>>>> *root      9709  9708  0 14:06 ?        00:00:00 /usr/bin/docker stop
>>>> -t 15
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89*
>>>> *root      9720  9719  0 14:06 ?        00:00:00 /usr/bin/docker stop
>>>> -t 15
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2*
>>>>
>>>> *docker ps output:*
>>>>
>>>> root@71:~# docker ps
>>>> CONTAINER ID        IMAGE                         COMMAND
>>>>    CREATED             STATUS              PORTS
>>>>                                                                NAMES
>>>> 5abafbfe7de2        mongo:2.6.8                   "/w/w /entrypoint.sh
>>>> "   11 minutes ago      Up 11 minutes       27017/tcp
>>>>
>>>>  
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.d7d861d3-cfc9-424d-b341-0631edea4298
>>>> a8449682ca2e        mongo:2.6.8                   "/w/w /entrypoint.sh
>>>> "   11 minutes ago      Up 11 minutes       27017/tcp
>>>>
>>>>  
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.9b700cdc-3d29-49b7-a7fc-e543a91f7b89
>>>> 3b956457374b        mongo:2.6.8                   "/w/w /entrypoint.sh
>>>> "   11 minutes ago      Up 11 minutes       27017/tcp
>>>>
>>>>  
>>>> mesos-20160114-135722-1674208327-5050-4917-S0.bfc5a419-30f8-43f7-af2f-5582394532f2
>>>> 4c1588bb3d4b        weaveworks/weaveexec:v1.1.0
>>>> "/home/weave/weavepro"   15 minutes ago      Up 15 minutes
>>>>
>>>>          weaveproxy
>>>> a26a0363584b        weaveworks/weave:v1.1.0       "/home/weave/weaver
>>>> -"   15 minutes ago      Up 15 minutes       172.17.0.1:53->53/tcp,
>>>> 172.17.0.1:53->53/udp, 0.0.0.0:6783->6783/tcp, 0.0.0.0:6783->6783/udp
>>>>   weave
>>>>
>>>> *from /var/log/syslog:*
>>>>
>>>>
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.356405  5002
>>>> master.cpp:2944] Asked to kill task
>>>> mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000
>>>> *Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.356459  5002
>>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>>> slave(1)@71.100.202.99:5051 <http://71.100.202.99:5051> (71.100.202.99) to
>>>> kill task mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>>> [email protected]:46167
>>>> <http://[email protected]:46167>*
>>>> *Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.356729  5042
>>>> slave.cpp:1755] Asked to kill task
>>>> mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000*
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.378295  5004
>>>> http.cpp:283] HTTP GET for /master/state.json from 172.19.15.61:65038
>>>> with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36
>>>> (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.425904  5001
>>>> master.cpp:2944] Asked to kill task
>>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.425935  5001
>>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>>> slave(1)@71.100.202.99:5051 (71.100.202.99) to kill task
>>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>>> [email protected]:46167
>>>> Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.426136  5041
>>>> slave.cpp:1755] Asked to kill task
>>>> ecxcatalogdbs1.25911dda-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.435932  4998
>>>> master.cpp:2944] Asked to kill task
>>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000
>>>> Jan 14 14:10:02 71 mesos-master[4917]: I0114 14:10:02.435958  4998
>>>> master.cpp:3034] Telling slave 20160114-135722-1674208327-5050-4917-S0 at
>>>> slave(1)@71.100.202.99:5051 (71.100.202.99) to kill task
>>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>>> [email protected]:46167
>>>> Jan 14 14:10:02 71 mesos-slave[4967]: I0114 14:10:02.436151  5038
>>>> slave.cpp:1755] Asked to kill task
>>>> ecxconfigdb.1e6e0779-baf1-11e5-8c36-522bd4cc5ea9 of framework
>>>> 20160114-103414-1674208327-5050-3293-0000
>>>> Jan 14 14:10:03 71 mesos-master[4917]: I0114 14:10:03.759009  5001
>>>> master.cpp:4290] Sending 1 offers to framework
>>>> 20160114-103414-1674208327-5050-3293-0000 (marathon) at
>>>> [email protected]:46167
>>>> Jan 14 14:10:03 71 marathon[4937]: [2016-01-14 14:10:03,765] INFO
>>>> started processing 1 offers, launching at most 1 tasks per offer and 1000
>>>> tasks in total (mesosphere.marathon.tasks.IterativeOfferMatcher$:132)
>>>> Jan 14 14:10:03 71 marathon[4937]: [2016-01-14 14:10:03,766] INFO Offer
>>>> [20160114-135722-1674208327-5050-4917-O128]. Decline with default filter
>>>> refuseSeconds (use --decline_offer_duration to configure)
>>>> (mesosphere.marathon.tasks.IterativeOfferMatcher$:231)
>>>>
>>>>
>>>> *from Mesos STDOUT of unstopped container:*
>>>>
>>>> Starting task mongoconfig.2cb9163b-baf1-11e5-8c36-522bd4cc5ea9
>>>> 2016-01-14T19:01:10.997+0000 [initandlisten] MongoDB starting : pid=1
>>>> port=27019 dbpath=/data/db/config master=1 64-bit host=mongoconfig
>>>> 2016-01-14T19:01:10.998+0000 [initandlisten] db version v2.6.8
>>>> 2016-01-14T19:01:10.998+0000 [initandlisten] git version:
>>>> 3abc04d6d4f71de00b57378e3277def8fd7a6700
>>>> 2016-01-14T19:01:10.998+0000 [initandlisten] build info: Linux
>>>> build5.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3
>>>> 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
>>>> 2016-01-14T19:01:10.998+0000 [initandlisten] allocator: tcmalloc
>>>> 2016-01-14T19:01:10.998+0000 [initandlisten] options: { sharding: {
>>>> clusterRole: "configsvr" }, storage: { dbPath: "/data/db/config", journal:
>>>> { enabled: true } } }
>>>> 2016-01-14T19:01:10.999+0000 [initandlisten] journal
>>>> dir=/data/db/config/journal
>>>> 2016-01-14T19:01:11.000+0000 [initandlisten] recover : no journal files
>>>> present, no recovery needed
>>>> 2016-01-14T19:01:11.429+0000 [initandlisten] warning:
>>>> ClientCursor::staticYield can't unlock b/c of recursive lock ns:  top: {
>>>> opid: 11, active: true, secs_running: 0, microsecs_running: 36, op:
>>>> "query", ns: "local.oplog.$main", query: { query: {}, orderby: { $natural:
>>>> -1 } }, client: "0.0.0.0:0", desc: "initandlisten", threadId:
>>>> "0x7f8f73075b40", locks: { ^: "W" }, waitingForLock: false, numYields: 0,
>>>> lockStats: { timeLockedMicros: {}, timeAcquiringMicros: {} } }
>>>> 2016-01-14T19:01:11.429+0000 [initandlisten] waiting for connections on
>>>> port 27019
>>>> 2016-01-14T19:01:17.405+0000 [initandlisten] connection accepted from
>>>> 10.2.0.3:51189 #1 (1 connection now open)
>>>> 2016-01-14T19:01:17.413+0000 [initandlisten] connection accepted from
>>>> 10.2.0.3:51190 #2 (2 connections now open)
>>>> 2016-01-14T19:01:17.413+0000 [initandlisten] connection accepted from
>>>> 10.2.0.3:51191 #3 (3 connections now open)
>>>> 2016-01-14T19:01:17.414+0000 [conn3] first cluster operation detected,
>>>> adding sharding hook to enable versioning and authentication to remote
>>>> servers
>>>> 2016-01-14T19:01:17.414+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.415+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.416+0000 [conn3] CMD fsync: sync:1 lock:0
>>>> 2016-01-14T19:01:17.419+0000 [initandlisten] connection accepted from
>>>> 10.2.0.3:51193 #4 (4 connections now open)
>>>> 2016-01-14T19:01:17.420+0000 [initandlisten] connection accepted from
>>>> 10.2.0.3:51194 #5 (5 connections now open)
>>>> 2016-01-14T19:01:17.442+0000 [conn1] end connection 10.2.0.3:51189 (4
>>>> connections now open)
>>>> 2016-01-14T19:02:11.285+0000 [clientcursormon] mem (MB) res:59 virt:385
>>>> 2016-01-14T19:02:11.285+0000 [clientcursormon]  mapped (incl journal
>>>> view):192
>>>> 2016-01-14T19:02:11.285+0000 [clientcursormon]  connections:4
>>>> 2016-01-14T19:03:11.293+0000 [clientcursormon] mem (MB) res:72 virt:385
>>>> 2016-01-14T19:03:11.294+0000 [clientcursormon]  mapped (incl journal
>>>> view):192
>>>> 2016-01-14T19:03:11.294+0000 [clientcursormon]  connections:4
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>> Shutting down
>>>> Killing docker task
>>>>
>>>> Most disturbing in all of this is that while I can stop the deployments
>>>> in Marathon (which properly end the "docker stop" commands visible in ps
>>>> output), I can not bounce docker, not by Upstart, nor via kill command).
>>>> Ultimately, I have to reboot the VM.
>>>>
>>>> FWIW, the 3 mongod containers (apparently stuck in their Killing docker
>>>> task/shutting down loop) are running at 100%CPU as evinced by both "docker
>>>> stats" and "top".
>>>>
>>>> I would truly be grateful for some guidance on this - even a mere
>>>> work-around would be appreciated.
>>>>
>>>> Thank you.
>>>>
>>>> -Paul
>>>>
>>>
>>>
>>
>

Reply via email to