Hi,
I've noticed that some services on the headnode go down once in a while.
I've started to dig deeper, and one of the issues is cloudapi ending up in
the svc-error state. I took the opportunity to look at the logs. The
cloudapi service reports:
{"name":"cloudapi","hostname":"0fa04ad9-0c21-408a-bd7a-2d5d382c06dd","pid":17730,"level":30,"msg":"bootstrap listening at http://[::]:8084","time":"2017-02-15T01:57:52.649Z","v":0}
(node) warning: possible EventEmitter memory leak detected. 11 added listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at CueBallDNSResolver.addListener (events.js:239:17)
    at FSMStateHandle.on (/opt/smartdc/cloudapi/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:101:6)
    at CueBallDNSResolver.state_bootstrap_ns (/opt/smartdc/cloudapi/node_modules/cueball/lib/resolver.js:368:5)
    at CueBallDNSResolver.FSM._gotoState (/opt/smartdc/cloudapi/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:273:4)
    at CueBallDNSResolver.FSM._gotoState (/opt/smartdc/cloudapi/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:300:8)
    at FSMStateHandle.gotoState (/opt/smartdc/cloudapi/node_modules/cueball/node_modules/mooremachine/lib/fsm.js:52:23)
    at CueBallDNSResolver.<anonymous> (/opt/smartdc/cloudapi/node_modules/cueball/lib/resolver.js:288:5)
    at emitNone (events.js:67:13)
    at CueBallDNSResolver.emit (events.js:166:7)
    at CueBallDNSResolver.start (/opt/smartdc/cloudapi/node_modules/cueball/lib/resolver.js:264:7)
{"name":"cloudapi","hostname":"0fa04ad9-0c21-408a-bd7a-2d5d382c06dd","pid":17730,"level":30,"msg":"cueball kang monitor started on port 9094","time":"2017-02-15T01:57:52.708Z","v":0}
{"name":"cloudapi","hostname":"0fa04ad9-0c21-408a-bd7a-2d5d382c06dd","pid":17730,"level":30,"msg":"bootstrap shut down","time":"2017-02-15T01:58:29.783Z","v":0}
{"name":"cloudapi","hostname":"0fa04ad9-0c21-408a-bd7a-2d5d382c06dd","pid":17730,"level":30,"msg":"cloudapi listening at http://[::]:8084","time":"2017-02-15T01:58:29.784Z","v":0}
I think this causes the haproxy service to go into the maintenance state:
maintenance 1:56:40 svc:/pkgsrc/haproxy:default
[ Feb 15 01:55:43 Executing start method ("/opt/local/sbin/haproxy -f /opt/smartdc/cloudapi/etc/haproxy.cfg -D"). ]
[ Feb 15 01:56:13 Method or service exit timed out. Killing contract 1264. ]
A simple svcadm clear haproxy resolves the issue.
Running vmadm reboot on the cloudapi VM also does the trick.
But having this fail periodically is going to be a problem in production.
This looks like it could be an old issue with Node.js:
https://github.com/nodejs/node-v0.x-archive/issues/5108. I'm not sure
whether it is related.
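
For context, here is a minimal sketch of how Node.js produces this kind of
warning in general. It is not cueball's code: the attachListeners helper is
hypothetical, and the 'added' event name is only my reading of the warning
text above. As far as I understand it, Node warns once a single emitter
holds more than its default limit of 10 listeners for one event, so the
message is a heuristic and may or may not point to a real leak.

```javascript
// Minimal sketch, NOT cueball's actual code: attach more listeners to one
// event than Node's default limit allows, which triggers the warning.
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: returns a fresh emitter with `count` listeners
// registered for the 'added' event (event name assumed from the warning).
function attachListeners(count) {
  var emitter = new EventEmitter();
  for (var i = 0; i < count; i++) {
    // Each iteration registers a new, distinct listener function.
    emitter.on('added', function () {});
  }
  return emitter;
}

// The 11th listener exceeds the default limit, so Node emits the warning.
var emitter = attachListeners(11);
console.log('default limit:', EventEmitter.defaultMaxListeners);
console.log('listeners on "added":', emitter.listenerCount('added'));
```

Running this should print a comparable warning to stderr, though the exact
wording varies between Node.js versions. Calling emitter.setMaxListeners()
with a higher value only raises the threshold; it does not remove any
listeners.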
Any insights would be great!