This is really weird; even if your fact cache got corrupted, the other
filter types would still work.

My guess is that the agents somehow lost one of their subscriptions to the
middleware. The stomp gem stores a view of what has been subscribed and
resubscribes on reconnect, and I guess something could have gone wrong there.
Or your ActiveMQ somehow evicted them from some subscriptions - it has some
slow/fast consumer defenses, but it should not do that. These are the main
things I can think of.
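
To make that failure mode concrete, here is a toy sketch - not the actual
stomp gem code, and the class and destination names are made up - of a client
that keeps its own subscription table and replays it on reconnect. If an
entry somehow falls out of that table, it silently never comes back:

```ruby
# Toy model of client-side subscription tracking and replay on reconnect.
# Not the real stomp gem; names and destinations are illustrative only.
class TrackingClient
  attr_reader :active

  def initialize
    @subscriptions = {}  # the client's own view: destination => headers
    @active = []         # what the broker currently has for this client
  end

  def subscribe(dest, headers = {})
    @subscriptions[dest] = headers
    @active << dest
  end

  # On reconnect the broker has forgotten everything, so the client
  # replays its local table - anything missing from it is lost for good.
  def reconnect
    @active = @subscriptions.keys
  end
end

c = TrackingClient.new
c.subscribe("/topic/mcollective.discovery")    # broadcast/discovery traffic
c.subscribe("/queue/mcollective.nodes.host1")  # requests aimed at this node

# If the node-specific entry is somehow dropped from the client's view...
c.instance_variable_get(:@subscriptions).delete("/queue/mcollective.nodes.host1")
c.reconnect
# ...only the discovery topic survives the reconnect.
```

In this sketch, losing the node-specific queue reproduces the symptom:
unfiltered discovery still answers, but anything addressed directly at the
node is never delivered.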

You might want to poke around in ActiveMQ and check its logs to see if
anything interesting shows up there.

You can also cycle the mcollectived log level without restarting by sending
USR2 signals to the daemon; it would be interesting to see what logs show up
when you do that.



----- Original Message -----
> From: "Christopher Wood" <christopher_w...@pobox.com>
> To: "mcollective-users" <mcollective-users@googlegroups.com>
> Sent: Wednesday, 31 August, 2016 23:37:11
> Subject: [mcollective-users] long running mcollectived which can't use filters

> I have some mcollective daemons which have been running since August 5th which
> appear to not be able to use filters now. They aren't using debug logging and
> restarting them fixes whatever ails them. Do any of you have suggestions about
> troubleshooting this without a daemon restart? I'd like to know what it is so 
> I
> can prevent this going forward.
> 
> More details:
> 
> This is with mcollective 2.8.8 on CentOS 6 (2.6.32-642.3.1.el6.x86_64) 
> installed
> via puppet-agent 1.4.2.
> 
> They respond easily to these queries (timeouts extended ludicrously to avoid
> timeout issues, and I see the behaviour across plugins, not just in the puppet
> plugin):
> 
> mco puppet status --dt 30 -t 30
> mco puppet status --dt 30 -t 30 -T mail
> 
> But these queries do not get a reply:
> 
> mco puppet status --dt 30 -t 30 -I hostname.domain.com
> mco puppet status --dt 30 -t 30 -F fqdn=hostname.domain.com
> mco puppet status --dt 30 -t 30 -S fqdn=hostname.domain.com
> 
> The daemons I've restarted in the set are fine.
> 
> I do have a fact update cron job but I thought the mv was atomic in this
> context. It seems like losing this file would be an issue but the file does
> exist and is readable.
> 
> */5 * * * * /opt/puppetlabs/bin/facter -y -p --show-legacy
> >/etc/puppetlabs/mcollective/facts.new && /bin/mv
> /etc/puppetlabs/mcollective/facts.new /etc/puppetlabs/mcollective/facts.yaml
> 
> I did see this write go across strace once while stracing, but it seems to be
> a blind alley since that's just a FIFO.
> 
> write(6, "!", 1)                        = 1
> 
> mcollecti  2620      root    6w     FIFO                0,8      0t0      16091 pipe
> 
> This point is when I figured out I was out of ideas.
> 
> --
> 
> ---
> You received this message because you are subscribed to the Google Groups
> "mcollective-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to mcollective-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
