This is really weird; even if your fact cache got corrupted, the other filters would still work.
My guess is that the agents somehow lost one of their subscriptions to the middleware. The stomp gem stores a view of what's been subscribed and resubscribes on reconnect, and I guess something could have gone wrong there. Or your ActiveMQ somehow evicted them from some subscriptions - it has some slow/fast consumer defenses, though it should not do that. These are the main things I can think of.

You might want to poke around in ActiveMQ and check its logs to see if anything interesting shows up there. You can also cycle the mcollectived log level without restarting by sending USR2 signals to the daemon; it would be interesting to see what logs show up when you do that.

----- Original Message -----
> From: "Christopher Wood" <christopher_w...@pobox.com>
> To: "mcollective-users" <mcollective-users@googlegroups.com>
> Sent: Wednesday, 31 August, 2016 23:37:11
> Subject: [mcollective-users] long running mcollectived which can't use filters
>
> I have some mcollective daemons which have been running since August 5th and
> which now appear to be unable to use filters. They aren't using debug logging,
> and restarting them fixes whatever ails them. Do any of you have suggestions
> for troubleshooting this without a daemon restart? I'd like to know what it is
> so I can prevent this going forward.
>
> More details:
>
> This is with mcollective 2.8.8 on CentOS 6 (2.6.32-642.3.1.el6.x86_64),
> installed via puppet-agent 1.4.2.
>
> The daemons respond easily to these queries (timeouts extended ludicrously to
> avoid timeout issues, and I see the behaviour across plugins, not just in the
> puppet plugin):
>
> mco puppet status --dt 30 -t 30
> mco puppet status --dt 30 -t 30 -T mail
>
> But these queries do not get a reply:
>
> mco puppet status --dt 30 -t 30 -I hostname.domain.com
> mco puppet status --dt 30 -t 30 -F fqdn=hostname.domain.com
> mco puppet status --dt 30 -t 30 -S fqdn=hostname.domain.com
>
> The daemons I've restarted in the set are fine.
> I do have a fact update cron job, but I thought the mv was atomic in this
> context. It seems like losing this file would be an issue, but the file does
> exist and is readable.
>
> */5 * * * * /opt/puppetlabs/bin/facter -y -p --show-legacy > /etc/puppetlabs/mcollective/facts.new && /bin/mv /etc/puppetlabs/mcollective/facts.new /etc/puppetlabs/mcollective/facts.yaml
>
> I did see this write go across strace once while stracing, but it seems to be
> a dead end since that's just a FIFO:
>
> write(6, "!", 1) = 1
>
> mcollecti 2620 root 6w FIFO 0,8 0t0 16091 pipe
>
> This is the point at which I figured out I was out of ideas.
>
> --
> ---
> You received this message because you are subscribed to the Google Groups
> "mcollective-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mcollective-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
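[Editor's note] A minimal, self-contained illustration of the USR2 suggestion above. The background subshell here stands in for mcollectived's signal handler; against a real agent you would signal the daemon's pid instead (the pid-file path in the comment is an assumption and varies by install):

```shell
#!/bin/sh
# Toy demonstration of the signal pattern: a background process installs a
# USR2 handler (standing in for mcollectived's log-cycling handler), the
# parent sends it SIGUSR2, and the handler records that the signal arrived.
OUT=$(mktemp)
(
  trap 'echo "USR2 received: would cycle log level now" > '"$OUT"'; exit 0' USR2
  while :; do sleep 1; done   # idle loop; the trap runs once a sleep finishes
) &
PID=$!
sleep 1                       # give the trap time to install
kill -USR2 "$PID"             # real agent: kill -USR2 "$(cat /var/run/puppetlabs/mcollectived.pid)"
wait "$PID"
cat "$OUT"
```

The same `kill -USR2` against the running daemon bumps its log verbosity in place, so you can watch the mcollective log for filter-matching activity without losing the broken state by restarting.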
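[Editor's note] On the fact-cache question above: the cron job's write-then-rename is indeed atomic, because rename() within one filesystem atomically replaces the target, so readers of facts.yaml see either the old complete file or the new complete file, never a partial write. A sketch of the pattern using a scratch directory (not the real /etc/puppetlabs/mcollective paths):

```shell
#!/bin/sh
# Write the new facts fully under a temporary name, then rename() it into
# place. mv on the same filesystem is a single atomic rename, so no reader
# can ever observe a half-written facts.yaml.
DIR=$(mktemp -d)
printf 'fqdn: hostname.domain.com\n' > "$DIR/facts.new"  # complete the write first
mv "$DIR/facts.new" "$DIR/facts.yaml"                    # atomic replace via rename()
RESULT=$(cat "$DIR/facts.yaml")
echo "$RESULT"
```

This is why a corrupt facts.yaml is an unlikely culprit here: the file is only ever swapped in whole, and in any case the -I identity filter does not consult facts at all.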