Sometimes (at irregular intervals) storeconfigs stores the wrong data
in the fact_values table.  As a result, exported resources have
invalid configuration when they are collected.

The most recent example: the "hostname" fact for one of our nodes got,
instead, the value that should have gone in the "processorcount"
fact.  As a result, the node's nagios configuration started trying to
monitor a host "8" rather than "cn19", and ssh keys for cn19 were
collected at other nodes as "8,8.example.com <keytext>" instead of
"cn19,cn19.example.com <keytext>".  The hostname fact is the only
destination where I've noticed the corrupted data, but the source has
been swapfree/swapsize, processor[n], operatingsystem,
operatingsystemrelease, kernelrelease, and others.

I realize that I don't have much of a "simple, repeatable, minimal"
test case here, but I've been trying to figure it out for months to no
avail.  I had hoped that an upgrade to 2.6 would make this problem go
away, but no:  we've just now experienced it again.  For the record,
we've seen it since sometime in the 0.24.x branch (when we started
using it).

It might have something to do with a sufficiently high load on
storeconfigs.  I ran it for 2 days with nodes exporting data (but not
collecting) to see if it would happen again, and I didn't notice any
corruption.  Then, today, I enabled collection (e.g., ssh_known_hosts)
on all (~138) hosts, and soon after found a corrupt nagios
configuration.  (Then again, it might just be that corruption is more
likely to show up with more nodes doing the collection.)

I've never seen the actual facter command return one of these bits of
misplaced data: the furthest back I've been able to trace it is to the
fact_values table.
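One way to narrow down where a misplaced value comes from is to compare
what facter reports on the node with what storeconfigs stored for it
and, for each mismatch, list which live facts hold the stored value.
Below is a minimal Python sketch. It assumes you have already gathered
both sets of facts into plain dicts of name -> string value (for
example, from facter output on the node and from a query against
fact_values); the function name find_misplaced_facts and the sample
values are hypothetical.

```python
def find_misplaced_facts(live_facts, stored_facts):
    """Compare facts reported by facter on the node (live_facts) with the
    values read back from storeconfigs (stored_facts); both are assumed to
    be dicts of fact name -> string value.

    Returns a dict mapping each fact whose stored value differs from the
    live value to a sorted list of live fact names that hold the stored
    value, i.e. the likely source of the misplaced data.
    """
    suspects = {}
    for name, stored in stored_facts.items():
        live = live_facts.get(name)
        if live is None or live == stored:
            # Fact absent from facter output, or the values agree.
            continue
        # Which facts on the node actually carry the value that was stored?
        sources = sorted(n for n, v in live_facts.items() if v == stored)
        suspects[name] = sources
    return suspects


if __name__ == "__main__":
    # Hypothetical sample data modelled on the cn19 example above.
    live = {"hostname": "cn19", "processorcount": "8", "kernelrelease": "2.6.18"}
    stored = {"hostname": "8", "processorcount": "8", "kernelrelease": "2.6.18"}
    print(find_misplaced_facts(live, stored))  # {'hostname': ['processorcount']}
```

With the cn19 data, this points at processorcount as the source of the
bad hostname value.  Logging that pairing over many occurrences might
show whether the source facts follow any pattern.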

We're using a single puppet master, with storeconfigs storing to a
postgresql database on a different host from the puppet master host.
Everything works in the majority of cases, but fails just often enough
to make it really, really annoying.

Any help anyone can provide, including insight into where I might look
to track down the cause even further, would be much appreciated.
Thanks.

~jon

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.