I'm not 100% sure if the subject correctly describes the problem I've been having, but it's the closest I can get with my troubleshooting. My setup looks like this:
* 2 puppetmasters running 0.25.4 on Ubuntu, running under passenger
* backend content (etc and var) shared over NFS
* haproxy load balancing across the 2 puppetmasters
* mysql for stored configs

I just upgraded from 0.24.8 to 0.25.4 a couple of weeks ago. The setup we've been using above has worked fine since we implemented it months ago, so I don't believe that there is any problem with NFS or the load balancer.

I have a handful of custom functions, and after updating to 0.25.4, puppetmaster started complaining about one of them, a simple function called nagios_name. This function takes an FQDN and turns it into a name we use in Nagios and mcollective (turning "support.arces.net" into "arces.support" for example). The function is basic ruby and is available for you to look at here: http://monachus.pastebin.com/yLF1syqU. The function works fine.

The error that puppetmaster reports is:

    Unknown function nagios_name at /var/www/localhost/puppet/etc/manifests/outsidein_nodes.pp:16 on node some.node.com.

It doesn't report this all of the time - instead it reports it about 40% of the time, while other nodes before and after it do not report the error. It seems that a node with a problem will always have the problem, and a node where it works will always work. This reinforces the fact that the function is fine - it works and has worked for months.

My thought is that it's some sort of caching issue, and I even thought it might be a race condition with the backend storage being NFS - one puppetmaster loading a cached yaml file before the other was done writing it or something. I've done all of the following, all with no success:

* turn off one puppetmaster so traffic isn't split across them
* move yaml files for node/facts to local storage instead of NFS
* enable IP-based persistence in haproxy so that traffic from a client always goes to the same puppetmaster
* --ignorecache in config.ru for puppetmaster

What I've discovered, however, is more interesting.
It appears that if I go into the actual nagios_name.rb file and change it in any way (add a single character of whitespace) and restart Apache, the error goes away. The file is detected as different and loaded for delivery to the clients, and everything works fine after that.

I discovered this by adding debug() statements to the function 2 weeks ago, only to find that it worked fine from then on. The problem resurfaced today when I turned the 2nd puppetmaster back on, and I decided to try it with whitespace - same thing. Clears it right up.

This tells me that there is some sort of caching wonkiness happening somewhere, but I'm not able to figure out where. Perhaps one of the variables the function is looking for (fqdn?) isn't available at the time it's requested, resulting in a compile error that isn't always visible?

I'm pleased to have a workaround, but to go from "Unknown function" to "everything is cool" by adding a space to the file and saving it isn't really much of a long-term solution.

I'm sending this to the list rather than filing a bug report to see if anyone has experienced anything like this or has any thoughts. If there's any further information I can give to help narrow down the source of the problem, I'm happy to do so.

Adrian
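
For anyone reading without access to the pastebin, here is a minimal sketch of the kind of transformation nagios_name performs. It is not the code from the pastebin. It assumes the rule is "drop the top-level domain and reverse the remaining labels", which is only a guess that happens to match the single example given above, and the helper name fqdn_to_nagios_name is hypothetical.

```ruby
# Hypothetical helper, NOT the pastebin code. Assumed rule: drop the
# top-level domain, then reverse the order of the remaining labels.
def fqdn_to_nagios_name(fqdn)
  labels = fqdn.to_s.downcase.split('.')
  # Leave single-label names alone; otherwise drop the last label (the TLD).
  labels = labels[0...-1] if labels.length > 1
  labels.reverse.join('.')
end

# In a Puppet custom function file (e.g. nagios_name.rb under
# puppet/parser/functions in a lib directory) the helper would typically
# be wrapped like this, left commented out so the sketch runs without Puppet:
#
#   module Puppet::Parser::Functions
#     newfunction(:nagios_name, :type => :rvalue) do |args|
#       fqdn_to_nagios_name(args[0])
#     end
#   end

puts fqdn_to_nagios_name('support.arces.net')  # prints "arces.support"
```

Nothing in the sketch depends on facts or other variables at load time, so a function of this shape should either load or not load; it gives no obvious reason for an intermittent "Unknown function" error.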
