Howdy, Lately our puppetmasters have been returning 500 errors at a reasonably high rate. On any given day, 200-300 (out of many thousands) of our clients are reporting that they've received a 500 during a run. So far, the only firm reports I've received have been regarding our Ubuntu clients, for which we happen to have central logging of puppet errors. I've done some analysis of the logs that I have and found that the majority of the 500s (~83%) received by our Ubuntu clients are on file metadata operations:
//shell/File[/etc/csh.cshrc]: Failed to retrieve current state of resource: Error 500 on SERVER: <!DOCTYPE HTML PUBLIC ""-//IETF//DTD HTML 2.0//EN""> <html><head> <title>500 Internal Server Error</title> [etc...] The selection of files appears to be random and generally retrying the puppet run succeeds. Our puppetmasters are behind load balanced VIPs and we're running 0.25.5 on Ubuntu Lucid with the stock Lucid Apache2/Ruby/Passenger stack. I've gone through the bug tracker and while I see some issues that are close, they don't match our specific symptoms (most of the bugs I saw showed the clients getting 400s, not 500s, or that affect sites already on 2.6.x). I'd really like to say that this started when we upgraded the puppetmasters to Lucid, but we weren't tracking the apache error rates at the time of the upgrades, so all I have to go on is the bug report, which was filed weeks after the Lucid upgrades were complete. To answer some questions I've seen while searching for previous puppet-users threads about metadata fetch problems: - We do use environments and every client is in a non-default environment. That is to say that we do not have an environment named 'production' at all. - We do use an external node classifier but I don't think that it's issue #3910. For one thing, the clients are receiving error code 500, not 400 as Nigel reported in #3910. Actually, I can get a 400 just by giving '--environment production', as reported in #3910, and the resulting stack trace is different from the ones I list below. - We have a cronjob that runs on the puppetmasters every 10min to update the manifests from version control which does a checkout to a temp location and then rsyncs over to the live location. It is possible that files are being updated during some of the 500 errors, but our OS teams have well defined release schedules (releases every 2-4 weeks) and the 500s continue between releases. I enabled debug logging and tracing on a few of our puppetmasters so I could get the stack traces logged. I'm getting the stack traces in the apache error log but they're arriving without timestamps and without the actual exception detail message. Also, unfortunately, we're getting a lot of exceptions right now due to an unrelated problem with our Mac clients so the logs are very noisy and it's difficult to tell what is what. Also, since there are no timestamps[1], it's hard to correlate the stack traces with specific logs from puppetmasterd. That said, here are some candidate stack traces, which are hopefully complete: Related to locking: /usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `flush' /usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `lock_exclusive' /usr/lib/ruby/1.8/puppet/util/file_locking.rb:38:in `writelock' /usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `open' /usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `writelock' /usr/lib/ruby/1.8/sync.rb:230:in `synchronize' /usr/lib/ruby/1.8/puppet/util/file_locking.rb:36:in `writelock' /usr/lib/ruby/1.8/puppet/indirector/yaml.rb:40:in `save' /usr/lib/ruby/1.8/puppet/indirector/indirection.rb:262:in `save' /usr/lib/ruby/1.8/puppet/node/facts.rb:15:in `save' /usr/lib/ruby/1.8/puppet/indirector.rb:65:in `save' /usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:26:in `extract_facts_from_request' /usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:31:in `find' Two different paths to a "fileserver module not mounted" exception: /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath' /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert' /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:133:in `list' /usr/lib/ruby/1.8/rubygems/custom_require.rb:31 /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85 /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call' /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method' /usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process' /usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process' /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath' /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert' /usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:68:in `describe' /usr/lib/ruby/1.8/rubygems/custom_require.rb:31 /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85 /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call' /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method' /usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process' /usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process' And there are a few of these: /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85 /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call' /usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each' /usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch' /usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method' /usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle' /usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process' /usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process' /usr/lib/ruby/1.8/puppet/network/handler/master.rb:65:in `getconfig' /usr/lib/ruby/1.8/rubygems/custom_require.rb:31 Jason [1] I'm not sure why there are no timestamps on the logged stack traces. Other logs that come directly from apache (such as the server restart message) have timestamps. -- "His face disappeared. If someone has no face left, you know it's serious." -- You received this message because you are subscribed to the Google Groups "Puppet Users" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
