Howdy,

Lately our puppetmasters have been returning 500 errors at a
reasonably high rate.  On any given day, 200-300 (out of many
thousands) of our clients are reporting that they've received a 500
during a run.  So far, the only firm reports I've received have been
regarding our Ubuntu clients, for which we happen to have central
logging of puppet errors. I've done some analysis of the logs that I
have and found that the majority of the 500s (~83%) received by our
Ubuntu clients are on file metadata operations:

//shell/File[/etc/csh.cshrc]: Failed to retrieve current state of
resource: Error 500 on SERVER: <!DOCTYPE HTML PUBLIC ""-//IETF//DTD
HTML 2.0//EN"">
<html><head>
<title>500 Internal Server Error</title>

[etc...]

The selection of files appears to be random and generally retrying the
puppet run succeeds.

Our puppetmasters are behind load balanced VIPs and we're running
0.25.5 on Ubuntu Lucid with the stock Lucid Apache2/Ruby/Passenger
stack.  I've gone through the bug tracker and while I see some issues
that are close, they don't match our specific symptoms (most of the
bugs I saw showed the clients getting 400s, not 500s, or that affect
sites already on 2.6.x).  I'd really like to say that this started
when we upgraded the puppetmasters to Lucid, but we weren't tracking
the apache error rates at the time of the upgrades, so all I have to
go on is the bug report, which was filed weeks after the Lucid
upgrades were complete.

To answer some questions I've seen while searching for previous
puppet-users threads about metadata fetch problems:

- We do use environments and every client is in a non-default
environment.  That is to say that we do not have an environment named
'production' at all.

- We do use an external node classifier but I don't think that it's
issue #3910.  For one thing, the clients are receiving error code 500,
not 400 as Nigel reported in #3910.  Actually, I can get a 400 just by
giving '--environment production', as reported in #3910, and the
resulting stack trace is different from the ones I list below.

- We have a cronjob that runs on the puppetmasters every 10min to
update the manifests from version control which does a checkout to a
temp location and then rsyncs over to the live location.  It is
possible that files are being updated during some of the 500 errors,
but our OS teams have well defined release schedules (releases every
2-4 weeks) and the 500s continue between releases.

I enabled debug logging and tracing on a few of our puppetmasters so I
could get the stack traces logged.  I'm getting the stack traces in
the apache error log but they're arriving without timestamps and
without the actual exception detail message.  Also, unfortunately,
we're getting a lot of exceptions right now due to an unrelated
problem with our Mac clients so the logs are very noisy and it's
difficult to tell what is what.  Also, since there are no
timestamps[1], it's hard to correlate the stack traces with specific
logs from puppetmasterd.  That said, here are some candidate stack
traces, which are hopefully complete:

Related to locking:
/usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `flush'
/usr/lib/ruby/1.8/puppet/external/lock.rb:36:in `lock_exclusive'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:38:in `writelock'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `open'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:37:in `writelock'
/usr/lib/ruby/1.8/sync.rb:230:in `synchronize'
/usr/lib/ruby/1.8/puppet/util/file_locking.rb:36:in `writelock'
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:40:in `save'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:262:in `save'
/usr/lib/ruby/1.8/puppet/node/facts.rb:15:in `save'
/usr/lib/ruby/1.8/puppet/indirector.rb:65:in `save'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:26:in
`extract_facts_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:31:in `find'

Two different paths to a "fileserver module not mounted" exception:
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:133:in `list'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:401:in `splitpath'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:236:in `convert'
/usr/lib/ruby/1.8/puppet/network/handler/fileserver.rb:68:in `describe'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'

And there are a few of these:
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:85
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:338:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:325:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:368:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:380:in `handle'
/usr/lib/ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack/xmlrpc.rb:35:in `process'
/usr/lib/ruby/1.8/puppet/network/handler/master.rb:65:in `getconfig'
/usr/lib/ruby/1.8/rubygems/custom_require.rb:31

Jason

[1]  I'm not sure why there are no timestamps on the logged stack
traces.  Other logs that come directly from apache (such as the server
restart message) have timestamps.

-- 
"His face disappeared. If someone has no face left, you know it's serious."

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-users?hl=en.

Reply via email to