[Puppet - Bug #12623] Long timeout for SRV DNS resolution

tickets Tue, 14 Feb 2012 14:52:00 -0800

Issue #12623 has been updated by Chris Price.


Discussed this with Josh.  There is a configuration option that you can use to 
turn the SRV lookup off; when I get a minute, I'm going to re-break my DNS, 
confirm that I see the slowdown again, and then try running with that option 
turned off just to confirm.

Pending the outcome of that experiment, we may want to reconsider whether or 
not this is enabled by default.

One other bit of info: I now suspect that this affects EL flavors of Linux 
out-of-the-box much more readily than Debian flavors.  I do not recall having 
this problem with my Ubuntu 10.04 VM, but did have it with CentOS 5 and 
CentOS 6.  My theory is that "hostname" returns (for all intents and purposes) 
the contents of /etc/hostname on Debian systems, and Debian doesn't seem to put 
any domain name in there.  Meanwhile, on the EL flavors, "hostname" seems to 
return what's in /etc/sysconfig/network, which, for me, on out-of-the-box 
CentOS installs appears to be "hostname.localdomain".  Facter then sees this as 
the FQDN and things get weird from there.  All of this could use further 
investigation, though.
----------------------------------------
Bug #12623: Long timeout for SRV DNS resolution
https://projects.puppetlabs.com/issues/12623#change-54700

Author: Chris Price
Status: Unreviewed
Priority: Normal
Assignee: 
Category: network
Target version: 
Affected Puppet version: development
Keywords: 
Branch: 


This issue has come up for me several times when setting up nodes for 
acceptance tests.  It is probably most easily reproducible with a fresh copy of 
our CentOS6 VM.

In /etc/sysconfig/network there is a line that looks like this:

<pre>
HOSTNAME=pe-centos6.localdomain
</pre>

This causes facter to return that same string for "fqdn".  Then, when you 
attempt to use this VM as an agent node for acceptance tests, it blocks for an 
*extremely* long time during the 03_ValidateSignCerts phase.  Running the 
offending commands outside of the acceptance test framework:

<pre>
puppet master --dns_alt_names="puppet,$(hostname -s),$(hostname -f)" --verbose \
  --no-daemonize --logdest=/var/lib/puppet/log/puppetmaster.log --debug --trace
</pre>

and

<pre>
puppet agent --test --debug
</pre>

will reproduce the slowness, and you'll see the following in the agent output:

<pre>
debug: Searching for SRV records for domain: localdomain
debug: Found 0 SRV records for: _x-puppet-report._tcp.localdomain
</pre>

There may be a delay of 5 minutes or more in between those two lines being 
printed, however.
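
To make the link between the facter "fqdn" value and the name in the debug 
output concrete, here is a rough sketch.  It is not Puppet's actual code: the 
helper name srv_lookup_name is hypothetical, and it assumes the domain is 
simply everything after the first dot of the FQDN.

```ruby
# Hypothetical helper (not taken from Puppet): builds the SRV name that
# appears in the debug output from an FQDN, assuming the domain is
# everything after the first dot.
def srv_lookup_name(fqdn, service = "_x-puppet-report")
  _host, domain = fqdn.split(".", 2)
  # A bare hostname has no domain part, so this sketch has no name to build.
  return nil if domain.nil? || domain.empty?
  "#{service}._tcp.#{domain}"
end

puts srv_lookup_name("pe-centos6.localdomain")
```

Under that assumption, the HOSTNAME value shown above yields a lookup against 
"localdomain", which matches the name in the debug lines.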

The code in which this occurs is in lib/puppet/network/resolver.rb, in the 
method each_srv_record.  However, this code simply calls Ruby's 
Resolv::DNS#getresources method.  Reviewing the documentation for this 
method, I don't see a way to specify a timeout interval.

I'm also not entirely sure whether reducing the timeout is the correct 
solution, or if there is some other alternative.
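
If a shorter timeout did turn out to be the right approach, one possible shape 
would be to wrap the lookup in Ruby's standard Timeout module.  This is only a 
sketch under stated assumptions, not a proposed patch: the method name 
srv_records_with_deadline and the choice to return an empty list on timeout 
are hypothetical, and the resolver is passed in so that something other than a 
real Resolv::DNS can be substituted.

```ruby
require 'resolv'
require 'timeout'

# Hypothetical wrapper (not part of Puppet): gives up on the SRV lookup
# after `seconds` and returns an empty list instead of blocking.
# `resolver` only needs to respond to getresources(name, type).
def srv_records_with_deadline(name, seconds, resolver = Resolv::DNS.new)
  Timeout.timeout(seconds) do
    resolver.getresources(name, Resolv::DNS::Resource::IN::SRV)
  end
rescue Timeout::Error
  # Assumption: treat a timed-out lookup the same as "no records found".
  []
end
```

Calling srv_records_with_deadline("_x-puppet-report._tcp.localdomain", 5) 
would then return within roughly five seconds even when DNS is broken.  Whether 
silently treating a timeout as "no records" is acceptable is exactly the open 
question above.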

As a workaround, simply fixing the "HOSTNAME" line in /etc/sysconfig/network so 
that it does not contain the domain name seems to dramatically improve the 
performance.

