Issue #16841 has been updated by Shane Madden.

In general terms, yes, though I'm afraid it'd be a bit... lumpy... if I were 
to put a pull request together myself.

The main points I'm seeing, based on the original [SRV record 
merge](https://github.com/puppetlabs/puppet/commit/26ce9c79672d578e9aa03d8341d8c315fcf30c8b):

lib/puppet/indirector/facts/rest.rb:

    use_srv_service(:inventory)

lib/puppet/network/resolver.rb:

    case service_name
      ...
      when :inventory then service = '_x-puppet-inventory'

That seems like most of what's needed, aside from updating the spec files to 
account for this new case, which I'd just be stumbling through at this point.
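To make the proposed mapping concrete, here is a standalone sketch of how a service symbol could map to an SRV record name once an `:inventory` case exists. The method `srv_record_name` is hypothetical, not Puppet's resolver API. The symbols `:ca`, `:report` and `:fileserver` are assumptions based on the SRV records mentioned in this report, and only the `:inventory` branch is the new case being proposed.

```ruby
# Hypothetical standalone sketch; not Puppet's actual resolver code.
# Maps a service symbol to the SRV record name to query. The symbols other
# than :puppet and :inventory are assumed; :inventory is the proposed case.
def srv_record_name(service_name, domain)
  service = case service_name
            when :puppet     then '_x-puppet'
            when :ca         then '_x-puppet-ca'
            when :report     then '_x-puppet-report'
            when :fileserver then '_x-puppet-fileserver'
            when :inventory  then '_x-puppet-inventory'
            else raise ArgumentError, "unknown SRV service: #{service_name.inspect}"
            end
  "#{service}._tcp.#{domain}"
end

puts srv_record_name(:inventory, 'example.com')
# prints _x-puppet-inventory._tcp.example.com
```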
----------------------------------------
Bug #16841: inventory_server setting ignored when using SRV records
https://projects.puppetlabs.com/issues/16841#change-74347

Author: Shane Madden
Status: Accepted
Priority: Normal
Assignee: eric sorenson
Category: 
Target version: 
Affected Puppet version: 3.0.0
Keywords: 
Branch: 


Having `use_srv_records = true` configured causes the `inventory_server` 
setting to be ignored, and the generic `_x-puppet._tcp.$srv_domain` lookup to 
be used instead.  This is a problem because the set of masters that should 
serve inventory will not necessarily match the set of masters in the SRV 
record.

This can cause serious misbehavior.  Say a master is configured to use an SRV 
domain with two equal-weight records: itself (puppet-m.example.com) and the 
real inventory server (puppet-i.example.com).

DNS for example.com:

    _x-puppet._tcp IN SRV 0 5 8140 puppet-i
    _x-puppet._tcp IN SRV 0 5 8140 puppet-m
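With both records at priority 0 and weight 5, each target is equally likely to be tried first. The sketch below shows RFC 2782-style ordering: ascending priority, then a weighted random shuffle within each priority. It illustrates where the 50/50 split comes from, assuming the resolver follows the RFC's selection rule. `SrvRecord` and `order_srv_records` are hypothetical names, not Puppet's resolver API. The sketch also simplifies weight-0 handling: the RFC gives those records a small chance of selection, but here they always go last.

```ruby
# Hypothetical sketch of RFC 2782-style SRV ordering; not Puppet's code.
SrvRecord = Struct.new(:priority, :weight, :port, :target)

# Returns records in the order a client would try them: ascending priority,
# then a weighted random shuffle within each priority group.
def order_srv_records(records, rng = Random.new)
  records.group_by(&:priority).sort_by { |priority, _| priority }.flat_map do |_priority, group|
    remaining = group.dup
    ordered = []
    until remaining.empty?
      total = remaining.sum(&:weight)
      if total.zero?
        # Only weight-0 records are left (simplification): order them randomly.
        ordered.concat(remaining.shuffle(random: rng))
        break
      end
      point = rng.rand(total) # integer in 0...total
      running = 0
      # Each record is chosen with probability weight / total.
      idx = remaining.find_index { |r| (running += r.weight) > point }
      ordered << remaining.delete_at(idx)
    end
    ordered
  end
end
```

Counting how often `order_srv_records(...).first.target` is `puppet-i` over many calls with the two records above gives roughly half.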

puppet.conf:

    [main]
        use_srv_records = true
        srv_domain = example.com
    [master]
        facts_terminus = inventory_service
        inventory_server = puppet-i.example.com
        inventory_port = 8140

When a node checks in with puppet-m.example.com for its catalog, puppet-m 
attempts to pull that node's facts from the inventory service.  It ignores the 
`inventory_server` setting and goes straight to the SRV records:

    Debug: Searching for SRV records for domain: example.com
    Debug: Found 2 SRV records for: _x-puppet._tcp.example.com

Because the two SRV records have equal weight, a random 50/50 choice on 
`puppet-m` determines whether it asks "myself" or "the right inventory server". 
If it tries the real inventory server first, all is well and the catalog is 
returned without error.  If the random number generator picks puppet-m itself, 
there can be problems:

 - If auth.conf on puppet-m blocks calls to `/facts` (the default), the 403 
returned by the local request halts the facts lookup, which falls back to the 
cache; the correct inventory server is never tried.
 - If auth.conf on puppet-m allows calls to `/facts`, puppet-m treats the 
incoming request (from itself, to itself) as a new API call whose facts must be 
fetched from the inventory server, so the SRV coin flip is repeated for this 
"new" call.  Until the correct server is randomly selected, puppet-m keeps 
recursively opening new connections to itself.  With more masters or different 
weights, the odds that any given attempt reaches the correct server are lower.
 - In the previous case, if puppet-i.example.com is down or has a less-preferred 
(numerically higher) priority in the SRV record, the recursive local 
connections instead keep piling up until the master grinds to a halt.  The 
master is then unresponsive for a few minutes until the connections time out.

        Debug: Searching for SRV records for domain: example.com
        Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
        Debug: Yielding next server of puppet-i.example.com:8140
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Warning: Error connecting to puppet-i.example.com:8140: Connection refused - connect(2)
        Debug: Yielding next server of puppet-m.example.com:8140
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: Searching for SRV records for domain: example.com
        Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
        Debug: Yielding next server of puppet-i.example.com:8140
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Warning: Error connecting to puppet-i.example.com:8140: Connection refused - connect(2)
        Debug: Yielding next server of puppet-m.example.com:8140
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: Searching for SRV records for domain: example.com
        Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
        Debug: Yielding next server of puppet-i.example.com:8140
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
        Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson

This repeats many times, then grinds to a halt once a certain number of 
waiting connections (apparently 100) is reached.  Then, a couple of minutes 
later:

    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    ...

The agent that tried to fetch its catalog reports:

    Warning: Error connecting to puppet-m.example.com:8140: Connection reset by peer - SSL_connect

Stopping the stalled master before the `Error: execution expired` errors start 
results in a chain of exactly 100 nested HTTP 400 errors:

    Error: Could not retrieve facts for node.example.com: Error 400 on SERVER: Error 400 on SERVER: ..snip for readability.. Error 400 on SERVER: Error 400 on SERVER: Connection refused - connect(2)
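The three failure modes listed above, and the 100-deep chain of 400 errors, can be reproduced in a toy model. This is a sketch under stated assumptions, not Puppet code. `fetch_facts`, `Result` and `MAX_PENDING` are hypothetical names. Following the 403 case and the log, the model assumes that an HTTP error response ends a lookup, while a refused connection moves on to the next SRV target. The cap of 100 comes from the observed, apparently fixed, limit.

```ruby
# Toy model of the failure modes in this report; not Puppet code.
# :inventory stands for puppet-i and :self for puppet-m. `order` is called once
# per lookup and returns that lookup's SRV try order, so a caller can inject a
# fixed order or a random one.
MAX_PENDING = 100 # assumed cap, from the "apparently 100" observation

Result = Struct.new(:outcome, :self_requests)

def fetch_facts(order, inventory_up:, facts_allowed:, depth: 0)
  order.call.each do |server|
    if server == :inventory
      return Result.new(:ok, depth) if inventory_up
      next # connection refused: move on to the next SRV target
    end

    # server == :self, so puppet-m sends the facts request to itself.
    # auth.conf denies /facts: the 403 ends the lookup (cached facts are used).
    return Result.new(:forbidden, depth) unless facts_allowed
    # auth.conf allows /facts: serving the request starts a brand-new lookup.
    # Once MAX_PENDING self-requests are waiting, the master stalls.
    return Result.new(:stalled, depth + 1) if depth + 1 >= MAX_PENDING

    return fetch_facts(order, inventory_up: inventory_up,
                       facts_allowed: facts_allowed, depth: depth + 1)
  end
  Result.new(:unreachable, depth)
end
```

Passing `-> { [:inventory, :self].shuffle }` models the 50/50 order. With puppet-i up, the number of self-requests in this model is geometric with mean 1. Always returning `[:self, :inventory]` models puppet-i at a less-preferred priority, and it ends in `:stalled` after 100 self-requests even while puppet-i is up.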

Currently, the only way to avoid this issue is to keep the non-inventory 
masters from using the same `srv_domain` as the clients; they either need their 
own dedicated `srv_domain` or must not use SRV records at all.

I can think of two good options to address this:

 - Add a new SRV record for the inventory service, like the ones that already 
exist for `ca`, `report`, and `fileserver`, so that the inventory service's 
location can be distinguished from the generic master pool:

        _x-puppet-inventory._tcp.$srv_domain

 - Stop using SRV records for the `inventory_server` lookup; let it use the 
configured value instead.
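The two options differ only in which target the inventory lookup ends up with. The hypothetical sketch below contrasts them with the current behavior. `inventory_targets` and the `srv_lookup` callable (which returns `host:port` strings, empty when no records exist) are illustrative names, and `settings` is a plain Hash standing in for puppet.conf. Under option 1, the sketch falls back to `inventory_server` when the dedicated record is absent; that fallback is an assumption, not a description of Puppet's code.

```ruby
# Hypothetical sketch contrasting current behavior with the two proposed
# options; not Puppet's code. `srv_lookup` takes an SRV record name and
# returns "host:port" strings (empty when the record does not exist).
def inventory_targets(settings, srv_lookup, strategy)
  configured = ["#{settings[:inventory_server]}:#{settings[:inventory_port]}"]
  return configured unless settings[:use_srv_records]

  domain = settings[:srv_domain]
  case strategy
  when :current # the bug: generic master pool, inventory_server ignored
    srv_lookup.call("_x-puppet._tcp.#{domain}")
  when :option1 # dedicated record, configured server if it has no answers
    found = srv_lookup.call("_x-puppet-inventory._tcp.#{domain}")
    found.empty? ? configured : found
  when :option2 # no SRV lookup for inventory at all
    configured
  else
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

# The DNS and puppet.conf from this report.
dns = { '_x-puppet._tcp.example.com' =>
          ['puppet-i.example.com:8140', 'puppet-m.example.com:8140'] }
srv = ->(name) { dns.fetch(name, []) }
settings = { use_srv_records: true, srv_domain: 'example.com',
             inventory_server: 'puppet-i.example.com', inventory_port: 8140 }

p inventory_targets(settings, srv, :current)
# prints ["puppet-i.example.com:8140", "puppet-m.example.com:8140"]
p inventory_targets(settings, srv, :option1)
# prints ["puppet-i.example.com:8140"]
p inventory_targets(settings, srv, :option2)
# prints ["puppet-i.example.com:8140"]
```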

