Issue #16841 has been updated by Shane Madden.
In general terms, yes, though I'm afraid it'd be a bit... lumpy... if I were to put a pull request together myself. Based on the original [SRV record merge](https://github.com/puppetlabs/puppet/commit/26ce9c79672d578e9aa03d8341d8c315fcf30c8b), the main changes I see are:

- `lib/puppet/indirector/facts/rest.rb`: `use_srv_service(:inventory)`
- `lib/puppet/network/resolver.rb`:

        case service_name
        ...
        when :inventory then service = '_x-puppet-inventory'

That seems to be most of what's needed, aside from updating the spec files to cover this new case, which I'd just be stumbling through at this point.

----------------------------------------

Bug #16841: inventory_server setting ignored when using SRV records
https://projects.puppetlabs.com/issues/16841#change-74347

- Author: Shane Madden
- Status: Accepted
- Priority: Normal
- Assignee: eric sorenson
- Category:
- Target version:
- Affected Puppet version: 3.0.0
- Keywords:
- Branch:

Having `use_srv_records = true` configured causes the `inventory_server` setting to be ignored. The generic `_x-puppet._tcp.$srv_domain` lookup is used instead. This is a problem because the set of masters that should serve inventory will not necessarily match the set of masters in the SRV record, and the mismatch can cause bad behavior.

Say a master is configured to point to an SRV domain with two equal-weight records: itself (puppet-m.example.com) and the real inventory server (puppet-i.example.com).

DNS for example.com:

    _x-puppet._tcp IN SRV 0 5 8140 puppet-i
    _x-puppet._tcp IN SRV 0 5 8140 puppet-m

puppet.conf:

    [main]
    use_srv_records = true
    srv_domain = example.com

    [master]
    facts_terminus = inventory_service
    inventory_server = puppet-i.example.com
    inventory_port = 8140

When a node checks in with puppet-m.example.com for its catalog, puppet-m attempts to pull the node's facts from the inventory service.
It ignores the `inventory_server` setting and goes straight for the SRV records:

    Debug: Searching for SRV records for domain: example.com
    Debug: Found 2 SRV records for: _x-puppet._tcp.example.com

Because the SRV records have equal weights, a random 50/50 choice on `puppet-m` determines whether it queries itself or the correct inventory server. If it tries the real inventory server first, all is well and the catalog is returned without error. If the random number generator picks puppet-m itself, there can be problems:

- If auth.conf on puppet-m blocks calls to `/facts` (the default), the 403 returned by the local request halts the fact lookup, which falls back to the cache. The correct inventory server is never tried.
- If auth.conf on puppet-m allows calls to `/facts`, puppet-m treats the incoming request (from itself, to itself) as a new API call that must in turn be passed to the inventory server, so the SRV coin flip is repeated for this "new" call. Until the correct server is randomly selected, puppet-m keeps opening connections to itself recursively. With more masters or different weights, the odds of reaching the correct server on each attempt are lower.
- In the previous case, if puppet-i.example.com is down or set to a lower priority in the SRV record, the recursive local connections keep piling up until the master grinds to a halt. The master then stays unresponsive for a few minutes until the connections time out.
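The odds described above can be modeled with a short Ruby sketch. This is a hypothetical model, not Puppet's resolver code. It makes three assumptions:

- Targets are tried lowest priority value first, in a weighted random order within each priority, in the spirit of RFC 2782.
- Unreachable hosts are skipped.
- Every request that lands on puppet-m triggers a fresh SRV lookup, as in the `/facts`-allowed case.

`SrvTarget`, `TARGETS`, `weighted_index`, `srv_order`, and `self_connections` are made-up names. The cap of 100 mirrors the apparent limit reported in this issue.

```ruby
# Hypothetical model of the SRV coin flip in this bug report; not Puppet's code.
SrvTarget = Struct.new(:priority, :weight, :port, :host)

# The two equal-weight records from the example zone.
TARGETS = [
  SrvTarget.new(0, 5, 8140, "puppet-i.example.com"),
  SrvTarget.new(0, 5, 8140, "puppet-m.example.com")
].freeze

# Pick an index with probability proportional to weight (uniform if all are zero).
def weighted_index(candidates, rng)
  total = candidates.sum(&:weight)
  return rng.rand(candidates.size) if total.zero?

  threshold = rng.rand(total) # integer in 0...total
  running = 0
  candidates.index { |t| (running += t.weight) > threshold }
end

# Order targets: lowest priority value first, weighted shuffle within a priority.
def srv_order(targets, rng = Random.new)
  targets.group_by(&:priority).sort_by(&:first).flat_map do |_priority, group|
    remaining = group.dup
    ordered = []
    ordered << remaining.delete_at(weighted_index(remaining, rng)) until remaining.empty?
    ordered
  end
end

# Count the nested connections the master opens to itself before a request
# reaches another host. Assumes auth.conf allows /facts, so each self-request
# starts a new SRV lookup; unreachable hosts are skipped, as in the debug log.
def self_connections(targets, self_host, inventory_up, rng = Random.new, limit = 100)
  depth = 0
  while depth < limit
    reached = srv_order(targets, rng).find { |t| t.host == self_host || inventory_up }
    return depth if reached.nil? || reached.host != self_host

    depth += 1 # the request looped back to this master
  end
  depth
end
```

With both records at weight 5, each lookup puts puppet-i first about half the time. As a result, `self_connections(TARGETS, "puppet-m.example.com", true)` averages about one nested self-connection. With `inventory_up` set to `false`, it always runs to the cap of 100.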
With puppet-i.example.com down, the debug log on puppet-m looks like this:

    Debug: Searching for SRV records for domain: example.com
    Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
    Debug: Yielding next server of puppet-i.example.com:8140
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Warning: Error connecting to puppet-i.example.com:8140: Connection refused - connect(2)
    Debug: Yielding next server of puppet-m.example.com:8140
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: Searching for SRV records for domain: example.com
    Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
    Debug: Yielding next server of puppet-i.example.com:8140
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Warning: Error connecting to puppet-i.example.com:8140: Connection refused - connect(2)
    Debug: Yielding next server of puppet-m.example.com:8140
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: Searching for SRV records for domain: example.com
    Debug: Found 2 SRV records for: _x-puppet._tcp.example.com
    Debug: Yielding next server of puppet-i.example.com:8140
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson
    Debug: facts supports formats: b64_zlib_yaml pson raw yaml; using pson

This repeats many times, then grinds to a halt once a certain number of waiting connections (apparently 100) is reached.
Then, a couple of minutes later:

    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    Error: execution expired
    ...

The agent that tried to fetch its catalog reports:

    Warning: Error connecting to puppet-m.example.com:8140: Connection reset by peer - SSL_connect

Stopping the stalled master before the `Error: execution expired` errors start results in a chain of exactly 100 HTTP 400 errors:

    Error: Could not retrieve facts for node.example.com: Error 400 on SERVER: Error 400 on SERVER: ..snip for readability.. Error 400 on SERVER: Error 400 on SERVER: Connection refused - connect(2)

Currently, there is only one way to avoid this issue: the non-inventory masters must not use the same `srv_domain` as the clients. They either need their own dedicated `srv_domain` or must not use SRV records at all.

I can think of two good options to address this:

- Add a new SRV record pointing to the inventory service, similar to the ones that already exist for `ca`, `report`, and `fileserver`, so that the inventory service location can be differentiated from the generic master pool: `_x-puppet-inventory._tcp.$srv_domain`
- Stop using SRV records for the `inventory_server` lookup and use the configured value instead.
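The two options, and the resolver change outlined in the comment at the top of this update, can be sketched in Ruby. This is a hypothetical illustration, not Puppet's resolver or indirector code:

- `SRV_SERVICE_NAMES`, `srv_record_name`, and `inventory_endpoint` are made-up names.
- The `_x-puppet-ca`, `_x-puppet-report`, and `_x-puppet-fileserver` record names are assumed from the services listed above.
- A plain Hash stands in for Puppet's settings.

```ruby
# Hypothetical sketch of the two proposed fixes; not Puppet's actual code.

# Option 1: give the inventory service its own SRV record name alongside the
# per-service records. Only "_x-puppet" and "_x-puppet-inventory" appear in
# the report; the other names here are assumptions.
SRV_SERVICE_NAMES = {
  puppet:     "_x-puppet",
  ca:         "_x-puppet-ca",
  report:     "_x-puppet-report",
  fileserver: "_x-puppet-fileserver",
  inventory:  "_x-puppet-inventory"
}.freeze

# Build the DNS name to query for a service in the given SRV domain.
# Raises KeyError for a service with no known record name.
def srv_record_name(service_name, srv_domain)
  "#{SRV_SERVICE_NAMES.fetch(service_name)}._tcp.#{srv_domain}"
end

# Option 2: ignore use_srv_records for the inventory lookup and use the
# configured inventory_server/inventory_port. `settings` is a plain Hash
# standing in for Puppet's settings object.
def inventory_endpoint(settings)
  [settings.fetch(:inventory_server), Integer(settings.fetch(:inventory_port))]
end

example_settings = {
  use_srv_records:  true,
  srv_domain:       "example.com",
  inventory_server: "puppet-i.example.com",
  inventory_port:   "8140"
}

puts srv_record_name(:inventory, example_settings[:srv_domain])
# prints _x-puppet-inventory._tcp.example.com
p inventory_endpoint(example_settings)
# prints ["puppet-i.example.com", 8140]
```

Option 1 only changes which DNS name the inventory terminus queries, so masters outside the inventory pool could keep sharing the clients' `srv_domain`. Option 2 sidesteps SRV entirely for this one lookup.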
