On Tue, 17 May 2016, Eric Sorenson wrote:
Hi Dan, this is a good and timely post.
I apologize for the lack of response. Health issues have taken a front
seat for a while.
I'm working on some related issues regarding Puppet's CA that may help
you out. Your thinking on this is roughly correct -- things are a lot
harder than they need to be, but the above advice to nuke everything and
start over is both overly simplistic and wrong-headed.
Funny, that's pretty much exactly the advice I'm seeing here:
https://docs.puppet.com/puppet/4.4/reference/ssl_regenerate_certificates.html
Once you blow away your old CA, *none* of your agents work. If you've
been running puppet for five years, not everything is due to expire all at
once.
I've found a way forward that I think is reasonably clever, and I'll go
into it below.
Note that my comments here are specifically about the Clojure CA that
is included in puppetserver, not the Ruby CA; most things apply to both
but the past couple of years of server-side bugfixes and development energy
have gone into the Clojure CA, and Puppet 5 will consolidate
all the CA-side cert lifecycle onto this codebase.
I'm largely running Puppet 3.8, open source. My certs expire before
Puppet 5 will be released.
You're right that the agent SSL code is very old and badly needs an overhaul.
* There's still no support for putting multiple certificate files as the
puppet CA -- all must still be signed by a common root entity. Is this
correct? (In the "web" analogy, my browser could have lots of built-in
and additional trust-points, both corporate and as-shipped).
Have you verified experientially that this doesn't work in current Puppet
versions?
I have verified that multiple root certificates in the file will at the
very least not crash the agent. Which means, I guess, if you're rolling
from one master to another, you can seed out a ca file with two roots in
it, via puppet itself (but *not* via the auto-download).
You're right that the agent does not support a CApath, in openssl parlance:
a directory of hashed CA certs, any of which are valid. The server side
farms out its SSL verification to the underlying web stack, so it ought to
be tolerant of agents issued from multiple CAs checking in. I haven't tried
this angle yet.
Not every OS uses CA pathing. Some of the linuxes do. FreeBSD uses a
monolithic cert. As far as I understood it, it's a function of the
underlying SSL library. It would be nice -- at least that way you could
deploy certs atomically.
* There's no directive I can find whereby puppet agents can, within N days
of expiry, re-request their certificate, while maintaining a valid one in
the meantime. On the puppet master, a duplicate cert is treated as an
absolute error and must be purged from both sides with extreme prejudice
and started over.
The first part is true, the second is controlled by the
'allow-duplicate-certs' CA setting which will allow later requests to
overwrite newer ones.
I think you mean older? Also, do you really mean overwrite, or do you
mean the two certs can coexist?
Presumably, the master needs to keep a list of everything it's signed, so
that it can later revoke a given certificate. That's all listed in the
inventory.txt file, as well as a copy cached in the "signed" directory.
As I said above, on the master the cert verification is delegated to the
web server layer (jetty in the case of the puppetserver, apache or nginx
or (gah) webrick for non-puppetserver setups).
There's nothing built-in that does either of these things. But policy-based
autosigning provides an API that lets you do this based on some
'a priori' knowledge you have of the node:
https://docs.puppet.com/puppet/4.4/reference/ssl_autosign.html
Frustrating. When you're using an external signing executable -- unless
you've had the forethought to tell all your agents to embed extra,
non-default information into their CSR -- the only info you get is the CSR
and the common-name.
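
As a rough illustration of how little a policy executable has to work with: per the autosign docs linked above, it receives the certname as its only argument and the PEM CSR on stdin, and exits 0 to sign. The sketch below decides on the certname alone. The allowlist file, its path, and the function name are hypothetical, not anything Puppet ships.

```shell
#!/bin/sh
# Hypothetical allowlist: one permitted certname per line (path is an assumption).
ALLOWLIST="${ALLOWLIST:-/etc/puppet/autosign-allowlist.txt}"

# Decide whether to sign: $1 is the certname, the PEM CSR arrives on stdin.
# Returns 0 to sign, non-zero to leave the request pending.
autosign_decision() {
    cat >/dev/null                    # drain the CSR; this sketch does not inspect it
    [ -r "$ALLOWLIST" ] || return 1   # no readable allowlist means no autosigning
    grep -Fxq -- "$1" "$ALLOWLIST"    # exact, whole-line match on the certname
}
```

A wrapper script that ends with `autosign_decision "$1"` could then be pointed at by the `autosign` setting. Anything smarter needs data the agent put into the CSR ahead of time.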
This is an interesting line of thought that I'm looking into more on the
CA side: you can re-use the same private key to generate a new certificate
that would have an extended lifetime but not require a complete re-key.
There isn't an API guarantee that agents' certs (and therefore their public
keys) are collected and stored in the CA, though such a thing would be very
useful and is on deck for future work - you can see the whole list of these
things at
https://tickets.puppetlabs.com/browse/SERVER-974
* We blindly trust the first CA we get (using the default options), but
then have NO real method for accepting a second CA without manually
manipulating the CA files directly. (DANE, anyone?)
I don't know what DANE means in this context, but this statement is true.
DANE (and the DNS TLSA record) is a way of putting your SSL certificate
(either your public key or your entire cert, in full or as a hash) into the
DNS. It hinges on using DNSSEC as a trust-anchor point, rather than a
pre-seeded cert. The
idea is that with a validating resolver, as you'd have in an enterprise,
you'll either get the correct record, or servfail. Postfix can make use
of this, and there's a standards-track RFC for it as well (RFC 6698, and
others). Assuming you have good DNSSEC, it's a useful method of
bootstrapping your trust chain -- similar to putting SSHFP records in DNS.
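
For the curious, here is a minimal sketch of what a TLSA record for a puppet master might look like. It builds a "3 1 1" record (DANE-EE, SubjectPublicKeyInfo, SHA-256) from a certificate file with standard openssl commands. The function names are made up for illustration, and Puppet itself does nothing with such a record today.

```shell
#!/bin/sh
# Print the SHA-256 digest (hex) of a certificate's SubjectPublicKeyInfo,
# which is the data field of a "3 1 1" TLSA record.
tlsa_spki_sha256() {
    openssl x509 -noout -pubkey -in "$1" |
        openssl pkey -pubin -outform DER |
        openssl dgst -sha256 |
        awk '{ print $NF }'   # drop the "(stdin)= " style prefix
}

# Print a TLSA resource record: $1 = host name, $2 = TCP port, $3 = cert file.
tlsa_record() {
    printf '_%s._tcp.%s. IN TLSA 3 1 1 %s\n' "$2" "$1" "$(tlsa_spki_sha256 "$3")"
}
```

Something like `tlsa_record pm.foo.org 8140 /path/to/master-cert.pem` would print a line ready to paste into a signed zone. The port and path here are only examples.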
I have a WIP doc adding support for intermediate CAs up at:
https://gist.github.com/ahpook/06d4cfda1d68c08bc82fbfdc40123b28
This may help as many of the goals (and the corresponding implementation bugs
or missing features) overlap with your use-case.
Somewhat, but I went a slightly different way.
I'm definitely interested in keeping in touch as you go through this, to
make things easier for other people. I think a lot of sites are coming
up on the five year anniversary of their installation and the easier
this kind of re-keying can be, the better.
So, here's something insane I discovered today.
Puppet defaults all certs. Including the CA. And the cert on the master.
To five years.
Go open firefox, and look at the expiry dates of the CA certs they ship.
Some of them are valid till the 2030s. A root cert expiring is a major,
major pain.
No commercial issuer will issue a cert with an expiry date past the expiry
of its parent.
The Puppet CA will actually issue certs that have validity dates outside
the validity of the root cert. If you have had your root cert for four
years, a cert you request now will be valid for four years after your root
cert expires (and thus, won't validate anyway).
That said, we can take advantage of that, as long as we pay attention to
something fundamental in SSL. Keys don't expire. Signatures do. And
certificates don't sign other certificates -- private keys sign certs.
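
The validity overrun described above (a cert that outlives the root that signed it) is easy to detect from the shell. This is a sketch only: it assumes GNU date for the date parsing (BSD date needs different flags), and the function names are invented for illustration.

```shell
#!/bin/sh
# Print a certificate's notAfter date as seconds since the epoch.
# Assumes GNU date ("date -d"); BSD date needs different flags.
cert_end_epoch() {
    end=$(openssl x509 -noout -enddate -in "$1") || return 2
    date -u -d "${end#notAfter=}" +%s
}

# Succeed (exit 0) when the leaf certificate expires after the CA certificate.
# $1 = leaf cert file, $2 = CA cert file. Returns 2 if either cannot be read.
outlives_ca() {
    leaf_end=$(cert_end_epoch "$1") || return 2
    ca_end=$(cert_end_epoch "$2") || return 2
    [ "$leaf_end" -gt "$ca_end" ]
}
```

Running `outlives_ca` over each file in the master's "signed" directory against the CA cert would list the certs that will stop validating before their own expiry date.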
So this was my plan of attack, which mostly worked.
0) Back everything up.
1) Get onto the master, and generate a new CA certificate, using the exact
same private key:
    openssl req -new -key ca_key.pem -out /tmp/ca_cert.csr \
        -subj '/CN=Puppet CA: pm.foo.org'
    openssl x509 -req -in /tmp/ca_cert.csr \
        -signkey /var/puppet/ssl/ca/ca_key.pem -days 3500 \
        -out /tmp/ca_crt.pem \
        -extfile /etc/ssl/openssl.cnf -extensions v3_ca
...and then copied it into place in /var/puppet.
We're re-using existing keying material. Assuming our puppetmaster hasn't
been compromised, there's not a lot more risk here than there would have
been if we had originally signed our cert for ten years. The main loss
here is that it's still 1024 bit and sha1 -- which I consider acceptable,
at least to buy some time. (We can't change the keysize and still keep
the same key).
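
Before copying a reissued CA cert into place, it seems worth confirming two things: that it really carries the same public key as the old one, and that a cert issued under the old CA cert still validates against the new one. A minimal sketch with standard openssl commands follows. The function names are invented, and success is judged by the ": OK" that `openssl verify` prints rather than by its exit status, on the assumption that older openssl builds are not reliable about the latter.

```shell
#!/bin/sh
# Succeed when two certificates carry the same public key.
same_public_key() {
    a=$(openssl x509 -noout -pubkey -in "$1") || return 2
    b=$(openssl x509 -noout -pubkey -in "$2") || return 2
    [ "$a" = "$b" ]
}

# Succeed when an already-issued certificate validates against a CA file.
# $1 = CA cert file, $2 = certificate to check.
verifies_under() {
    openssl verify -CAfile "$1" "$2" 2>/dev/null | grep -q ': OK$'
}
```

For example, `same_public_key /var/puppet/ssl/ca/ca_crt.pem /tmp/ca_crt.pem`, followed by `verifies_under /tmp/ca_crt.pem` against one of the agent certs cached on the master.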
2) Since the puppetmaster's SSL cert is only a few minutes younger than
the CA cert, we need a new one of those too.
I tried manually regenerating those, but puppet refused to start.
However, letting puppet magically build its own DID work.
    service puppetmaster stop
    puppet cert clean pm.isc.org
    service puppetmaster start
3) Experimentally copied the new ca_crt.pem from the puppetmaster onto a
few agents. No expiry warnings were issued -- the new dates were honored.
4) Tried doing an agent run on machines where I hadn't replaced the CA
file. I got the expiry warning, but an agent run still worked.
5) Defined a fairly simple manifest that replaced
/var/puppet/ssl/ca/ca.pem with my new version, and ran it on a few nodes.
It worked -- nodes that had been warning previously, no longer did.
So, on my TODO list, is the following:
* Look over the inventory.txt file, parsing it with some code, such that
we look at the *last* match for a given host to find when a machine
expires. (By definition, no machine will expire sooner than the CA was
due to, so that was my major hurdle).
* Use that to trigger an SSH push to delete and regenerate my certs, for
those machines. This can go in cron, perhaps only running during business
hours. Rather than using any kind of policy-based-autosigning, such a
tool will simply chain the commands (clean the agent, clean the master,
run the agent, sign the cert, rerun the agent to make sure all's well).
Note that this way, as I've deployed puppet more widely in my
organization, I don't lose any of the machines that I've enrolled in the
last four years, and I get a nice rolling upgrade, instead of
everything-at-once.
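
The inventory-parsing item on that list might look something like the sketch below. It assumes each inventory.txt line is "serial not_before not_after subject" with timestamps such as 2017-08-28T17:08:12GMT. That layout is an assumption to check against a real file, and the function names are made up.

```shell
#!/bin/sh
# Assumed inventory.txt line layout (check it against your own file):
#   0x0002 2012-08-29T17:08:12GMT 2017-08-28T17:08:12GMT /CN=agent1.example.org

# Print "not_after<TAB>subject", keeping only the LAST entry for each subject.
last_expiry() {
    awk 'NF >= 4 && $1 !~ /^#/ {
             subj = $4
             for (i = 5; i <= NF; i++) subj = subj " " $i   # subjects may contain spaces
             expiry[subj] = $3                              # later lines overwrite earlier ones
         }
         END { for (s in expiry) printf "%s\t%s\n", expiry[s], s }' "$1" | sort
}

# Print subjects whose most recent certificate expires before the cutoff.
# Timestamps in this fixed-width format order correctly as plain strings.
expiring_before() {
    last_expiry "$1" | awk -F '\t' -v cutoff="$2" '($1 "") < (cutoff "") { print $2 }'
}
```

Called as `expiring_before inventory.txt 2016-09-01T00:00:00GMT`, it prints one subject per line, soonest expiry first, which is the list a cron-driven SSH push would walk.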
Here's what puppet needs to fix -- rather than open a bunch of feature
requests, I'll list some things here, and see if you think they're viable.
* The five year metric for a root ca cert needs a re-thinking. I'm sure
all the hip young kids don't expect to have a vm in "the cloud" last
longer than a few months, but this stuff, for us, is critical
infrastructure.
* Puppet needs to support varying trust models. Intermediate certs.
Commercially signed certs. Cases where you have two distinct CAs and we
can trust a thing signed by either.
* Puppet needs a metric to trust our initial CA file, and to roll that
trust forward to additional CA files.
* Puppet needs an API-based method to re-request a cert, perhaps keeping
one active while another is pending. If I can go to the DMV and request a
new driver's license, before my old one expires, and they're all still
running Windows XP, we can figure this out.
* More data needs to be exposed to the policy-based autosigner. Looking
at the example immediately above, your pending-expiry cert could serve as
a useful passport to get a new cert. Exposing the connecting IP is
useful, as well as things that are machine-unique like MAC addresses, and
the SHA-256 sum of SSH private keys, which would need to match prior
reports. One could even build a "points" based system that autosigns
things conditionally depending on how much proof is presented.
* Also useful would be being able to use a DNS TXT record as a
bootstrapping method for additional information to supply in the CSR.
Because right now, we're installing a totally stock puppet package, and
having to preconfigure a bunch of things sort of defeats the purpose of
having puppet.
* Puppet really needs to not sign things with dates that exceed ca-expiry
(at least, not by default). Trying to sign something when your root cert
is due to expire within, say, three months, should yield a warning.
* The CA warnings need to happen on the master, as well as on the agents.
The master is where we're typing the signing command. There's LOTS of
stuff going by on the agent and it's easy to miss.
* Rather than (or in addition to) a CRL, perhaps an OCSP responder?
* DANE. Really. It's good stuff.
Thanks for all your help. If you spot any issues with the above, please
let me know.
-Dan