On Fri, Oct 7, 2016 at 12:33 AM, Fredrik Nilsson <soccyp...@icloud.com> wrote:

> Hi Guys,
> Hopefully one of you has a splendid idea on how to solve this...
> The problem is that I'm getting this error message a lot (too much is more
> like it):
> *Error: Could not request certificate: The certificate retrieved from the
> master does not match the agent's private key.
> Certificate fingerprint:
> To fix this, remove the certificate from both the master and the agent
> and then start a puppet run, which will automatically regenerate a
> certficate.
> On the master:
>   puppet cert clean SERVERNAME
> On the agent:
>   1a. On most platforms: find C:/ProgramData/PuppetLabs/puppet/etc/ssl -name SERVERNAME.pem -delete
>   1b. On Windows: del "C:/ProgramData/PuppetLabs/puppet/etc/ssl/SERVERNAME" /f
>   2. puppet agent -t*
> Some characteristics:
> This is on newly provisioned hosts (provisioned from Foreman)
> The machines are running Windows Server of different flavours
> Puppet Agent version is 3.8.7 (upgrade to a 4 release is in the pipe)
> We have two VMware clusters and this occurs on both (the checkbox for time
> sync with hardware host is NOT checked)
> I actually had this problem from the start, but back then it occurred so
> seldom that I decided to live with it; say it occurred on about 1 in 20
> machines. But now it has escalated, and it is rather 1 in 20 that gets a
> working certificate from the start. When I started banging my head against
> the wall again yesterday I had two machines working after adding an extra
> timesync in the provisioning workflow, but that was short-lived happiness, as
> I've made 3 more machines after that with no success.
> So my first suspects were time and a change of "security context",
> but I think they're off the hook for the moment, as I'm pretty confident
> that my time is right and that, to my knowledge, I have stayed in the same
> security context.
> To make sure that I got the time right I have this running under the
> oobeSystem step in my provisioning workflow:
> *powershell.exe -noprofile -executionpolicy bypass -command "&
> {Start-Service W32Time -ErrorAction SilentlyContinue; .\w32tm.exe /resync}"*
> After installing chocolatey and the puppet agent the agent phones home
> like this (command composed from how this is done in the Linux half of our
> department):
> *powershell.exe -noprofile -executionpolicy bypass -command " & {&
> 'C:\Program Files\Puppet Labs\Puppet\bin\puppet.bat' agent -o --tags
> no_such_tag --no-daemonize}"*
The `--[no-]daemonize` flags are actually meaningless on Windows, and a while
ago I changed the default value of daemonize to false on Windows.

The reason is that services work differently on Windows than on most *nix.
On *nix, the process typically forks, creates a new session, detaches from
the old one, etc. On Windows, the logic is inverted. The Service Control
Manager starts the process and the process needs to communicate back with
the SCM in a specific way. Rather than add SCM-specific logic to puppet, we
have a daemon.rb shim.
So the SCM runs rubyw.exe daemon.rb, and that runs puppet agent every
runinterval seconds.

So back to the issue above. The problem is that `puppet agent
--no-daemonize` will run the agent so it connects to the puppet master
every 30 minutes (the default runinterval)! That command will block until
you Ctrl-C. But your PowerShell command is running puppet asynchronously,
so provisioning moves on while that agent is still alive. Process Explorer
is handy for debugging that.

Later when the Service Control Manager starts the Puppet service, it is
going to race with the instance you started above. Due to race conditions
in puppet's SSL bootstrapping process, you can get into a situation where
one instance creates a keypair and submits a CSR. And before the cert is
signed, the second instance sees there's no cert, and generates a new key
pair, overwriting the old one. The first instance then downloads the signed
cert, which doesn't match the new key pair.
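
To make the interleaving concrete, here is a toy model of that sequence in Python. It is only a sketch and not puppet's actual code: key pairs are represented by plain string ids, and `SslDir`, `ToyMaster`, `bootstrap_step` and the other names are hypothetical. It also assumes the master simply ignores the second CSR, which is a simplification for the sake of the illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_key_ids = itertools.count(1)


def new_key_pair() -> str:
    """Return a fresh id standing in for a newly generated key pair."""
    return f"key-{next(_key_ids)}"


@dataclass
class SslDir:
    """Toy stand-in for the ssl directory shared by both agent instances."""
    private_key: Optional[str] = None  # key pair currently on disk
    cert_key: Optional[str] = None     # key the cert on disk was issued for

    def cert_matches_key(self) -> bool:
        return self.cert_key is not None and self.cert_key == self.private_key


class ToyMaster:
    """Toy CA. Assumption: it keeps the first CSR and ignores later ones."""

    def __init__(self) -> None:
        self.pending_csr: Optional[str] = None
        self.signed: Optional[str] = None

    def submit_csr(self, key_id: str) -> None:
        if self.pending_csr is None and self.signed is None:
            self.pending_csr = key_id

    def sign(self) -> None:
        if self.pending_csr is not None:
            self.signed, self.pending_csr = self.pending_csr, None


def bootstrap_step(ssl: SslDir, master: ToyMaster) -> None:
    """No cert on disk yet: write a new key pair (overwriting) and send a CSR."""
    if ssl.cert_key is None:
        ssl.private_key = new_key_pair()
        master.submit_csr(ssl.private_key)


def download_cert(ssl: SslDir, master: ToyMaster) -> None:
    """Store the signed cert, if the master has one."""
    if master.signed is not None:
        ssl.cert_key = master.signed


def racing_bootstrap() -> bool:
    ssl, master = SslDir(), ToyMaster()
    bootstrap_step(ssl, master)   # instance 1: key pair + CSR
    bootstrap_step(ssl, master)   # instance 2: sees no cert, overwrites the key
    master.sign()                 # cert is issued for instance 1's key
    download_cert(ssl, master)
    return ssl.cert_matches_key()


def serialized_bootstrap() -> bool:
    ssl, master = SslDir(), ToyMaster()
    bootstrap_step(ssl, master)   # first run finishes completely ...
    master.sign()
    download_cert(ssl, master)
    bootstrap_step(ssl, master)   # ... so the later run finds a cert and is a no-op
    return ssl.cert_matches_key()
```

In this model `racing_bootstrap()` ends with a cert that does not belong to the key on disk, while `serialized_bootstrap()` ends with a matching pair, which is why the first run has to finish before the service starts.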

To fix the problem you'll want to run puppet using `'C:\Program Files\Puppet
Labs\Puppet\bin\puppet.bat' agent -o --tags no_such_tag --onetime` and make
the PowerShell command synchronous, so that this run has exited before the
Puppet service is started.
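
As an illustration of what "synchronous" means here, the sketch below does the one-time run from Python instead of PowerShell. That is purely an assumption for the example (nothing in this thread says Python is on your hosts), and `build_onetime_command` and `run_agent_once` are hypothetical names. The point is only that `subprocess.run` does not return until the child process has exited, so whatever comes next in the workflow cannot overlap with the agent run.

```python
import subprocess
from typing import Callable, List

# Install path as quoted earlier in this thread.
PUPPET_BAT = r"C:\Program Files\Puppet Labs\Puppet\bin\puppet.bat"


def build_onetime_command(puppet_bat: str = PUPPET_BAT) -> List[str]:
    """Argument list for a single agent run (-o is the short form of --onetime)."""
    return [puppet_bat, "agent", "--onetime", "--tags", "no_such_tag"]


def run_agent_once(runner: Callable = subprocess.run,
                   puppet_bat: str = PUPPET_BAT) -> int:
    """Run the agent once and block until it has exited.

    `runner` defaults to subprocess.run, which waits for the child to finish;
    it is a parameter only so the function can be exercised without puppet.
    """
    completed = runner(build_onetime_command(puppet_bat))
    return completed.returncode
```

The provisioning step would call `run_agent_once()` and only continue once it returns. The same idea applies in PowerShell: whatever launches puppet.bat has to wait for it to exit.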

> The user logging on and running the commands is the local administrator
> account. To be extra thorough I logged on as that account and tried to run
> *puppet agent -t* after the host was built, just to be sure there was no
> logon-account-related stuff going on, but it made no difference.
> Following the steps in the error message and generating a new certificate
> of course works, but we can all see the inconvenience of doing that
> constantly on newly provisioned hosts, right?
> I think that sums things up quite well. As said, I've been banging my head
> against this. I'm not ruling out that something fishy is going on on the
> puppetmaster, which is not managed by me, but my colleagues in the Linux
> neighborhood don't experience this, so it is seemingly something to do
> with the Windows hosts.
> Cheers,
> Fredrik
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/puppet-users/56a91341-3509-403a-8eb7-e88d903eb02f%40googlegroups.com
> <https://groups.google.com/d/msgid/puppet-users/56a91341-3509-403a-8eb7-e88d903eb02f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.

Josh Cooper
Developer, Puppet

PuppetConf 2016 <https://puppet.com/puppetconf>, 19 - 21 October, San
Diego, California
*Register to attend or sign up to view the Live Stream
