Re: [Puppet Users] Very frequent "Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key." on Windows

2016-10-13 Thread Fredrik Nilsson
I removed the redundant explicit run and, knock on wood, I have since
provisioned 3 hosts with no certificate errors, so I think this did the
trick. Thanks a lot for pointing me in the right direction, Josh!

2016-10-12 Thread Fredrik Nilsson
The possibility of a race condition between my manual execution and the
Puppet service makes perfect sense; I didn't realize it existed until I
read your reply above. As a matter of fact, the PowerShell command
described in my post is run as one of a series of synchronous PowerShell
commands before Windows restarts one last time to enter its normal state;
as briefly described, the machine is still in an automated installation
state when the command is executed. One of the commands before the manual
Puppet run (the one at issue here) installs the Puppet agent package. It
is installed via Chocolatey and supplied with the host addresses of the
master and the Puppet CA. So my guess, bearing in mind what you described
above, is that either the service or the explicit PowerShell command
creates the key pair, which is almost immediately overwritten by the
other, resulting in the error message described. I can't investigate this
using Process Explorer since I am still in an automated installation
stage, but first thing tomorrow I will remove the manual run altogether. I
think it is causing all the headache and is redundant, as I presume the
Puppet service is already on top of things. I'll post back with the
results!
Thanks Josh!


2016-10-12 Thread Josh Cooper
On Fri, Oct 7, 2016 at 12:33 AM, Fredrik Nilsson 
wrote:

> Hi Guys,
>
> Hopefully one of you has a splendid idea on how to solve this...
>
> The problem is that I'm getting this error message a lot (too much is more
> like it):
>
> Error: Could not request certificate: The certificate retrieved from the
> master does not match the agent's private key.
> Certificate fingerprint: FINGERPRINT
> To fix this, remove the certificate from both the master and the agent
> and then start a puppet run, which will automatically regenerate a
> certficate.
> On the master:
>   puppet cert clean SERVERNAME
> On the agent:
>   1a. On most platforms: find C:/ProgramData/PuppetLabs/puppet/etc/ssl -name SERVERNAME.pem -delete
>   1b. On Windows: del "C:/ProgramData/PuppetLabs/puppet/etc/ssl/SERVERNAME" /f
>   2. puppet agent -t
>
> Some characteristics:
> These are newly provisioned hosts (provisioned from Foreman).
> The machines are running Windows Server of different flavours.
> The Puppet agent version is 3.8.7 (an upgrade to a 4.x release is in the pipe).
> We have two VMware clusters and this occurs on both (the checkbox for time
> sync with the hardware host is NOT checked).
>
> I actually had this problem from the start, but back then it occurred so
> seldom that I decided to live with it; say it occurred on about 1 in 20
> machines. But now it has escalated, and it is rather 1 in 20 that gets a
> working certificate from the start. Actually, when I started banging my
> head against the wall again yesterday, I had two machines working after
> adding an extra time sync in the provisioning workflow, but that was
> short-lived happiness, as I've made 3 more machines since then with no
> success.
>
> So my first suspects were time and a change of "security context", but I
> think they're off the hook for the moment, as I'm pretty confident that my
> time is right and that, to my knowledge, I have stayed in the same
> security context.
>
> To make sure that I got the time right, I have this running under the
> oobeSystem step in my provisioning workflow:
>   powershell.exe -noprofile -executionpolicy bypass -command "& {Start-Service W32Time -ErrorAction SilentlyContinue; .\w32tm.exe /resync}"
>
> After installing Chocolatey and the Puppet agent, the agent phones home
> like this (the command is modeled on how this is done in the Linux half of
> our department):
>   powershell.exe -noprofile -executionpolicy bypass -command " & {& 'C:\Program Files\Puppet Labs\Puppet\bin\puppet.bat' agent -o --tags no_such_tag --no-daemonize}"
>
>
The --[no-]daemonize flags are actually meaningless on Windows, and a while
ago I changed the default value of daemonize to false on Windows.

The reason is that services work differently on Windows than on most *nix
systems. On *nix, the process typically forks, creates a new session,
detaches from the old one, etc. On Windows, the logic is inverted: the
Service Control Manager (SCM) starts the process, and the process needs to
communicate back with the SCM in a specific way. Rather than add
SCM-specific logic to puppet, we have a daemon.rb shim. So the SCM runs
rubyw.exe daemon.rb, and that runs puppet agent every runinterval seconds.
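The shim's job can be pictured as a plain loop. The Python sketch below is
only an illustration of that idea, not the actual daemon.rb (which is Ruby
and also has to talk to the SCM; that part is omitted here). `service_loop`
and its parameters are hypothetical names, and `run_agent_once` stands in
for a single one-off agent run. The 1800-second default matches Puppet's
default runinterval of 30 minutes.

```python
import time
from typing import Callable, List, Optional


def service_loop(
    run_agent_once: Callable[[], int],
    runinterval: float = 1800.0,  # Puppet's default runinterval: 30 minutes
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Call run_agent_once() every runinterval seconds, collecting exit codes.

    max_runs=None loops forever, like a service would; a finite value (and
    an injectable sleep function) makes the loop testable.
    """
    exit_codes: List[int] = []
    while max_runs is None or len(exit_codes) < max_runs:
        exit_codes.append(run_agent_once())
        # Only wait if another run will follow.
        if max_runs is None or len(exit_codes) < max_runs:
            sleep(runinterval)
    return exit_codes


if __name__ == "__main__":
    slept: List[float] = []
    codes = service_loop(lambda: 0, runinterval=1800.0, max_runs=3,
                         sleep=slept.append)
    print(codes, slept)  # [0, 0, 0] [1800.0, 1800.0]
```

In this sketch each run finishes before the next one starts, so runs
started by the loop itself cannot overlap; a separately launched process is
another matter.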

So, back to the issue above. The problem is that `puppet agent
--no-daemonize` will run the agent in the foreground, connecting to the
puppet master every 30 minutes (the default runinterval)! That command will
block until you press Ctrl-C. But your PowerShell command is running puppet
asynchronously. Process Explorer is handy for debugging that.

Later, when the Service Control Manager starts the Puppet service, it is
going to race with the instance you started above. Due to race conditions
in puppet's SSL bootstrapping process, you can end up in a situation where
one instance creates a key pair and submits a CSR, and then, before the
cert is signed, the second instance sees there's no cert and generates a
new key pair, overwriting the old one. The first instance then downloads
the signed cert, which doesn't match the new key pair.
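To make the race concrete, here is a toy Python model of the sequence
described above. It is an illustration only, not Puppet's actual SSL code:
each process is a generator, each `yield` is a point where the OS may switch
to the other process, and a certificate is modeled simply as the id of the
key it was signed for. The names `bootstrap`, `round_robin` and `simulate`
are hypothetical, and the assumption that the CA keeps only the first CSR
for a certname is a simplification of the toy model.

```python
from typing import Dict, Iterator, List, Optional


def bootstrap(name: str, ssldir: Dict[str, Optional[str]],
              ca: Dict[str, str], certname: str) -> Iterator[str]:
    """Toy SSL bootstrap for one puppet process, one step per yield."""
    if ssldir["cert"] is not None:
        return  # already have a cert, nothing to bootstrap
    key = f"key-from-{name}"
    ssldir["key"] = key            # generate a key pair, overwriting any on disk
    ca.setdefault(certname, key)   # submit a CSR; this toy CA keeps the first one
    yield f"{name}: wrote {key}, submitted CSR"
    ssldir["cert"] = ca[certname]  # later: download the signed cert
    yield f"{name}: downloaded cert signed for {ssldir['cert']}"


def round_robin(procs: List[Iterator[str]]) -> List[str]:
    """Advance each process one step at a time until all have finished."""
    log: List[str] = []
    active = list(procs)
    while active:
        for proc in list(active):
            try:
                log.append(next(proc))
            except StopIteration:
                active.remove(proc)
    return log


def simulate(interleaved: bool) -> bool:
    """Return True if the cert on disk matches the private key on disk."""
    ssldir: Dict[str, Optional[str]] = {"key": None, "cert": None}
    ca: Dict[str, str] = {}
    manual = bootstrap("manual-run", ssldir, ca, "SERVERNAME")
    service = bootstrap("service", ssldir, ca, "SERVERNAME")
    if interleaved:
        round_robin([manual, service])  # both running at the same time
    else:
        round_robin([manual])           # synchronous: the manual run finishes...
        round_robin([service])          # ...before the service starts
    return ssldir["cert"] == ssldir["key"]


print(simulate(interleaved=True))   # False: cert is for key-from-manual-run
print(simulate(interleaved=False))  # True
```

Interleaving the two bootstraps reproduces the mismatch; running them one
after the other (what a synchronous one-off run, or no manual run at all,
achieves) does not.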

To fix the problem, you'll want to run puppet using
`'C:\Program Files\Puppet Labs\Puppet\bin\puppet.bat' agent -o --tags no_such_tag --onetime`
and make the PowerShell command synchronous.
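On the provisioning side, "synchronous" simply means the next step does not
start until the one-off run has exited. The Python sketch below uses
hypothetical helper names (not from the thread) to show the difference with
the standard library's subprocess module. On the Windows host the command
would be the puppet.bat invocation above; here a small Python child process
stands in for it so the sketch can run anywhere.

```python
import subprocess
import sys
from typing import List


def run_synchronously(cmd: List[str]) -> int:
    """Start cmd, block until it exits, and return its exit code.

    A provisioning step written this way cannot overlap with whatever
    comes after it (e.g. the reboot that starts the Puppet service).
    """
    return subprocess.run(cmd).returncode


def start_asynchronously(cmd: List[str]) -> subprocess.Popen:
    """Start cmd and return at once while it keeps running in the
    background: the pattern that lets a manual run race the service."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    # Stand-in for the one-off puppet run: sleeps 1 s, then exits with code 3.
    child = [sys.executable, "-c", "import sys, time; time.sleep(1); sys.exit(3)"]
    print(run_synchronously(child))  # 3, printed only after the child has exited
    proc = start_asynchronously(child)
    print(proc.poll())               # normally None: the child is still running
    print(proc.wait())               # 3
```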


> The user logging on and running the commands is the local administrator
> account. To be extra thorough, I logged on as that account and tried
> running puppet agent -t after the host was built, just to be sure there
> was nothing related to the logon account going on, but it made no
> difference.
>
> Following the steps in the error message, generating a new certificate,
> of course works, but we can all see the inconvenience of doing that
> constantly on newly provisioned hosts, right?
>
> I think that sums things up quite well. As I said, I've been banging my
> head against this, while not ignoring that it could still be something
> fishy going on on the puppetmaster that is not managed by me, but me