On Mon, 1 Jan 2018, at 23:51, Matt Wise wrote:
> *Puppet Agent: 5.3.2*
> *Puppet Server: 5.1.4 - Packaged in Docker, running on Amazon ECS*
> 
> So we've recently started rolling over from our ancient Puppet 3.x system
> to a new Puppet 5.x service. The new service consists of a PuppetServer
> Docker Image (5.1.4) running in Amazon ECS, and our hosts booting up and
> running Puppet Agent 5.3.2. At this point in the migration, we're running
> ~150-200 hosts on the new Puppet5 system and we replace ~30-80 of them
> daily.
> 
> We are currently tracking down a problem with our PuppetServers and their
> memory usage, which is causing the containers to be OOM'd a few times a day
> (~10 OOMs a day across ~20 containers). While we know that we need to fix
> this, we've seen a scary behavior on the Puppet Agent side that we could
> use some advice with.
> 
> It seems that at least a few times a day now we will get a server hung in
> the boot process. The `puppet agent -t ...` process will just hang midway
> through the run. It seems that these hangs happen when the backend
> underlying PuppetServer process that they were connected to gets OOMed and
> goes away. Obviously the OOM is a problem, but frankly I am more concerned
> with the Puppet Agent getting wedged for hours and hours without making any
> progress.
> 
> It seems that when this failure happens, the puppet agent does not ever
> time out. It never fails, or throws an error. It just hangs. We've had
> these hangs last upwards of 4-5 hours before our systems are automatically
> terminated.
> 
> We've enabled debug logging, but haven't caught one of these failures yet
> with debug mode turned on. In the meantime, are there any known
> regressions or configuration tweaks we need to make Puppet Agent 5.x
> quicker to fail or more resilient in this case? I could obviously try to
> build in some wrapper around Puppet to catch this behavior, but I am
> hoping that there are just some settings we need to tweak.

I see this often with other kinds of interruptions too, such as network
interruptions.

I do recall a number of bugs filed around making this more robust; you might
want to try searching the Puppet JIRA.
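
If no setting turns out to cover it, the wrapper idea from the original message could look roughly like the following. This is only a minimal sketch in Python, not anything Puppet ships: the helper name `run_with_timeout` and the 1800-second limit in the usage comment are assumptions chosen for illustration. It relies on `subprocess.run`, which kills the child process and raises `TimeoutExpired` once the timeout passes.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it ran past timeout_s.

    subprocess.run() kills the child and waits for it when the timeout
    expires, so a wedged agent run does not linger after this returns.
    Processes that the child itself spawned are not killed by this.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage in a boot script (the 1800 s limit is an assumption):
#
#   code = run_with_timeout(["puppet", "agent", "-t"], 1800)
#   if code is None:
#       sys.exit("puppet agent run timed out; failing the boot")
```

A `None` result lets the boot script fail fast (or retry against another server) instead of sitting for hours until the host is terminated.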


-- 
R.I.Pienaar / www.devco.net / @ripienaar
