Just out of curiosity, how difficult would it be to defer the resource chain until the end of the catalog if a prefetch fails?
It may not work in all cases, but it's certainly feasible that the prefetch is failing because of some other item in the catalog that hasn't been applied yet and, given the complexity of the system, the effort it would take to properly chain all resources just isn't worth it compared to the cost of doing two runs to fix the issue. I'm particularly thinking of things like package repositories here, where you really don't want to artificially bind all packages to your repositories for no good reason, and/or you may be pulling data from a different source that providers are going to rely on (RH Satellite, Pulp, something else?).

Since it's a prefetch, and nothing has yet changed on the system, could that execution chain just be deferred until just before the post-execution section of the catalog? If this were possible, the impact of a catalog failure would, in theory, be minimized, with everything that could be applied happening before those items that could not.

Even then, I think that all resource failures should be localized to the resource, and resource chain, to which they apply. The *best* thing about Puppet is the fact that it does as much as it can on every run. That really shouldn't be broken, in my opinion. My classic example is "Puppet applies my security settings across the system, regardless of Apache failing to install or configure". In this case, Puppet should apply my security settings regardless of some random resource having a prefetch failure.

Circling this all the way around to my favorite pet bug that I never have time to work on, PUP-4002: in the case where prefetch fails but the provider (or a dependent of the provider) has a way to handle this later, recording the failure for all instances of the resource to query later would be a nice-to-have.

I'm thinking of the 'cron' type as I write this, since the prefetch phase of cron appears to fail if the crontab file for a user does not exist. But it doesn't matter in the least, because the provider merrily carries on and does what it is supposed to do after raising an alarming, but harmless, error message. (Note: I haven't checked the 4.x behavior on this yet.)

Thanks,

Trevor

On Thu, Jul 23, 2015 at 8:37 AM, Gareth Rushgrove <gar...@morethanseven.net> wrote:
> On 23 July 2015 at 03:44, Kylo Ginsberg <k...@puppetlabs.com> wrote:
> > On Wed, Jul 22, 2015 at 1:37 PM, Branan Riley <bra...@puppetlabs.com> wrote:
> >> On Wed, Jul 22, 2015 at 12:44 PM Kylo Ginsberg <k...@puppetlabs.com> wrote:
> >>> On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <bra...@puppetlabs.com> wrote:
> >>>>
> >>>> <This is email 2/2 inspired by PUP-4813>
> >>>>
> >>>> With puppet 4, we made a change to how prefetch failures are handled
> >>>> by the transaction[1]. This is actually a pretty good change, as it
> >>>> prevents puppet from some nasty misbehaviors.
> >>>>
> >>>> I still don't think it's the "correct" behavior, though. Aborting the
> >>>> entire catalog because one resource type is broken doesn't seem like
> >>>> the best behavior, especially when it comes to reporting[2].
> >>>>
> >>>> I feel like the correct behavior is something like this:
> >>>>
> >>>> * If an exception escapes a prefetch, mark all resources of that type
> >>>> in the catalog as failed and do not try to apply them
> >>>
> >>> Background for a question:
> >>>
> >>> Wrt exceptions in prefetch, as of 4.0 there is a distinction between the
> >>> expected/supported exceptions (there are two: Puppet::MissingCommand and
> >>> LoadError) and everything else.
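(For illustration: a hypothetical provider whose prefetch leans on a gem and a system command could surface exactly those two exceptions. Everything below is invented for the sketch, including the type, command, and gem names.)

    Puppet::Type.type(:widget).provide(:api) do
      commands :widgetctl => 'widgetctl'

      def self.prefetch(resources)
        # May raise LoadError if the client gem hasn't been installed yet.
        require 'widget_client'

        # May raise Puppet::MissingCommand if the binary isn't on the system yet.
        output = widgetctl('list')

        output.each_line do |line|
          name = line.chomp
          if (resource = resources[name])
            resource.provider = new(:name => name, :ensure => :present)
          end
        end
      end
    end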
> >>>
> >>> I'm not sure this is *super* well documented, but it is mentioned as a
> >>> breaking change in the 4.0 release notes:
> >>>
> >>> https://docs.puppetlabs.com/puppet/4.0/reference/release_notes.html#stuff-a-lot-of-people-will-notice
> >>>
> >>> and the rescue in question is here:
> >>>
> >>> https://github.com/puppetlabs/puppet/blob/810624fe6f88d520339173c0e811c5ede4009fac/lib/puppet/transaction.rb#L322
> >>>
> >>> So with that in mind, is the proposal here about the handling of the two
> >>> supported exceptions or the handling of other exceptions?
> >>>
> >>> For the two supported exceptions, this seems like a reasonable behavior
> >>> change and could also speed up catalog application in this failure case.
> >>>
> >>> For *other* exceptions, I'm not sure. I walked away from PUP-3656
> >>> thinking that those two exceptions were now part of the contract we're
> >>> specifying for providers. If so, a provider that throws any *other*
> >>> exception has a bug which should be fixed, and we could be making matters
> >>> worse by trying to soldier on. I.e. this seems like a situation where it's
> >>> better to fail hard and the user should file a bug, upgrade/downgrade the
> >>> module, etc.
> >>
> >> I would say this should be for ALL exceptions. We absolutely should stop
> >> puppet from soldiering on in ways that could cause problems, but I don't
> >> think trying to apply other resource types in the catalog is one of those
> >> ways. Prefetch failure should be a localized catastrophe, not a global one.
> >>
> >> The move away from rescuing everything was definitely good, but aborting
> >> the entire transaction when only a single type is unprefetchable is a bit
> >> of an over-reaction. We know that instances of that type (really just ones
> >> of that type/provider pair) are bad, and we absolutely shouldn't apply
> >> them. The rest of the catalog is still fine, though.
> >>
> >> The two exceptions that are supported allow a Puppet run to install any
> >> packages/gems that a provider might need in order to prefetch. Skipping
> >> only the resources that would have been prefetched, instead of the entire
> >> catalog, serves that same purpose.
> >
> > I can see this argument. I'm still concerned that this would mean we aren't
> > surfacing bugs that really should be exposed to the light of day. I suppose
> > we could report the one case with a Warning and the other with an Error.
> >
> > But assuming we go ahead and do something like this (I see I'm the only one
> > with the slightest reservation :)), I'm thinking we should audit how puppet
> > handles exceptions in providers during catalog application but *outside*
> > prefetch. It would be nice to be consistent throughout catalog application.
> >
> >> Aborting the entire catalog also makes remediation with Puppet (when
> >> puppet could be used for such) impossible, which seems super bad to me.
> >
> > Is it impossible? Couldn't you remediate with a catalog that doesn't include
> > resources that will tickle prefetch from that provider? Not trying to
> > quibble (i.e. yes, that could still be a PITA), but making sure I'm not
> > missing something.
> >
> >>> Btw (and perhaps I should have asked this from the get-go before opining
> >>> above): what was the exception in PUP-4813 that precipitated this thread?
> >>
> >> The bad catalog from that ticket set the `target` field of mount to a
> >> directory, which causes some sort of IO exception to be raised when
> >> ParsedFile tries to access that directory as a regular file.
> >
> > Ooh, fun. So we should fix that to be more robust. Separate ticket from this
> > thread, though.
> >
> >> This is what inspired my bullet point below - ParsedFile should be smart
> >> enough to know which resources have a bad target and which are OK, and
> >> appropriately fail only the ones it can't work with.
> >>
> >>>> * Make it possible to fail resources from *within* a prefetch. This
> >>>> lets a sufficiently smart prefetch handle the case when only some of
> >>>> its resources are hosed (parsedfile resources with multiple targets,
> >>>> for example, might be only partially unfetchable)
> >>>
> >>> How would this work? A new hook for providers to implement? If so, it'd
> >>> be nice to flesh out a couple of example use cases so that we can think
> >>> through the API with something concrete.
> >>
> >> It's probably already "doable" by having prefetch mark the returned
> >> provider instances as bad in some way, and the provider implementation for
> >> flush could raise an exception. I'd prefer to see something at the
> >> transaction layer, though. I'll think through what this might look like
> >> more.
> >
> > Well, definitely some charm to not adding new methods that we have to
> > sprinkle respond_to? tests around for. So the former approach might just
> > fly.
> >
> > I'd also like to think through whether there are additional motivating use
> > cases.
>
> For some relevant colour: we hit this problem hard with the AWS module.
> We're prefetching a lot of complex data in some cases, and we're doing it
> over the wire, so it going wrong is pretty likely.
>
> We worked around that at the time by creating our own exception, inherited
> from Exception, which does get caught by Puppet:
>
> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet_x/puppetlabs/aws.rb#L7-L23
>
> and then raising that when StandardError-based exceptions are raised in
> prefetch:
>
> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet/provider/ec2_instance/v2.rb#L34-L36
>
> Some more details about the changes in 4.0 in this issue too:
>
> https://jira.puppetlabs.com/browse/PUP-3656
>
> Gareth
>
> > Kylo
> >
> > --
> > Kylo Ginsberg | k...@puppetlabs.com | irc: kylo | twitter: @kylog
> >
> > PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
> > Register now to take advantage of the Early Bird discount —save $249!
>
> --
> Gareth Rushgrove
> @garethr
>
> devopsweekly.com
> morethanseven.net
> garethrushgrove.com
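(A very rough paraphrase of the workaround Gareth describes above: the exception class, type name, and remote-lookup helper below are invented, and the real code is in the puppetlabs-aws files he links.)

    # Sketch only: an exception class deliberately outside the StandardError
    # hierarchy, re-raised when the remote lookup in prefetch blows up.
    module PuppetX
      module Example
        class PrefetchError < Exception
          def initialize(type_name, message)
            super("Puppet detected a problem while prefetching #{type_name} instances: #{message}")
          end
        end
      end
    end

    # ...and, inside the provider:
    def self.prefetch(resources)
      found = begin
        remote_instances   # hypothetical call that talks to the remote API
      rescue StandardError => e
        raise PuppetX::Example::PrefetchError.new('ec2_instance', e.message)
      end

      found.each do |prov|
        resources[prov.name].provider = prov if resources[prov.name]
      end
    end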
--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --