Just out of curiosity, how difficult would it be to defer the resource chain until the end of the catalog if a prefetch fails?
It may not work in all cases, but it's certainly feasible that the prefetch is failing because of some other item in the catalog that hasn't been applied yet and, given the complexity of the system, the effort it would take to properly chain all resources just isn't worth it compared to the cost of doing two runs to fix the issue. I'm particularly thinking of things like package repositories here, where you really don't want to artificially bind all packages to your repositories for no good reason, and/or you may be pulling data from a different source that providers are going to rely on (RH Satellite, Pulp, something else?).

Since it's a prefetch, and nothing has yet changed on the system, could that execution chain just be deferred until just before the post-execution section of the catalog? If this were possible, the impact of a catalog failure would, in theory, be minimized, with everything that could be applied happening before those items that could not.

Even then, I think that all resource failures should be localized to the resource, and resource chain, to which they apply. The *best* thing about Puppet is the fact that it does as much as it can on every run. That really shouldn't be broken, in my opinion. My classic example is "Puppet applies my security settings across the system, regardless of Apache failing to install or configure". In this case, Puppet should apply my security settings regardless of some random resource having a prefetch failure.

Circling this all the way around to my favorite pet bug that I never have time to work on, PUP-4002: in the case where prefetch fails but the provider (or a dependent of the provider) has a way to handle this later, recording the failure for all instances of the resource to query later would be a nice-to-have.

I'm thinking of the 'cron' type as I write this, since the prefetch phase of cron appears to fail if the crontab file for a user does not exist. But it doesn't matter in the least, because the provider merrily carries on and does what it is supposed to do after raising an alarming, but harmless, error message. (Note: I haven't checked the 4.x behavior on this yet.)

Thanks,

Trevor

On Thu, Jul 23, 2015 at 8:37 AM, Gareth Rushgrove <gar...@morethanseven.net> wrote:
> On 23 July 2015 at 03:44, Kylo Ginsberg <k...@puppetlabs.com> wrote:
> > On Wed, Jul 22, 2015 at 1:37 PM, Branan Riley <bra...@puppetlabs.com> wrote:
> >> On Wed, Jul 22, 2015 at 12:44 PM Kylo Ginsberg <k...@puppetlabs.com> wrote:
> >>> On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <bra...@puppetlabs.com> wrote:
> >>>>
> >>>> <This is email 2/2 inspired by PUP-4813>
> >>>>
> >>>> With puppet 4, we made a change to how prefetch failures are handled
> >>>> by the transaction[1]. This is actually a pretty good change, as it
> >>>> prevents puppet from some nasty misbehaviors.
> >>>>
> >>>> I still don't think it's the "correct" behavior, though. Aborting the
> >>>> entire catalog because one resource type is broken doesn't seem like
> >>>> the best behavior, especially when it comes to reporting[2].
> >>>>
> >>>> I feel like the correct behavior is something like this:
> >>>>
> >>>> * If an exception escapes a prefetch, mark all resources of that type
> >>>> in the catalog as failed and do not try to apply them
> >>>
> >>> Background for a question:
> >>>
> >>> Wrt exceptions in prefetch, as of 4.0 there is a distinction between the
> >>> expected/supported exceptions (there are two: Puppet::MissingCommand and
> >>> LoadError) and everything else.
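(For illustration: a hypothetical provider whose prefetch leans on a gem and a system command could surface exactly those two exceptions. Everything below is invented for the sketch, including the type, command, and gem names.)

    Puppet::Type.type(:widget).provide(:api) do
      commands :widgetctl => 'widgetctl'

      def self.prefetch(resources)
        # May raise LoadError if the client gem hasn't been installed yet.
        require 'widget_client'

        # May raise Puppet::MissingCommand if the binary isn't on the system yet.
        output = widgetctl('list')

        output.each_line do |line|
          name = line.chomp
          if (resource = resources[name])
            resource.provider = new(:name => name, :ensure => :present)
          end
        end
      end
    end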
> >>>
> >>> I'm not sure this is *super* well documented, but it is mentioned as a
> >>> breaking change in the 4.0 release notes:
> >>>
> >>> https://docs.puppetlabs.com/puppet/4.0/reference/release_notes.html#stuff-a-lot-of-people-will-notice
> >>>
> >>> and the rescue in question is here:
> >>>
> >>> https://github.com/puppetlabs/puppet/blob/810624fe6f88d520339173c0e811c5ede4009fac/lib/puppet/transaction.rb#L322
> >>>
> >>> So with that in mind, is the proposal here about the handling of the two
> >>> supported exceptions or the handling of other exceptions?
> >>>
> >>> For the two supported exceptions, this seems like a reasonable behavior
> >>> change and could also speed up catalog application in this failure case.
> >>>
> >>> For *other* exceptions, I'm not sure. I walked away from PUP-3656
> >>> thinking that those two exceptions were now part of the contract we're
> >>> specifying for providers. If so, a provider that throws any *other*
> >>> exception has a bug which should be fixed, and we could be making matters
> >>> worse by trying to soldier on. I.e. this seems like a situation where it's
> >>> better to fail hard and the user should file a bug, upgrade/downgrade the
> >>> module, etc.
> >>
> >> I would say this should be for ALL exceptions. We absolutely should stop
> >> puppet from soldiering on in ways that could cause problems, but I don't
> >> think trying to apply other resource types in the catalog is one of those
> >> ways. Prefetch failure should be a localized catastrophe, not a global one.
> >>
> >> The move away from rescuing everything was definitely good, but aborting
> >> the entire transaction when only a single type is unprefetchable is a bit
> >> of an over-reaction. We know that instances of that type (really just ones
> >> of that type/provider pair) are bad, and we absolutely shouldn't apply
> >> them. The rest of the catalog is still fine, though.
> >>
> >> The two exceptions that are supported allow a Puppet run to install any
> >> packages/gems that a provider might need in order to prefetch. Skipping
> >> only the resources that would have been prefetched, instead of the entire
> >> catalog, serves that same purpose.
> >
> > I can see this argument. I'm still concerned that this would mean we aren't
> > surfacing bugs that really should be exposed to the light of day. I suppose
> > we could report the one case with a Warning and the other with an Error.
> >
> > But assuming we go ahead and do something like this (I see I'm the only one
> > with the slightest reservation :)), I'm thinking we should audit how puppet
> > handles exceptions in providers during catalog application but *outside*
> > prefetch. It would be nice to be consistent throughout catalog application.
> >
> >> Aborting the entire catalog also makes remediation with Puppet (when
> >> puppet could be used for such) impossible, which seems super bad to me.
> >
> > Is it impossible? Couldn't you remediate with a catalog that doesn't include
> > resources that will tickle prefetch from that provider? Not trying to
> > quibble (i.e. yes, that could still be a PITA), but making sure I'm not
> > missing something.
> >
> >>> Btw (and perhaps I should have asked this from the get-go before opining
> >>> above): what was the exception in PUP-4813 that precipitated this thread?
> >>
> >> The bad catalog from that ticket set the `target` field of mount to a
> >> directory, which causes some sort of IO exception to be raised when
> >> ParsedFile tries to access that directory as a regular file.
> >
> > Ooh, fun. So we should fix that to be more robust. Separate ticket from this
> > thread, though.
> >
> >> This is what inspired my bullet point below - ParsedFile should be smart
> >> enough to know which resources have a bad target and which are OK, and
> >> appropriately fail only the ones it can't work with.
> >>
> >>>> * Make it possible to fail resources from *within* a prefetch. This
> >>>> lets a sufficiently smart prefetch handle the case when only some of
> >>>> its resources are hosed (parsedfile resources with multiple targets,
> >>>> for example, might be only partially unfetchable)
> >>>
> >>> How would this work? A new hook for providers to implement? If so, it'd
> >>> be nice to flesh out a couple of example use cases so that we can think
> >>> through the API with something concrete.
> >>
> >> It's probably already "doable" by having prefetch mark the returned
> >> provider instances as bad in some way, and the provider implementation for
> >> flush could raise an exception. I'd prefer to see something at the
> >> transaction layer, though. I'll think through what this might look like
> >> more.
> >
> > Well, definitely some charm to not adding new methods that we have to
> > sprinkle respond_to? tests around for. So the former approach might just
> > fly.
> >
> > I'd also like to think through whether there are additional motivating use
> > cases.
>
> For some relevant colour: we hit this problem hard with the AWS module.
> We're prefetching a lot of complex data in some cases, and we're doing it
> over the wire, so it going wrong is pretty likely.
>
> We worked around that at the time by creating our own exception, inherited
> from Exception, which does get caught by Puppet:
>
> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet_x/puppetlabs/aws.rb#L7-L23
>
> and then raising that when StandardError-based exceptions are raised in
> prefetch:
>
> https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet/provider/ec2_instance/v2.rb#L34-L36
>
> Some more details about the changes in 4.0 in this issue too:
>
> https://jira.puppetlabs.com/browse/PUP-3656
>
> Gareth
>
> > Kylo
> >
> > --
> > Kylo Ginsberg | k...@puppetlabs.com | irc: kylo | twitter: @kylog
> >
> > PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
> > Register now to take advantage of the Early Bird discount —save $249!
>
> --
> Gareth Rushgrove
> @garethr
>
> devopsweekly.com
> morethanseven.net
> garethrushgrove.com
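(A very rough paraphrase of the workaround Gareth describes above: the exception class, type name, and remote-lookup helper below are invented, and the real code is in the puppetlabs-aws files he links.)

    # Sketch only: an exception class deliberately outside the StandardError
    # hierarchy, re-raised when the remote lookup in prefetch blows up.
    module PuppetX
      module Example
        class PrefetchError < Exception
          def initialize(type_name, message)
            super("Puppet detected a problem while prefetching #{type_name} instances: #{message}")
          end
        end
      end
    end

    # ...and, inside the provider:
    def self.prefetch(resources)
      found = begin
        remote_instances   # hypothetical call that talks to the remote API
      rescue StandardError => e
        raise PuppetX::Example::PrefetchError.new('ec2_instance', e.message)
      end

      found.each do |prov|
        resources[prov.name].provider = prov if resources[prov.name]
      end
    end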
--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --