On Apr 25, 2009, at 4:25 PM, Brice Figureau wrote:

> Hi,
>
> For whatever reasons, it appears I've never followed up on this
> interesting conversation.
>
> I was about to resurrect a patch I submitted around this timeframe to
> "compress" file paths in File resources (I'll repost it soon, as I'd
> like it to be part of the 0.25 beta cycle if possible), and remembered
> this topic.
>
> On 20/03/09 1:28, Luke Kanies wrote:
>> On Mar 19, 2009, at 5:09 PM, Brice Figureau wrote:
>>
>>> On 19/03/09 22:50, Luke Kanies wrote:
>>>> Hi all,
>>>>
>>>> I've been thinking a lot about file recursion and why it's so darn
>>>> complicated, and I think one reason is that the recursion happens
>>>> in the same resource type that does the managing. As a result, I've
>>>> been thinking of moving the file recursion into a Fileset resource
>>>> type.
>>>>
>>>> Currently, the file type generates new file resources during
>>>> recursion; this basic model would be the same, except that the
>>>> fileset resource type would be generating files.
>>>
>>> I've also been thinking a lot about local file recursion lately, but
>>> for performance reasons.
>>>
>>> I understand your idea and what the benefits of your proposal are in
>>> terms of clarity, code concision and such.
>>>
>>> Right now, the main performance issue with local recursive file
>>> resources is creating one newchild file resource per managed sub
>>> file, which in turn will be managed by the system.
>>> Ruby seems particularly slow at creating tons of objects, and it
>>> uses memory for something that is really transient.
>>>
>>> My idea on the subject (but I didn't research whether it's doable)
>>> was that we don't really need to create those objects, if we
>>> consider a recursive resource as an "opaque" system which manages
>>> its own sub-resources itself. This behavior could be supported by
>>> the puppet RAL by defining a kind of recursive manager system that
>>> could offer programmatic resource management instead of being object
>>> based.
>>> I'm not sure I'm perfectly clear; it's late here and my brain needs
>>> some rest :-)
>>>
>>> I think this violates the current puppet contract, but I'm sure we
>>> could implement the recursive behavior outside of the file resource
>>> while still being able to manage sub-resources procedurally instead
>>> of having to generate them.
>>>
>>> Maybe that's what you are proposing (still late here).
>>> If not, please try to think about it and see if that could make
>>> sense.
>>
>> It's not actually what I'm proposing - I'd say it's a parallel and
>> possibly competing proposal.
>>
>> I've been thinking about something similar. I think there are at
>> least three ways one could do what you're asking (in inverse order of
>> overhead). I'm describing them here with simple names so it's easier
>> to refer back to them; the names aren't perfect, but hopefully
>> they'll do.
>>
>> 1) Transient resources: Continue to create the resources, but create
>> and destroy them one at a time
>>
>> 2) Set resources: Use a single recursive operation that somehow
>> manages to retain transactional integrity
>>
>> 3) Set operations: Perform a recursive operation that loses
>> transactional integrity
>>
>> I think you're essentially proposing something like #3.
>
> Hum, I wasn't really clear in my previous e-mail, but in fact I was
> thinking more about something like your #2. I.e., the File resource
> could get support from a recursive meta-resource to actually "perform"
> the recursion.
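For concreteness, here is a rough sketch of what such a fileset type
could look like. The :fileset name, its parameters, and the constructor
call are hypothetical and version-dependent, not existing Puppet code;
newtype and eval_generate, however, are the same hooks the file type
already uses for recursion.

    # Rough sketch only: the :fileset type, its parameters, and the
    # constructor call are illustrative, not existing Puppet code.
    require 'puppet'

    Puppet::Type.newtype(:fileset) do
      newparam(:path) do
        isnamevar
      end
      newparam(:owner)
      newparam(:mode)

      # Called at evaluation time; returns the child File resources to
      # manage.  A real implementation would recurse, honour excludes,
      # handle sourcing and purging, and so on.
      def eval_generate
        Dir.glob(File.join(self[:path], "*")).collect do |entry|
          Puppet::Type.type(:file).new(
            :path  => entry,
            :owner => self[:owner],
            :mode  => self[:mode]
          )
        end
      end
    end

The only interesting part is that the recursion lives in a type whose
sole job is generating the child File resources.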
Ok.

>> I'll provide some more detail on each, but there are a couple of
>> points of complexity that are worth noting. In particular, the choice
>> here has a significant effect on logging and events. You can actually
>> think of logs and events as isomorphic, and they're only going to get
>> more so: I hope by 0.26 or so all transaction logs are actually
>> generated by events.
>>
>> Obviously logging is critical so you know what's happening on your
>> system. Events are critical so that you can react to those changes.
>> E.g., if you need to restart a service when any file in a fileset
>> changes, then an event from a file deep in the hierarchy needs to be
>> routed appropriately, and then it needs to be logged as coming from
>> that location.
>
> Of course, it makes perfect sense.
>
>> We solve that right now using proxy resources - the recursing
>> resource is the proxy for the event-generating resource. This will
>> likely work for any of the other solutions, too, but it's worth
>> thinking about here because I've found it to be the major source of
>> complexity, and I think that would continue.
>
> OK.
>
>> So, on to descriptions:
>>
>> Transient resources:
>>
>> Currently, we create the whole list of resources, add them to the
>> graph, and then iterate over them. Instead, we could essentially
>> process them one at a time and then discard them. Our current method
>> (loosely) is:
>>
>>   file.eval_generate.each { |resource| add_to_catalog(resource) and
>>     eval_resource(resource) }
>>
>> Instead, it would become:
>>
>>   file.eval_generate { |resource| eval_resource(resource) }
>>
>> Ridiculous pseudo-code, obviously, but hopefully you get the idea.
>> The optimization here is that 1) we aren't adding to the catalog and
>> 2) we aren't building a list at all.
>
> If that's the low-hanging fruit, the one with the lowest resource
> consumption, then I'm all for it for 0.25.
> The new, more generic mechanism (i.e. #2) can still be implemented for
> 0.26 (or any other subsequent release).

I'm basically unwilling to add any new functionality to 0.25, unless
the code is already ready and largely trivial. I can't think of
anything that's so critical that it's worth delaying the release.

Plus, if we've got multiple ideas for performance gains, let's separate
them into different releases so people see multiple stages instead of
one big gain. :)

>> Set resources:
>>
>> We could have some kind of resource that didn't use instance
>> variables for any of the values or comparisons, such that a single
>> resource instance could be used to do all of the operations. The main
>> thing is that it kicks out change instances for each of the things
>> that need to be done, with all of the appropriate information for
>> logging and events.
>>
>> We're still paying a per-change overhead, but I don't think you can
>> get away from that and retain transactional integrity.
>>
>> Set operations:
>>
>> This is pretty much just a big, painful chmod -R. This would be a bit
>> more difficult because we'd have to skip any resources that are
>> managed anywhere else.
>>
>> In writing this description, I think the second option is the best,
>> even though I was leaning toward the first, initially. I think it's
>> doable - the big limitation right now is the use of instance
>> variables in files. If 'should' weren't set anywhere, then you'd be
>> passing it in each time, and if you're passing it in, then you could
>> use the same instance for every file you needed to operate on.
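To illustrate that stateless shape, here's a plain-Ruby sketch. Every
name and the example path below are made up for illustration, and none
of this is existing Puppet code: the 'should' value is passed in on
each call, nothing is stored on the instance, and each needed fix comes
back as a change record that a transaction could log, turn into an
event, and then apply.

    # Sketch of the "set resources" idea: one stateless object is
    # reused for every file; desired values are passed in per call.
    require 'find'

    class SetResource
      Change = Struct.new(:path, :property, :is, :should)

      # Yields one Change per regular file whose mode differs from
      # should_mode.  No instance variables are set, so a single
      # SetResource can be reused for any number of trees.
      def each_change(root, should_mode)
        Find.find(root) do |path|
          next unless File.file?(path)
          is = File.stat(path).mode & 07777
          yield Change.new(path, :mode, is, should_mode) if is != should_mode
        end
      end
    end

    setter = SetResource.new
    setter.each_change("/etc/myapp", 0644) do |change|
      # A transaction would log the change and route an event back to
      # the recursing (proxy) resource before applying it:
      File.chmod(change.should, change.path)
    end

Because the object carries no per-file state, one instance can walk an
arbitrarily large tree without the per-resource object churn described
above.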
>
> That's plain true. I vote for this!
>
>> One of the big changes we made (but no one noticed) in the fall of
>> '07 was that we switched 'is' from being an instance variable to
>> being transient, only maintained by the transaction. I've always
>> wanted to switch 'should', too, but I haven't known how.
>>
>> If we were to combine this with my goal of splitting resource types
>> into a Resource class (which already exists in 0.25) and a
>> ResourceType class (whose instances will model individual resource
>> types), then these resource types could be written to operate with no
>> instance variables, essentially. This would, I think, enable the set
>> resources pretty easily. (I suppose I should open this as a ticket,
>> so people know wtf I'm talking about.)
>>
>> Well, 'easily' once you refactored the RAL entirely and broke
>> backward compatibility.
>
> Yes, that's the difficult task, I do agree.
>
> Now, if we can move forward to at least have #1 in 0.25, that'd be
> über great, because it'd solve one of the biggest performance (i.e.
> memory) issues in puppetd while managing files (an issue I tried to
> overcome with #1469 or the soon-to-be-resurrected path compression
> stuff).
> People, myself included, find it natural to use recursive file
> resources on large trees just to chmod/chown, and we all expect those
> to work as fast as chmod/chown, so it is unimaginable to see puppetd
> failing at this simple task. And you can be sure that's almost the
> first thing a puppet newcomer will use, so imagine her reaction seeing
> the tool not succeeding...

Like I said, I really don't want to try to fit this into 0.25. I
believe James is essentially ready to announce 0.25b1, with the only
major known issue being the lack of Rack support. There are open
tickets, but none look too difficult to fix.

> /me preparing the pathcomp patch for a new review.
> --
> Brice Figureau
> http://www.masterzen.fr/

--
The hypothalamus is one of the most important parts of the brain,
involved in many kinds of motivation, among other functions. The
hypothalamus controls the "Four F's": 1. fighting; 2. fleeing;
3. feeding; and 4. mating.
        -- Psychology professor in neuropsychology intro course
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com