[Puppet-dev] Re: Rethinking file recursion

Brice Figureau Sat, 25 Apr 2009 08:25:33 -0700

Hi,

For whatever reason, it appears I never followed up on this interesting 
conversation.


I was about to resurrect a patch I submitted around that time to 
"compress" file paths in File resources (I'll repost it soon, as I'd like 
it to be part of the 0.25 beta cycle if possible), and remembered this topic.

On 20/03/09 1:28, Luke Kanies wrote:
> On Mar 19, 2009, at 5:09 PM, Brice Figureau wrote:
> 
>> On 19/03/09 22:50, Luke Kanies wrote:
>>> Hi all,
>>>
>>> I've been thinking a lot about file recursion and why it's so darn
>>> complicated, and I think one reason is the recursion happening in the
>>> same resource type doing the managing.  As a result, I've been
>>> thinking of moving the file recursion into a Fileset resource type.
>>>
>>> Currently, the file type generates new file resources during
>>> recursion; this basic model would be the same, except that the
>>> fileset resource type would be generating files.
>> I've also been thinking a lot about local file recursion lately, but
>> for performance reasons.
>>
>> I understand your idea and the benefits of your proposal in terms of
>> clarity, code concision and such.
>>
>> Right now, the main performance issue with local recursive file
>> resources is creating one newchild file resource per managed sub file,
>> which in turn will be managed by the system.
>> Ruby seems particularly slow at creating tons of objects, and it uses
>> memory for something that is really transient.
>>
>> My idea on the subject (but I didn't research if that's doable) was
>> that we don't really need to create those objects, if we consider a
>> recursive resource as an "opaque" system which manages its own
>> sub-resources itself. This behavior could be supported by the puppet
>> ral system by defining a kind of recursive manager system that could
>> offer programmatic resource management instead of being object based.
>> I'm not sure I'm perfectly clear, it's late here and my brain needs
>> some rest :-)
>>
>> I think this violates the current puppet contract, but I'm sure we
>> could implement the recursive behavior outside of the file resource
>> while still being able to manage sub-resources procedurally instead of
>> having to generate them.
>>
>> Maybe that's what you are proposing (still late here).
>> If not, please try to think about it and see if that could make sense.
> 
> It's not actually what I'm proposing - I'd say it's a parallel and  
> possibly competing proposal.
> 
> I've been thinking about something similar.  I think there are at  
> least three ways one could do what you're asking (in inverse order of  
> overhead).  I'm describing them here with simple names so it's easier  
> to refer back to them; the names aren't perfect, but hopefully they'll  
> do.
> 
> 1) Transient resources: Continue to create the resources but create  
> and destroy them one at a time
> 
> 2) Set resources: Use a single recursive operation that somehow  
> manages to retain transactional integrity
> 
> 3) Set operations: Perform a recursive operation that loses  
> transactional integrity
> 
> I think you're essentially proposing something like #3.

Hmm, I wasn't really clear in my previous e-mail, but in fact I was 
thinking more about something like your #2, i.e. the File resource could 
get support from a recursive meta-resource to actually "perform" the recursion.

> I'll provide some more detail on each, but there are a couple of
> points of complexity that are worth noting.  In particular, the choice
> here has a significant effect on logging and events.  You can actually
> think of logs and events as isomorphic, and they're only going to get
> more so:  I hope by 0.26 or so all transaction logs are actually
> generated by events.
> 
> Obviously logging is critical so you know what's happening on your  
> system.  Events are critical so that you can react to those changes.   
> E.g., if you need to restart a service if any file in a fileset  
> changes, then an event from a file deep in the hierarchy needs to be  
> routed appropriately and then it needs to be able to be logged as  
> being from that location.

Of course, it makes perfect sense.

> We solve that right now using proxy resources - the recursing resource  
> is the proxy for the event-generating resource.  This will likely work  
> for any of the other solutions, too, but it's worth thinking about  
> here because I've found it to be the major source of complexity, and I  
> think that would continue.

OK.
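
To check my understanding of the proxy idea, here is a minimal Ruby sketch. It is not Puppet code: `Event`, `RecursingResource`, `subscribe` and `child_changed` are names made up for illustration. The only point is that subscribers attach to the recursing resource, while the event still records the deep file as its source.

```ruby
# Hypothetical event: records which file actually changed.
Event = Struct.new(:source, :name)

# Hypothetical recursing resource acting as a proxy: subscribers register
# on it, and it forwards events raised on behalf of its children while
# keeping the child's path as the event source.
class RecursingResource
  attr_reader :path

  def initialize(path)
    @path = path
    @subscribers = []
  end

  def subscribe(&callback)
    @subscribers << callback
  end

  # Called when a file somewhere below @path was changed.
  def child_changed(child_path, name)
    event = Event.new(child_path, name)
    @subscribers.each { |callback| callback.call(event) }
    event
  end
end

log = []
root = RecursingResource.new("/etc/app")
root.subscribe { |event| log << "restart service: #{event.name} on #{event.source}" }
root.child_changed("/etc/app/conf.d/deep/site.conf", :file_changed)
```

The subscriber only knows about `/etc/app`, yet the log line names the file deep in the hierarchy.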

> So, on to descriptions:
> 
> Transient resources:
> 
> Currently, we create the whole list of resources, add them to the  
> graph, and then iterate over them.  Instead, we could essentially  
> process them one at a time and then discard them.  Our current method  
> (loosely) is:
> 
>    file.eval_generate.each { |resource| add_to_catalog(resource) and eval_resource(resource) }
> 
> Instead, it would become:
> 
>    file.eval_generate { |resource| eval_resource(resource) }
> 
> Ridiculous pseudo-code, obviously, but hopefully you get the idea.   
> The optimization here is that 1) we aren't adding to the catalog and  
> 2) we aren't building a list at all.

If that's the low-hanging fruit, and the option with the lowest resource 
consumption, then I'm all for it for 0.25.
The new, more generic mechanism (i.e. #2) can still be implemented for 0.26 
(or any subsequent release).
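
To be sure we mean the same thing by #1, here is a rough Ruby sketch of the difference. None of this is actual Puppet code: `FakeFile`, `RecursiveDir`, `generate_children` and `each_child` are invented for illustration, and the real `eval_generate` is certainly more involved. It only contrasts building the whole list with yielding one child at a time.

```ruby
# Hypothetical stand-in for a generated child file resource (not a Puppet class).
FakeFile = Struct.new(:path)

class RecursiveDir
  def initialize(paths)
    @paths = paths
  end

  # Current model (loosely): build the complete list of children up front.
  def generate_children
    @paths.map { |path| FakeFile.new(path) }
  end

  # Transient model: yield one child at a time. Nothing is retained here,
  # so each child can be garbage collected once the block returns.
  def each_child
    return enum_for(:each_child) unless block_given?

    @paths.each { |path| yield FakeFile.new(path) }
  end
end

dir = RecursiveDir.new(%w[/tmp/a /tmp/a/b /tmp/a/c])

# Stands in for eval_resource: each child is handled and then dropped.
evaluated = []
dir.each_child { |child| evaluated << child.path }
```

With `each_child`, at most one child object needs to be alive at any moment, which is where the memory saving would come from.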

> Set resources:
> 
> We could have some kind of resource that didn't use instance variables  
> for any of the values or comparisons, such that a single resource  
> instance could be used to do all of the operations.  The main thing is  
> that it kicks out change instances for each of the things that needs  
> to be done, with all of the appropriate information for logging and  
> events.
> 
> We're still paying a per-change overhead, but I don't think you can  
> get away from that and retain transactional integrity.
> 
> Set operations:
> 
> This is pretty much just a big, painful chmod -R.  This would be a bit  
> more difficult because we'd have to skip any resources that are  
> managed anywhere else.
> 
> In writing this description, I think the second option is the best,  
> even though I was leaning toward the first, initially.  I think it's  
> doable - the big limitation right now is the use of instance variables  
> in files.  If 'should' weren't set anywhere, then you'd be passing it  
> in each time, and if you're passing it in, then you could use the same  
> instance for every file you needed to operate on.

That's plainly true. I vote for this!
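
As I understand #2, it could look something like the sketch below. This is only an illustration under my own assumptions: `Change`, `ModeChecker` and `changes` are hypothetical names, and a plain hash stands in for the stat calls. The desired value is passed in on each call instead of being stored per file, so one instance serves every file and kicks out a change record for each file that differs.

```ruby
# Hypothetical change record carrying what logging and events would need.
Change = Struct.new(:path, :property, :is, :should)

# A "set resource" sketch: no per-file instance variables. The desired
# value is passed in on every call, so one instance serves every file.
class ModeChecker
  # current_modes: hash of path => current mode (stands in for stat calls).
  # should: the desired mode for every file in the set.
  def changes(current_modes, should)
    current_modes.each_with_object([]) do |(path, is), out|
      out << Change.new(path, :mode, is, should) unless is == should
    end
  end
end

checker = ModeChecker.new
pending = checker.changes({ "/srv/www/a" => 0o644, "/srv/www/b" => 0o600 }, 0o644)
```

Here only `/srv/www/b` yields a change, so the per-change overhead Luke mentions is paid only for files that actually need work.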

> One of the big changes we did (but no one noticed) in the fall of '07  
> was that we switched 'is' from being an instance variable to being  
> transient, only maintained by the transaction.  I've always wanted to  
> switch 'should', too, but I haven't known how.
> 
> If we were to combine this with my goal of splitting resource types  
> into a Resource class (which already exists in 0.25) and a  
> ResourceType class (whose instances will model individual resource  
> types), then these resource types could be written to operate with no  
> instance variables, essentially.  This would, I think, enable the set  
> resources pretty easily.  (I suppose I should open this as a ticket,  
> so people know wtf I'm talking about.)
> 
> Well, 'easily' once you refactored the RAL entirely and broke backward  
> compatibility.

Yes, that's the difficult task, I do agree.

Now, if we can move forward and at least have #1 in 0.25, that'd be über 
great, because it would solve one of the biggest performance (i.e. memory) 
issues in puppetd while managing files (an issue I tried to overcome with 
#1469 or the soon-to-be-resurrected path compression stuff).
People, myself included, find it natural to use recursive file resources on 
large trees just to chmod/chown, and we all expect those to work as fast 
as chmod/chown, so it is hard to accept puppetd failing at this 
simple task. And you can be sure that's almost the first thing a puppet 
newcomer will use, so imagine her reaction on seeing the tool not succeed...

/me preparing the pathcomp patch for a new review.
-- 
Brice Figureau
http://www.masterzen.fr/

