[Puppet-dev] A more ideal language (was Re: Classes vs. definitions (#1645))

Luke Kanies Tue, 04 Nov 2008 12:31:59 -0800

On Oct 31, 2008, at 10:33 AM, jerico wrote:
> If you're interested in a more "abstract" discussion of my "ideal"
> definition language for a configuration tool, then you may read on. I
> think I gave answers to all of your more specific questions here.


I definitely am.  I'm going to trim as much as possible, though, to  
make the conversation manageable.

>
> You have been warned though! The remainder is more of an article than
> of a post. And I won't consider backwards compatibility from now on as
> the above practical solution does a good job on that I think.
>
> Anyway: I think that being fully aware of what "would be nice" if
> there wasn't backwards compatibility is wonderful for you as a
> language designer because it gives you direction in implementation
> details even if you cannot implement the "pure" solution for practical
> reasons. It also helps users like myself to program into the language
> rather than programming in the language (see the book "Code Complete"
> again for that distinction) thereby avoiding maintenance problems.
> Both are real practical and money-worthy advantages although based on
> "conceptual" rather than "implementation" knowledge. I am a
> practitioner and not a scientist!!

I don't think backward compatibility is always required, as long-time  
members would attest, but I strive for it when possible, and, at the  
least, always do my best to be clear when it's absent.

>
>> Part of the problem is that Puppet's language isn't a real
>> "programming" language
>
> I don't agree on that. It might be a specially declarative and less
> procedural/imperative language and is probably not turing complete (no
> idea about that, I didn't try the proof!) But maybe this is just an
> argument about wording and therefore not so important.

I agree it's mostly a terminology point, but it's definitely not  
Turing-complete, although Brice might have slipped that in with his  
last series of updates. :)

>
>> But the point remains:   A Puppet class is more of a service class
>> than something like an OO class; it can also be thought of as the
>> code that specifies a class of machines -- that is, if you've got a
>> sendmail-class server, then you get sendmail-related resources.
>
> I agree with you that it should be the purpose of puppet's language to
> describe bundles of configured resources. But that's already it, no
> further distinction necessary.
>
> The rest is cfengine heritage which I do not consider valuable when
> defining the "ideal" language based on design best-practices. It
> certainly is very valuable for cfengine adepts who "upgrade" to
> puppet. But that's not the point here (see the practical solution
> above for cfengine-think compatible changes).

I disagree with this -- you're coming at it from the implementation,  
but I'm thinking about it from the perspective of intent.  If  
anything, I had to completely invert Cfengine's idea of a class, and  
every Cfengine refugee we get has to go through an adoption period  
where they rewire their brains.

No one starts with code, even when they think they do -- you always  
start with intent.  Why are you building this node?  What services  
should it offer?  Why?

To me, this is how you should be able to think about Puppet code:  
"This node exists because it's going to be a web server.  Web servers  
run Apache and MySQL, they need perl installed.  It's also on my  
network, and every node on my network is a dns client and has sysadmin  
users installed on it.  The machine is in my Nashville datacenter, and  
it should get a router appropriate to that datacenter."

To me, the term 'class' is a convenient way to describe the  
configuration associated with a given intent.  "Web server" means one  
list of resources, "dns client" means another, etc.
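
As a rough sketch of how that description might map onto the current language (the class names, package names, and host name below are hypothetical, chosen only to mirror the example above):

```puppet
# Hypothetical names throughout; package names assume a Debian-like system.
class webserver {
  # "Web servers run Apache and MySQL, they need perl installed."
  package { ['apache2', 'mysql-server', 'perl']:
    ensure => installed,
  }
}

class dnsclient {
  # resolver configuration would go here
}

class sysadmins {
  # sysadmin user resources would go here
}

node 'web01.example.com' {
  include webserver   # why this node exists
  include dnsclient   # every node on the network is a dns client
  include sysadmins   # every node gets the sysadmin users
}
```

Each 'include' line names an intent, and the class body holds the list of resources that the intent implies.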

So, that's from the direction of specification; now let's talk about  
data production.  When a client produces a log message, that message  
requires context to be useful.  If it fails to install a package, we  
need to know why (again, the intent) that package was needed, so we  
can then determine what services would be affected.  We can also  
relate that failure back to code changes, maybe, so it's easier to  
resolve the problem network-wide.

>
> What you probably want to say is that cfengine or puppet classes are
> somehow cross-cutting aspects of nodes (in AOP words) while
> definitions are not. But what's the real purpose of AOP? It's just
> introducing or modifying procedural methods in bulk to avoid code
> duplication. This does not make sense in a language without a
> procedural element! My hypothesis is: Any non-procedural language can
> be considered class- and aspect-oriented at the same time. If I
> introduce a puppet definition into a puppet class then you introduce
> some powerful "aspect" into this class using a very concise language
> without any code duplication. You could always factor out repetitive
> code into puppet classes /or/ definitions without any problems even
> when this code will be used in completely different contexts later. So
> no required distinction here either.

I don't think I really understand what you're saying here.  Note that  
I'm entirely self-trained, so a lot of what you're talking about goes  
right over my head.  I read a lot of books, but I don't always absorb  
or agree.

> To fully "disillusion" the cfengine heritage: To me a "node" is a
> resource as well, I don't see any required semantic difference here to
> other resources like a web server software or a firewall. I could also
> imagine having higher level clusters of nodes being treated as a
> "resource bundle".

I basically agree, with one significant distinction -- nodes have  
automatic entry points (i.e., the parser accepts a request from a host  
and uses the host's name to look up a node instance).  Neither classes  
nor definitions have automatic entry.

Otherwise, I agree, and the lack of meaningful distinction is one  
thing that led me to want to push nodes outside of Puppet.

> Why not define a class of nodes with instantiation
> parameters and thereby handle a cluster of machines as "one thing"? If
> we had a construct like that I could probably eliminate another bunch
> of global variables and redundancy in my puppet scripts.

This is more of a grouping problem in general -- Puppet can't really  
talk about anything that's cross-host, and it's a significant  
problem.  I think of this as the 'me/it' problem -- Puppet can only  
talk about 'me' (what services should I be running?) rather than  
'it' (what services should this host or that host be running?).  Been  
a while since I thought about this aspect, but it's definitely there  
and definitely a problem.

How does your 'class of nodes' idea change this?

>
> IMO it is completely irrelevant whether you add a resource (node) to a
> class (in cfengine parlance) or whether a class bundles resources as
> puppet's definitions do. If you agree that a node is a resource and
> that aspects and classes are essentially the same in a declarative
> language then what's the difference??? You end up with classified (or
> bundled) resources, don't you? Just a chicken or egg problem IMO.

I don't really agree that classes and aspects are essentially the  
same.  In fact, if an aspect is a cross-cutting concern, then Puppet's  
classes are actually explicitly not aspects -- they're completely  
exclusive.  No two classes can overlap at all.

IMO, one of the significant weaknesses of Puppet's language is its  
complete lack of ability to specify cross-cutting ideas.  The  
closest you get is global variables in the top scope.

For instance, if you want to say "All Debian hosts should default to  
'aptitude' for package provider", your only real choice is to put that  
in your site.pp file.  There's no other way to guarantee all of your  
classes will get it.  There's no 'class' construct you can use that  
will intersect all classes.
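
A minimal sketch of that site.pp workaround, assuming the '$operatingsystem' fact as it was spelled at the time (newer Puppet releases expose the same information under a different name):

```puppet
# site.pp -- top scope, so the default is visible to every class.
case $operatingsystem {
  'Debian': {
    # Resource default: applies to every package resource that does
    # not set its own provider.
    Package { provider => 'aptitude' }
  }
  default: { }
}
```

The default only reaches every class because it sits in the top scope; moved into an ordinary class, it would apply only to code evaluated within that class's scope.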

>
> The same applies to puppet types. They just abstract away some OS
> level resources. To me types are not necessarily semantically
> different from classes, nodes or definitions.

You're kind of losing me here.  Resource types theoretically model  
some clear resource on the system; if you lose that, you lose a lot,  
IMO.

>
> So even the node and type constructs become unnecessary if you think
> like that. You could express everything you have today in puppet with
> resource bundles that take instantiation parameters, allow inheritance
> and maybe define public attributes. I bet I could reformulate
> everything I can do in puppet today with a simplistic language like
> that.

I can't disagree with that, but my question would be, would you have a  
"better" language if you did this?  And it must be said, "better" here  
is defined as "better for its audience".  Given that we're not talking  
to declarative programmers with CS degrees and 5 years of programming,  
but rather are talking to jack-of-all-trades sysadmins, the answer  
varies a bit.

And even if you are talking to the programmers (many of whom do use  
Puppet), I think a slightly more complex, more explicit language is  
often actually a better idea.  Every problem can be expressed in LISP,  
but that doesn't make LISP the best way to solve every problem.  From  
what I understand of LISP culture, their whole point is to build a DSL  
for every problem and then live there.  My goal here is to build a DSL  
that provides the most readable and maintainable way of specifying  
infrastructure; I'm less concerned with having the simplest language.

>
> Here another "disillusionment", this time concerning singletons and
> prototypes: A singleton is always only a singleton within a certain
> context. So in Java or C++ you might do some tricks to define a
> singleton. But then you end up with a singleton in the context of a
> class loader (Java) or process (C++)! As soon as you work with
> application server clusters (Java) or several processes in parallel
> (C++), you have to synchronize all those "singletons" or invent
> singletons on cluster level (what some application servers and
> parallel processing frameworks effectively do). Think cloud computing
> and you're at the next level of abstraction, and so on...

This is clearly the case -- a given resource must be unique within a  
host's catalog, but not usually unique within a network.
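
To illustrate the per-catalog uniqueness rule, here is a small sketch with hypothetical class and host names. Two classes on the same node declare the same package, so compiling that node's catalog fails with a duplicate-resource error, even though any number of other hosts may carry the same resource in their own catalogs:

```puppet
# Hypothetical classes that both declare Package['perl'].
class webserver {
  package { 'perl': ensure => installed }
}

class monitoring {
  package { 'perl': ensure => installed }
}

node 'web01.example.com' {
  include webserver
  include monitoring   # compilation fails: Package['perl'] is declared twice
}
```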

>
> The exact same applies to puppet singletons. Puppet classes may be
> singletons on a single node but yes they are prototypes if you
> consider a cluster of nodes as an entity in its own right. The fact
> that you can safely and implicitly consider classes singletons in
> puppet is because you are running puppetd on a node. But what if you
> wanted to cluster nodes one day as one "thing" to avoid the current
> duplication of node-specific code in puppet? Oops! You'd have to
> introduce ... named puppet class instances.

Can you elaborate on this?

>
> IMO a resource can only be a singleton in the context of a bundle. Not
> more not less.

I agree, and that bundle is currently named a 'catalog' in Puppet.   
0.25 finally makes this Catalog class the arbiter of singleton-hood.

>
> So once we see that we really only have to describe bundled resources
> then we can do with one single syntactical construct. Let's call it a
> "bundle" for now to avoid all mis-interpretations in the sense of OO
> concepts or functional languages or cfengine heritage.

Urgh, I hate the term, but I'll follow along for now.

>
> Here a translation table for cfengine and puppet semantics into
> "bundle semantics":
>
> cfengine input files (bundle) group cfengine action instances
> (resources)
> cfengine classes (bundle) group hosts (resources)
> puppet nodes (bundle) group puppet classes and definitions (resources)
> puppet classes (bundle) group puppet types, classes and definitions
> (resources)
> puppet definitions (bundle) group puppet types, classes and
> definitions (resources)
> puppet types (bundle) group OS configuration files, services,
> packages, etc. (resources)
>
> Imagine what such a simplification would mean for documentation and
> parser implementation. You'd probably be able to throw away large
> chunks of specialised node, class, definition and type code/
> documentation. And doing the same thing with less code is always a
> good thing. (Wasn't it Mark Twain who apologized that he couldn't
> write a shorter text because he didn't have the time to do so? I
> apologize too...)

As discussed above, I don't always agree.  And you'd be surprised how  
little extra code there is right now -- Node and Class are subclasses  
of Definition, and it's not that complicated.

>
> But yes: You cannot do this in a backward compatible way.
>
> Based on this long "foreword" I can quickly answer your remaining
> questions:
>
>> How would you instantiate a class?
>> And what about singleton classes?  Would you support classes without
>> instance names, and if so, what would that syntax look like?
>> How would [the] distinction [between singletons and prototypes] be made?
>
> If we reinterpret a class as a bundle then we get:
>
> bundle-name { par1 => ..., par2 => ..., ... }
>
> This looks exactly like an instance of today's definition without an
> instance name. The instance name is however implicitly defined and
> defaults to the containing bundle's name.

I actually started out with exactly this syntax, and almost all of the  
behaviour you're talking about.  I ended up removing it in the end,  
but I didn't replace the functionality as I probably should have.

I don't like the idea of declaring class membership via the same  
syntax for specifying resources, partially because I think most people  
will find that the attributes that are consumed by a class are often  
specified in a different location than the code that specifies class  
membership.

Take a dns client -- you'd probably have a 'base' class or node that  
said every node is a dns client, and your dns client class would  
require something like a resolver list and a search path.  These  
values would be filled in by some Datacenter or Location or whatever  
class.

In the current system, this Location class would need to have a  
hierarchical relationship to the base class, in order to override as  
necessary. However, you'd soon find yourself with 50 subclasses of the  
base class, one for each aspect that needed to specify overrides.
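
A sketch of what that looks like with today's inheritance-based overrides. The class names, domains, and addresses are placeholders (the addresses come from the documentation range):

```puppet
class dnsclient {
  file { '/etc/resolv.conf':
    content => "search example.com\nnameserver 192.0.2.1\n",
  }
}

# Overriding requires a subclass of the class that declared the resource.
class dnsclient::nashville inherits dnsclient {
  File['/etc/resolv.conf'] {
    content => "search nash.example.com\nnameserver 192.0.2.53\n",
  }
}

node 'web01.example.com' {
  include dnsclient::nashville
}
```

Every further location, and every other class whose values vary by location, needs another subclass like 'dnsclient::nashville', which is where the sprawl comes from.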

Thus, I think any attempt at solving the problems you're describing  
should attempt to solve this problem, too, and really, I think this is  
a greater problem, because it speaks of something you just can't do  
well right now, whereas you're mostly focused on something that's  
straightforward but moderately ugly.

I've been thinking of something like an aspect syntax, which could  
have maybe attributes and defaults, but no resources.  This could then  
be included/imported/whatever into any class, such that it could then  
configure that class.  You might do something like this:

class dnsclient($resolvers,$domain) {
   ...
}

aspect location {
   $resolvers = "..."
   $domain = "..."
}

aspect location::nashville inherits location {
   $resolvers = "something else"
   $domain = "other"
}

class base {
   include dnsclient
}

node mynode {
   include base
   acquire location::nashville
}

Then you just need some kind of validator to make sure that every  
class has all of its parameters provided.  It might also be a good  
idea to differentiate between aspect attributes, which would be  
imported when an aspect is 'acquired', and variables, which would not  
be.

It seems we also need different ways of specifying relationships  
between classes.  Right now, a class can use 'include' to declare that  
a class should be evaluated, but that's basically it.  It'd be nice to  
be able to specify a dependency, but it'd also be nice to be able to  
somehow merge two classes, akin to the aspect acquisition mentioned  
above.

>
> For a class in the context of a "node bundle" this would automatically
> be the node name. If you nest classes then the node name would
> automatically propagate to lower level classes. So this is just a
> special case within the general bundle framework now. A very elegant
> solution IMO.
>
> Including classes within definitions is possible today. I don't
> understand what this really means in your own definition of classes as
> node services and definitions as resource bundles. So I won't use it
> now. Anyway there is an easy workaround for this in our "bundle
> framework" by including the class "bundle" at a higher level. This is
> semantically identical and IMO cleaner anyway as it enforces real
> encapsulation and disallows code duplication. As we are not concerned
> with backwards compatibility here this is ok.

I'm not really sure what you mean in this section.

>
> More generally: Any instantiation without name would be considered a
> "singleton" instantiation in the context of its bundle. The instance
> name would always default to the bundle's instance name. If you try to
> instantiate two "name-less" bundles within the same context you'd get
> an error. Like that we could even do away with the "singleton/
> prototype" keyword that I proposed in my initial post. And you'd get a
> very nice upgrade path if you discover all of a sudden that you want
> two ssh services running in parallel on one node. Simply transform
> your ssh singleton into an ssh prototype by giving it an explicit
> name.

What if you have multiple classes that want to declare that they  
require a given class?

Currently this is done by each of them calling 'include' on the class,  
but I assume you do away with the 'include' function, and now these  
classes have an error.

I don't think this is a good idea -- in this way, classes can be cross- 
cutting, in that many classes might care that a given class is  
instantiated, so it's important that they can declare this.
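
For reference, a small sketch of the current behaviour with hypothetical class names. Several classes can each 'include' the same class, and it is still evaluated only once per catalog:

```puppet
class ntp {
  package { 'ntp': ensure => installed }
}

class webserver {
  include ntp   # declares that this class cares about ntp
}

class dbserver {
  include ntp   # same declaration; no conflict with webserver
}

node 'box.example.com' {
  include webserver
  include dbserver   # ntp ends up in the catalog exactly once
}
```

If each of these were instead a name-less instantiation that must be unique within its context, the second declaration would be an error rather than a no-op.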

>
>> Don't definitions provide most of the functionality you want in your
>> classes?  Wouldn't it make more sense to use them as the new 'class'
>> construct?
>
> Yes absolutely! If you add public parameters to definitions, they are
> exactly my intended target construct. I only retained the "class" word
> rather than the "definition" word as it is closer to what is commonly
> seen as a class in OO languages and rings the right bell in the head
> of somebody trained in OO (of which you probably have as many if not
> more as former cfengine users). To keep "the right part" of your
> cfengine thinking you might also call it an "aspect" following my
> hypothesis of class/aspect equality in declarative languages above .

Ok.

>
>>> 6) You can define "inner classes" to replace today's construct of
>>> definitions within class context.
>>
>> So today's definitions could only ever be defined inside other classes?
>
> No. Definitions or "bundles" as I like to call them now, can live
> within the top level context. I think that this has become obvious
> from the previous discussion.
>
> Hope that I could answer all your questions. I hope that this was at
> least an interesting read though maybe not what can be actually
> implemented in the foreseeable future.


None of it's that hard, but I'm not yet convinced it's the right  
direction.

-- 
SELF-EVIDENT, adj. Evident to one's self and to nobody else.
     -- Ambrose Bierce
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

