Re: [Puppet Users] Re: Thoughts about extlookup: http://blog.wl0.org/2011/05/thoughts-about-extlookup-in-puppet/

Simon J Mudd Mon, 16 May 2011 15:18:20 -0700

Hi John,

john.bollin...@stjude.org (jcbollinger) writes:


...

> Let's not cast things in terms of "correctness," except insomuch as
> whether they reliably produce the desired effect on clients.

Indeed. As always there is more than one way to solve a problem.

> Depending on what you're trying to do with Puppet, however, it is
> certainly true that some approaches to structuring your manifests will
> make them easier to write and maintain than will others.  Personally,
> when writing specializations into my manifests, I find it useful to
> keep in mind the question of *why* a particular host or group of hosts
> is different from others.  I have yet to run into an answer that
> didn't fall into one of these categories, or a combination of them:
> 
> 1) It has a functional role that some other systems do not

In my case 90% of the differences are functional differences, often
expressed as part of the hostname.

> 2) Its network environment requires differences from some other
> systems.

There may be some minor differences here but they are the exceptions
rather than the rule.

> 3) It's just special

I have a few of these too.

> It has worked well for me so far to model roles via (unparameterized)
> classes, assigned to nodes via their node declarations.  That leaves
> only one level between "common" and "node-specific" where I might need
> to customize data.

Yes, I have db_server::type_a and db_server::type_b, ...

> (That intermediate level could in principle be
> parameterized by network domain, but in my case it is by subnet.)
> Most of my nodes do not require per-node customization, so I don't end
> up with many data files for extlookup.

In my case one db_server is pretty much the same as another. What
varies is the MySQL version to use (normally pretty constant), the
partitions to mount and their location(s), and the cron jobs to set up.

All of these are lower-level classes which are almost the same except
for a few parameters. I also started with non-parameterised classes, so
I created a large number of nearly duplicate classes. Now that I'm
beginning to parameterise some of them, the total number of classes
should drop quite easily. Of course, if I can look up the parameters
from a "data source" (extlookup or similar) then I can make many
classes pretty much identical: all they do is pick up the required
parameters and apply them.

> 
> 
> > > Moreover, I disagree with several of the opinions and conclusions in
> > > your post:
> >
> > > 1) You write: "The extlookup() functionality only allows [...
> > > specifying implicitly ...] where to look for this value."  That is
> > > false.  Extlookup does provide for configuration of a standard set of
> > > CSV files to search (which can be parameterized by nodes' facts), but
> > > the function also has an optional parameter to specify a specific file
> > > to be searched first on any given invocation.
> >
> > Perhaps coming from a database background, I'd like to mirror what
> > seems more _natural_ and having values spread around over potentially
> > a large number of files seems non-intuitive.
> 
> 
> If extlookup use would indeed require you to maintain a large number
> of separate files then that might be a good reason to find something
> better, but in all likelihood you can avoid that, or else structure it
> in a sane way.  Consider also:
> 
> When you work with a database, do you normally focus on how it maps
> data to the host filesystem?

Focus, no. Again, as mentioned, I try to simplify the structure by
"normalising" it.  Here I want to look up a parameter (say
"partition size") from somewhere and I need to find it for a server
called $hostname. If I can't find a parameter for $hostname, I may be
happy if I find the same parameter for $domain, or if not I may be
happy with some default value. _My_ preference is to look that up in
one place. Also, as mentioned in a different message, doing this lookup
via a regexp might nicely enable me to keep the list of entries short.

> Given the diverse data parameterization you described, if you created
> a database for your configuration data, would you really organize
> everything into a single table?  (And what would be its key?)

yes:

CREATE TABLE lookup_table (
        config_item VARCHAR(200) NOT NULL,
        lookup_value VARCHAR(200) NOT NULL,
        return_value VARCHAR(200) NOT NULL,
        PRIMARY KEY ( config_item, lookup_value )
)

SELECT return_value FROM lookup_table WHERE config_item = 'lvsize' AND 
lookup_value = 'myhostname001'

provides a fast lookup of the value. If that fails you can do

SELECT return_value FROM lookup_table WHERE config_item = 'lvsize' AND 
lookup_value = 'example.com'

for a more generic response. If that fails you can do:

SELECT return_value FROM lookup_table WHERE config_item = 'lvsize' AND 
lookup_value = 'DEFAULT'

Yes, I'm fully aware this could be normalised better, but given the
limited number of entries it's trivial to understand, trivial to
maintain, and fast to access. The 3 SELECTs could be optimised into a
_single_ more complex SELECT by adding a priority value to the lookup
values and a LIMIT 1.
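
A minimal sketch of that single-SELECT idea follows. It uses Python's
sqlite3 module purely for illustration. The helper name lookup(), the
sample sizes (20G, 10G, 5G) and the choice to derive the priority from
the order of the candidates with a CASE expression (rather than storing
a priority column) are all assumptions, not part of any existing setup.

```python
import sqlite3


def lookup(conn, config_item, candidates):
    """Return the value for the first candidate (most specific first) that has a row."""
    if not candidates:
        return None
    # One placeholder per candidate for the IN (...) filter.
    placeholders = ", ".join("?" for _ in candidates)
    # CASE maps each candidate to its position, so earlier candidates sort first.
    whens = " ".join(f"WHEN ? THEN {i}" for i in range(len(candidates)))
    sql = (
        "SELECT return_value FROM lookup_table "
        f"WHERE config_item = ? AND lookup_value IN ({placeholders}) "
        f"ORDER BY CASE lookup_value {whens} END "
        "LIMIT 1"
    )
    row = conn.execute(sql, [config_item, *candidates, *candidates]).fetchone()
    return row[0] if row else None


# In-memory database with the table from above and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE lookup_table (
           config_item  VARCHAR(200) NOT NULL,
           lookup_value VARCHAR(200) NOT NULL,
           return_value VARCHAR(200) NOT NULL,
           PRIMARY KEY (config_item, lookup_value)
       )"""
)
conn.executemany(
    "INSERT INTO lookup_table VALUES (?, ?, ?)",
    [
        ("lvsize", "myhostname001", "20G"),
        ("lvsize", "example.com", "10G"),
        ("lvsize", "DEFAULT", "5G"),
    ],
)

# Host-specific row wins; an unknown host falls back to its domain.
print(lookup(conn, "lvsize", ["myhostname001", "example.com", "DEFAULT"]))
print(lookup(conn, "lvsize", ["otherhost", "example.com", "DEFAULT"]))
```

Because both config_item and lookup_value are part of the primary key,
the IN (...) filter is still a handful of primary-key probes.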

A db server holding a trivial table like this can probably serve 
thousands of queries a second without breaking into a sweat.

> If you wanted to allow customization of your data with varying scopes
> (e.g. supporting both per-node and per-network customization of the
> same parameters) then how would you design the DB?

See above.  In fact I've given the "typical example" of ordering by
$hostname, $domain, 'DEFAULT', but one of the reasons I wanted to have
the flexibility in providing the input values was so that I can choose
whatever "lookup values" I want.

So I could do something like:

$mount_point = lookup_by_key( 'mount_point', $hostname,           'config_file', '' )     # no default
$vgname      = lookup_by_key( 'vgname',      "$hostname,$domain", 'config_file', 'vg00' ) # first look up $hostname, then $domain
$lvname      = lookup_by_key( 'lvname',      $hostname,           'config_file', 'logical_volume_00' )
$lvsize      = lookup_by_key( 'lvsize',      $mount_point,        'config_file', '' )     # no default
$fstype      = lookup_by_key( 'fstype',      $mount_point,        'config_file', '' )
$owner       = lookup_by_key( 'owner',       $mount_point,        'config_file', '' )
$group       = lookup_by_key( 'group',       $mount_point,        'config_file', '' )

Now I have all the parameters I need to:
1. create a directory
2. create a logical volume of the required size in the volume group
3. mkfs the filesystem on the logical volume
4. mount it on the directory
5. change the owner, group, and mode as required

All the configs fit nicely in the same config file.
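
To make the intended behaviour of lookup_by_key() concrete, here is a
rough sketch of its semantics. It is written in Python for illustration
only (a real Puppet function would be Ruby), and it assumes a
three-column CSV layout of config_item,lookup_value,return_value. The
host name db001 and all the sample values are invented.

```python
import csv
import os
import tempfile


def lookup_by_key(config_item, lookup_values, config_file, default=""):
    """Try each comma-separated lookup value in order; return the first match."""
    # Load the file into a dict keyed by (config_item, lookup_value).
    with open(config_file, newline="") as handle:
        table = {
            (row[0], row[1]): row[2]
            for row in csv.reader(handle)
            if len(row) >= 3
        }
    for candidate in lookup_values.split(","):
        value = table.get((config_item, candidate.strip()))
        if value is not None:
            return value
    return default


# Invented sample data: one config file holds every parameter.
SAMPLE = "\n".join(
    [
        "mount_point,db001,/var/lib/mysql",
        "vgname,example.com,vg01",
        "lvsize,/var/lib/mysql,50G",
        "fstype,/var/lib/mysql,ext3",
    ]
) + "\n"

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(SAMPLE)
    config_file = tmp.name

hostname, domain = "db001", "example.com"
mount_point = lookup_by_key("mount_point", hostname, config_file)
# No row for db001, so the domain row is used.
vgname = lookup_by_key("vgname", f"{hostname},{domain}", config_file, "vg00")
# No row at all, so the default is returned.
lvname = lookup_by_key("lvname", hostname, config_file, "logical_volume_00")
# The looked-up mount point is itself used as the lookup value.
lvsize = lookup_by_key("lvsize", mount_point, config_file)

os.unlink(config_file)
print(mount_point, vgname, lvname, lvsize)
```

The point of passing the lookup values explicitly is visible in the
last call: the search order is chosen per call rather than fixed once
globally.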

> > > 2) You would prefer looking up data via a compound key
> > > (config_item, lookup_value) rather than via a simple key
> > > (config_item).  
> 
> > > [...]  And even a DB performs multi-key queries
> > > slower than single-key searches.
> >
> > Not if they are part of the primary key. That's part of the point.
> 
> 
> Well that's a possible answer for you then.  Extlookup performs
> lookups based on a single key, and nothing prevents you from using
> keys that allow you to flatten your data.  For example, you can
> structure your data so that instead of extlookup("interface"), you can
> extlookup("interface-$subnet-$is_master").  In other words, the
> distinction in your examples between "config_item" and
> "lookup_value(s)" is artificial and unnecessary.  There is no reason
> why you could not combine them all into a single field in the CSV.

That's true and, to be honest, something I hadn't considered before.
It certainly works but feels a bit ugly to me.
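
For comparison, the flattened-key approach can be sketched as below.
This is an illustration in Python, not extlookup itself: the helper
flat_lookup(), the "-" separator and the sample values are assumptions.
A plain dict stands in for the single-key CSV data.

```python
def flat_lookup(data, config_item, candidates, default=""):
    """Emulate a compound key by joining item and candidate into one string key."""
    for candidate in candidates:
        key = f"{config_item}-{candidate}"
        if key in data:
            return data[key]
    return default


# Single-key data: config_item and lookup_value are merged into one field.
data = {
    "lvsize-myhostname001": "20G",
    "lvsize-example.com": "10G",
    "lvsize-DEFAULT": "5G",
}

host_value = flat_lookup(data, "lvsize", ["myhostname001", "example.com", "DEFAULT"])
domain_value = flat_lookup(data, "lvsize", ["otherhost", "example.com", "DEFAULT"])
print(host_value, domain_value)
```

One caveat with this layout is that the separator must never appear in
a way that makes two different (item, value) pairs produce the same key.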

...

> > As written it's clear if you do extlookup('dns_server') that you are
> > getting a "dns server" result. In many of the examples provided
> > the expected use case is: check first for the host, then the domain,
> > then ....., but that's not specific, _and_ may be important.
> 
> When you say it's not "specific," do you mean that you can't tell from
> the extlookup() call itself?  The typical use of extlookup() involves
> defining the lookup precedence once, globally.

Indeed, and that is what doesn't feel natural in the way I expect to
use extlookup(). Perhaps that's the problem. I was expecting this to
be a general lookup() function, and while it is one, it assumes a
certain fixed usage which I wasn't planning on following :-(

Just to summarise: your comments have helped me think through a little
more what it is I want and what I'm trying to achieve. I'm beginning
to think that in the end extlookup() isn't quite what I'm looking
for. I was pleasantly surprised to find the code is actually only a few
lines, and it _looks_ reasonably easy to adapt it to give me a
lookup_by_key() function with the behaviour I'm more comfortable with.

So thanks for taking the time to explain. I now need to take a look at
the Ruby code for extlookup.rb and adjust it to my needs and see if
that works as wanted.

Simon
