Issue #6079 has been updated by Brian Gallew.
I disagree with Luke on data priority: PDL should always override class
defaults, simply so that you never have to touch the class to change the
data.
Regarding the PDL, Nigel asked for comment on the datalocation tree. While I
hate extra typing as much as the next guy, I think the principle of least
surprise would indicate that the datalocation would keep the /data directories
(omitting the files/manifests/templates/tests/etc ones). For a further
clarification, though, would the PDLs allow substitutions and functions?
# /ntp/data/ntp.pdl
$server = $my_puppet_master
$conffile = template("ntp/ntp.conf.erb")
While talking with Nigel I brought up the possibility of executable manifests,
which could (and if they were implemented, *should*) be mirrored on the PDL
side. The idea is that there are a lot of attempts to write ugly defines so
that we can keep our manifests relatively DRY, and that's bad. Worse, there
are places where it simply doesn't work (e.g. defines of defines). I've been
contemplating adding some code to Puppet such that when it finds a manifest to
read, it checks the executable bit and, if set, executes the file and parses
the output. Likewise, an executable PDL would be executed and the output
consumed. Obviously, if done naively it would be hideously fragile since the
output of the file would absolutely always have to be valid DSL. Also, it
would break on e.g. FAT filesystems which lack the executable bit.
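A minimal sketch of the executable-bit check described above, written in Python purely for illustration (Puppet itself is Ruby, and nothing like this exists in Puppet). The helper name `read_manifest_source` is hypothetical. It assumes the generator prints DSL on stdout and treats a non-zero exit status as an error rather than parsing partial output.

```python
import os
import subprocess


def read_manifest_source(path):
    """Return the DSL text for a manifest or PDL file (hypothetical helper).

    If the executable bit is set, run the file and use its stdout;
    otherwise read the file contents directly.
    """
    if os.access(path, os.X_OK):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failing generator is never parsed as an empty manifest.
        result = subprocess.run(
            [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

The returned text would still have to be valid DSL, which is the fragility noted above. On a filesystem without an executable bit, the `os.access` check is not a reliable signal either way.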
Another thing to think about would be stages (though there's probably another
ticket somewhere that better deserves this comment). As things stand, from a
usage standpoint Puppet feels like a one-pass parser. As Puppet encounters
each parseable item (resource, variable, function) it is immediately evaluated.
ISTR a ticket about "futures" where evaluation of any given item was delayed
as long as possible. This would give templated items a *lot* more utility as
the catalog and associated variables would be much more complete.
Alternatively, add an "include_last" function that simply queues up a class for
inclusion rather than including it immediately.
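To make the "include_last" idea concrete, here is a toy sketch in Python. The `Compiler` class is a hypothetical stand-in, not Puppet's real compiler. It only shows the ordering: queued classes are evaluated after everything included immediately.

```python
class Compiler:
    """Toy stand-in for a compiler that evaluates classes in order."""

    def __init__(self):
        self.evaluated = []  # class names, in evaluation order
        self.deferred = []   # class names queued by include_last

    def include(self, name):
        # Immediate evaluation, at most once per class.
        if name not in self.evaluated:
            self.evaluated.append(name)

    def include_last(self, name):
        # Queue only; nothing is evaluated until finish() is called.
        if name not in self.deferred:
            self.deferred.append(name)

    def finish(self):
        # Evaluate the queued classes once everything else has been seen.
        for name in self.deferred:
            self.include(name)
        self.deferred.clear()
        return self.evaluated
```

With this ordering, a class queued early (say, sshserver) is still evaluated after the role classes that set the variables its template needs.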
Let me give you a use case (for pretty much everything I've talked about). All
of my computers fall into one of four classes (this is simplified, of course):
production, testing, development, and services. All of my computers use the
AllowGroups clause in sshd_config to restrict access appropriately. Developers
can log in to both development and testing systems, change management can log in
to both testing and production systems, and the sysadmins can, of course, log in
everywhere (not as root!). In the ideal situation, you would only have to
include the sshserver class once, since clearly every system needs it.
However, if you do that, your sshd_config template will be parsed long before
you have determined the function of a particular node. Today there are two
workarounds. One is to include the sshserver class in every role (except for
inherited roles, where you have to include it *only* in the children), which
is both confusing and the antithesis of DRY. The other is a custom fact that
runs on the client (e.g. one that parses
/var/lib/puppet/state/classes.txt) and is used on subsequent runs to generate
the appropriate sshd_config, which introduces a one-cycle delay for convergence.
----------------------------------------
Feature #6079: Data/Model Separation - Data in (and out of) Modules
https://projects.puppetlabs.com/issues/6079
Author: Nigel Kersten
Status: Investigating
Priority: Normal
Assignee: Nigel Kersten
Category:
Target version:
Keywords:
Branch:
# Overview #
There are (at least) two problems related to data in Puppet.
1. The data is mixed up with the model in our manifests.
2. Puppet modules cannot be reused because site specific settings cannot be
specified consistently and easily outside of the module contents.
# Glossary #
These terms should not be taken to be authoritative yet, particularly
define/declare/evaluate. They are here so we can at least be internally
self-consistent in this document.
* **Author** - The person who writes Puppet modules.
* **User** - The person who is responsible for the infrastructure and consumes
modules written by an Author. A person may be both an Author and a User.
* **Define** - When you describe what a Class is in Puppet and what parameters
it accepts, if any.
* **Declare** - When you instantiate a singleton instance of a Puppet Class
and specify values for parameters.
* **Evaluate** - When the agent applies the catalog.
* **Parameterized Class** - A Puppet Class that has been Defined with a
non-zero number of parameters.
All snippets of text will have a single header line to indicate their
filesystem path as follows:
# /etc/puppet/puppet.conf
[master]
...
# Constraints #
1. We must attempt to solve both problems the same way where possible.
2. It must be easy for an Author to separate model and data.
3. It must be easy for an Author to specify data according to Author-defined
criteria.
4. It must be easy for a User to specify site specific data according to
User-defined criteria.
5. For both Users and Authors, the implementation must be as transparent as
possible in the DSLs; the introduction of new keywords, functions and syntax is
discouraged, but may be necessary.
6. The solution must work with as many different ways of invoking Puppet as
possible.
7. Author and User specification of data must be possible as simple key/value
pairs in a text file.
8. Authors and Users should be able to export to the data format reasonably
easily.
9. The solution must not preclude User data specification in storage
mechanisms other than text files.
10. We must adhere to existing Puppet conventions such as the autoload
mechanism where possible.
# Proposal #
## Authoring Modules ##
The primary aim is to solve Problem One.
Modules have a new meaningful sub-directory name, "data", and we add a new
meaningful file extension to puppet, ".pdl" for Puppet Data Library (PDL).
(Please don't
[bike-shed](http://en.wikipedia.org/wiki/Parkinson's_Law_of_Triviality) on this
name or the extension. It is not important yet.)
In exactly the same way that the autoloader expects class ntp to be found in:
<modulepath>/ntp/manifests/init.pp
data for class ntp is found in:
<modulepath>/ntp/data/init.pdl
In exactly the same way that the autoloader expects class ntp::client to be
found in:
<modulepath>/ntp/manifests/client.pp
data for class ntp::client is found in:
<modulepath>/ntp/data/client.pdl
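The class-to-file mapping above can be sketched as a small Python function. The name `module_pdl_path` is hypothetical. Handling of names nested more than one level deep (for example `ntp::client::foo`) is an assumption that mirrors how manifests are autoloaded; the proposal only shows the two cases above.

```python
import posixpath


def module_pdl_path(modulepath, class_name):
    """Map a class name to its Author PDL path (hypothetical helper)."""
    module, *rest = class_name.split("::")
    # "ntp" -> init.pdl; "ntp::client" -> client.pdl
    relative = posixpath.join(*rest) if rest else "init"
    return posixpath.join(modulepath, module, "data", relative + ".pdl")
```

For example, `module_pdl_path("/modules", "ntp::client")` yields `/modules/ntp/data/client.pdl`, where `/modules` is a placeholder for one entry of the module path.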
When you Define a Parameterized Class, if any parameter does not have a default
specified in the manifest, Puppet consults the PDL.
When a User Declares a Parameterized Class without specifying parameter values,
Puppet consults the PDL, then any default values the Author has specified if
the PDL does not provide an answer.
In all following examples, we will be retrieving data for the following classes.
# <modulepath>/ntp/manifests/init.pp
class ntp($server, $admin_group) { ... }
# <modulepath>/ntp/manifests/client.pp
class ntp::client($iburst) { ... }
## Puppet Data Library Format ##
### Key/Value pairs ###
$server will be set to "time.ntp.org"
# <modulepath>/ntp/data/init.pdl
$server = "time.ntp.org"
$iburst will be set to true
# <modulepath>/ntp/data/client.pdl
$iburst = true
### Extended Syntax ###
The extended syntax allows for values to be specified based upon other
variables such as facts about the node.
Each assignment, together with its optional "where" condition, is a single
line, which avoids the need for block markers.
If $operatingsystem equals "darwin", set $server to "time.apple.com", else if
$operatingsystem equals "debian", set $server to "time.ntp.org", else set
$server to "time2.ntp.org".
# <modulepath>/ntp/data/init.pdl
$server = "time.apple.com" where $operatingsystem == "darwin"
$server = "time.ntp.org" where $operatingsystem == "debian"
$server = "time2.ntp.org"
If $hardware_type equals "laptop", set $iburst to false; otherwise, set
$iburst to true.
# <modulepath>/ntp/data/client.pdl
$iburst = false where $hardware_type == "laptop"
$iburst = true
If $hardware_type equals "laptop" and $domain equals "puppetlabs.lan", set
$server to "time.puppetlabs.lan"
# <modulepath>/ntp/data/client.pdl
$server = "time.puppetlabs.lan" where ($hardware_type == "laptop" and $domain == "puppetlabs.lan")
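As a rough illustration of how the extended syntax could be evaluated, here is a Python sketch that takes the first matching line for each parameter, following the "if / else if / else" reading of the examples above. It is a sketch under stated assumptions, not a proposed implementation. The function names are hypothetical, only `==` comparisons joined by `and` are handled, values are limited to quoted strings and true/false, and facts are assumed to be strings.

```python
import re

ASSIGNMENT = re.compile(r'^\$(\w+)\s*=\s*(.+?)(?:\s+where\s+(.+))?$')
CONDITION = re.compile(r'^\$(\w+)\s*==\s*"([^"]*)"$')


def parse_value(text):
    # Booleans are bare words; everything else is treated as a quoted string.
    if text in ("true", "false"):
        return text == "true"
    return text.strip('"')


def matches(condition, facts):
    """Check `$a == "x" and $b == "y"`, optionally wrapped in parentheses."""
    condition = condition.strip()
    if condition.startswith("(") and condition.endswith(")"):
        condition = condition[1:-1]
    for clause in condition.split(" and "):
        found = CONDITION.match(clause.strip())
        if found is None:
            raise ValueError("unsupported condition: " + clause)
        name, expected = found.groups()
        if facts.get(name) != expected:
            return False
    return True


def resolve(pdl_text, facts):
    """Return {parameter: value}, first matching line wins per parameter."""
    values = {}
    for line in pdl_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the path header comment
        found = ASSIGNMENT.match(line)
        if found is None:
            raise ValueError("unsupported line: " + line)
        name, value, condition = found.groups()
        if name in values:
            continue  # an earlier line already supplied this parameter
        if condition is None or matches(condition, facts):
            values[name] = parse_value(value)
    return values
```

Because the first match wins, the unconditional line has to come last, as it does in the examples.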
If an Author wishes to have class ntp::client share a parameter value with
class ntp, they specify this when they Define class ntp::client.
# <modulepath>/ntp/manifests/client.pp
class ntp::client($version=$ntp::version) { ... }
## Using Modules ##
The primary aim is to solve Problem Two.
We add a new Puppet configuration parameter, "datalocation" that can be set in
any configuration block (see earlier reference about bike-shedding).
# /etc/puppet/puppet.conf
[agent]
datalocation = /var/lib/puppet/data
...
[master]
datalocation = /var/lib/puppet/data
...
[development]
datalocation = /var/lib/puppet/environments/development/data
...
[testing]
datalocation = /var/lib/puppet/environments/testing/data
...
[production]
datalocation = /var/lib/puppet/environments/production/data
...
Users can create PDLs in these locations that supplant the Author-specified
PDLs.
The autoloader expects a structure very similar to the one specified in
Authoring Modules, just rooted at $datalocation rather than $modulepath, and
without the redundant "data" sub-directory. [Particularly interested in
feedback on this point]
# <datalocation>/ntp/init.pdl
$server = "time.mydomain.com"
# <datalocation>/ntp/client.pdl
$iburst = true
Exactly the same extended syntax can be used as specified in Authoring Modules.
In the Puppet configuration file, just as we have "manifest", which refers to
the entry-point manifest and defaults to "$manifestdir/site.pp", we now
add "datalibrary", which defaults to "$datalocation/site.pdl" (with a final
reminder about bike-shedding actual names).
The value of $server will still be retrieved from the PDL as specified above,
but the value of $admin_group will be retrieved from the Site PDL.
# <datalocation>/ntp/init.pdl
$server = "time.mydomain.com"
# <datalocation>/site.pdl
$admin_group = "corp_dev"
The User can also choose to maintain a single Site PDL with no Module specific
PDLs if they wish by fully qualifying parameters as follows:
# <datalocation>/site.pdl
$ntp::server = "time.mydomain.com"
$admin_group = "corp_dev"
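A small Python sketch of how a Site PDL lookup might treat fully qualified names. The proposal does not say which wins when both `$ntp::server` and a bare `$server` appear in the Site PDL. Preferring the qualified name is an assumption made here, and `site_lookup` is a hypothetical helper operating on already-parsed data keyed by variable name without the leading `$`.

```python
def site_lookup(site_data, class_name, parameter):
    """Look up a class parameter in parsed Site PDL data (hypothetical)."""
    qualified = f"{class_name}::{parameter}"
    # Assumption: a fully qualified key is more specific, so try it first.
    for key in (qualified, parameter):
        if key in site_data:
            return site_data[key]
    return None
```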
## Precedence Order ##
As soon as a value for a parameter is discovered, the lookup process stops for
that parameter.
Precedence is as follows for our class ntp example:
1. Manually specified parameter values when Declaring a Parameterized Class
2. `<datalocation>/ntp/init.pdl`
3. `<datalocation>/site.pdl`
4. `<modulepath>/ntp/data/init.pdl`
5. Default parameter values specified when Defining a Parameterized Class
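The precedence list can be sketched as a first-match search over ordered sources. This Python example assumes each source has already been resolved to a plain dictionary for the node in question. The `lookup` function and its argument names are hypothetical. Returning the source label alongside the value is one possible aid for debugging where a value came from.

```python
def lookup(parameter, declared, user_module_pdl, site_pdl, author_pdl, defaults):
    """Return (value, source) from the first source defining `parameter`."""
    sources = [
        ("declaration", declared),
        ("<datalocation>/ntp/init.pdl", user_module_pdl),
        ("<datalocation>/site.pdl", site_pdl),
        ("<modulepath>/ntp/data/init.pdl", author_pdl),
        ("class definition default", defaults),
    ]
    for label, data in sources:
        if parameter in data:
            # Stop at the first hit; later sources are never consulted.
            return data[parameter], label
    raise KeyError(parameter)
```

For class ntp, a value for `server` in `<datalocation>/ntp/init.pdl` hides the Author's value in `<modulepath>/ntp/data/init.pdl`, while `admin_group` falls through to the Site PDL.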
## Rich Data Types ##
We have several ways to encode this. We could use Ruby-style or JSON-style.
JSON-style is going to be easier for people to write tools to populate data
files. Arrays are the same in both cases, only Hashes differ.
Again, each of these should be assumed to be on a single line.
### Arrays ###
$packages = [ "one", "two", "three" ] where $operatingsystem == "debian"
### Ruby-style Array of Hashes ###
$packages = [ { "name" => "puppet", "ensure" => "installed" }, { "name" =>
"puppetmaster", "ensure" => "installed" }, ] where $operatingsystem == "debian"
### JSON-style Array of Hashes ###
$packages = [ { "name":"puppet", "ensure":"installed" }, {
"name":"puppetmaster","ensure":"installed" }, ] where $operatingsystem ==
"debian"
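One practical point in favour of the JSON-style encoding is that existing parsers can read the value directly. The Python sketch below decodes the right-hand side of such an assignment with the standard `json` module. The helper name is hypothetical, and the `where` clause is assumed to have been split off already. Note that a strict JSON parser rejects the trailing comma before the closing bracket shown in the example above, so the value here omits it.

```python
import json


def parse_json_style(value_text):
    """Decode the right-hand side of a JSON-style PDL assignment."""
    return json.loads(value_text)


packages = parse_json_style(
    '[ { "name":"puppet", "ensure":"installed" }, '
    '{ "name":"puppetmaster", "ensure":"installed" } ]'
)
```

Whether a PDL parser should tolerate the trailing comma is a design choice the proposal leaves open.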
We are not going to support setting individual rich data elements on the left
hand side like this:
$packages[0] = "foo" where $operatingsystem == "debian" # not implementing
$packages[1] = "bar" where $operatingsystem == "debian" # not implementing
unless someone can come up with a clean and simple design.
# Other Notes #
We envisage PDL formats other than this plain text one, such as a Ruby
PDL that could invoke blocks to query other data sources and/or provide more
complex conditional logic. This format should remain simple.
Under this proposal, if I reference $ntp::version from anywhere outside class
ntp itself, it should resolve to the same value as the data lookup process
does.
# Potential Problems #
* How do users easily debug the process?
* How do they check ahead of time that parameters are getting the values
they expect on a given node?
* Will we eventually need a command line tool to validate?
* Security - do we care if all nodes or some nodes etc can use the data
contained in modules?
* Is some form of granular security controls needed and if so how would such
controls get implemented?
# References #
* [http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php](http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php)
* [http://www.lab42.it/presentations/puppetmodules/puppetmodules.html](http://www.lab42.it/presentations/puppetmodules/puppetmodules.html)
* [http://bodepd.com/wordpress/?p=64](http://bodepd.com/wordpress/?p=64)
* [https://github.com/ohadlevy/puppet-lookup](https://github.com/ohadlevy/puppet-lookup)