Issue #6079 has been updated by Nigel Kersten.

Alternative prototype:

https://github.com/nigelkersten/puppet-get
----------------------------------------
Feature #6079: Data/Model Separation - Data in (and out of) Modules
https://projects.puppetlabs.com/issues/6079

Author: Nigel Kersten
Status: Investigating
Priority: Normal
Assignee: Nigel Kersten
Category: 
Target version: 
Keywords: 
Branch: 


# Overview #

There are (at least) two problems related to data in Puppet.

 1. The data is mixed up with the model in our manifests.
 2. Puppet modules cannot be reused because site-specific settings cannot be specified consistently and easily outside of the module contents.

# Glossary #

These terms should not be taken to be authoritative yet, particularly 
define/declare/evaluate. They are here so we can at least be internally 
self-consistent in this document.

 * **Author** - The person who writes Puppet modules.
 * **User** - The person who is responsible for the infrastructure and consumes modules written by an Author. A person may be both an Author and a User.
 * **Define** - When you describe what a Class is in Puppet and what parameters 
it accepts, if any.
 * **Declare** - When you instantiate a singleton instance of a Puppet Class 
and specify values for parameters.
 * **Evaluate** - When the agent applies the catalog.
 * **Parameterized Class** - A Puppet Class that has been Defined with a 
non-zero number of parameters.


All snippets of text have a single header line indicating their filesystem path, as follows:

    # /etc/puppet/puppet.conf
    [master]
    ...

# Constraints #

  1. We must attempt to solve both problems the same way where possible.
  2. It must be easy for an Author to separate model and data.
  3. It must be easy for an Author to specify data according to Author-defined 
criteria.
  4. It must be easy for a User to specify site-specific data according to User-defined criteria.
  5. For both Users and Authors, the implementation must be as transparent as 
possible in the DSLs; the introduction of new keywords, functions and syntax is 
discouraged, but may be necessary.
  6. The solution must work with as many different ways of invoking Puppet as 
possible.
  7. Author and User specification of data must be possible as simple key/value 
pairs in a text file.
  8. Authors and Users should be able to export existing data to the data format reasonably easily.
  9. The solution must not preclude User data specification in storage 
mechanisms other than text files.
  10. We must adhere to existing Puppet conventions such as the autoload 
mechanism where possible.

# Proposal #

## Authoring Modules ##

The primary aim is to solve Problem One.

Modules gain a new meaningful sub-directory, "data", and we add a new meaningful file extension to Puppet, ".pdl", for Puppet Data Library (PDL). (Please don't [bike-shed](http://en.wikipedia.org/wiki/Parkinson's_Law_of_Triviality) on this name or the extension. It is not important yet.)

In exactly the same way that the autoloader expects class ntp to be found in:

    <modulepath>/ntp/manifests/init.pp

data for class ntp is found in:

    <modulepath>/ntp/data/init.pdl

In exactly the same way that the autoloader expects class ntp::client to be found in:

    <modulepath>/ntp/manifests/client.pp

data for class ntp::client is found in:

    <modulepath>/ntp/data/client.pdl
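The path mapping above can be sketched in Python. This is a minimal illustration, not part of the proposal: the helper name `module_pdl_path` is hypothetical, it takes a single modulepath entry, and the handling of deeper names such as `ntp::server::config` is an assumption borrowed from how the manifest autoloader nests directories.

```python
from pathlib import PurePosixPath


def module_pdl_path(modulepath: str, class_name: str) -> PurePosixPath:
    """Map a class name to its Author PDL under one modulepath entry (hypothetical)."""
    module, *rest = class_name.split("::")
    # The module's own class (e.g. "ntp") uses init.pdl; a nested class
    # (e.g. "ntp::client") uses its last name segment as the file name,
    # with any intermediate segments becoming sub-directories (assumed).
    parts = rest if rest else ["init"]
    return PurePosixPath(modulepath, module, "data", *parts[:-1], parts[-1] + ".pdl")
```

For example, `module_pdl_path("/etc/puppet/modules", "ntp::client")` yields `/etc/puppet/modules/ntp/data/client.pdl`.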

When you Define a Parameterized Class, if any parameter does not have a default 
specified in the manifest, Puppet consults the PDL.

When a User Declares a Parameterized Class without specifying parameter values, 
Puppet consults the PDL, then any default values the Author has specified if 
the PDL does not provide an answer.

All of the following examples retrieve data for these classes:

    # <modulepath>/ntp/manifests/init.pp
    class ntp($server, $admin_group) { ... }

    # <modulepath>/ntp/manifests/client.pp
    class ntp::client($iburst) { ... }

### Puppet Data Library Format ###

### Key/Value pairs ###

$server will be set to "time.ntp.org":

    # <modulepath>/ntp/data/init.pdl
    $server = "time.ntp.org"

$iburst will be set to true:

    # <modulepath>/ntp/data/client.pdl
    $iburst = true
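A minimal Python sketch of reading this plain key/value form follows. The helper names are hypothetical, and the grammar it accepts (one `$name = value` per line, values limited to double-quoted strings and `true`/`false`, `#` comment lines ignored) is an assumption, since the proposal does not define one formally.

```python
import re

# Matches lines of the form:  $name = value   (hypothetical grammar)
# Names may contain "::" so fully qualified keys such as $ntp::server parse too.
ASSIGNMENT = re.compile(r'^\$(?P<name>[A-Za-z_][\w:]*)\s*=\s*(?P<value>.+?)\s*$')


def parse_value(text: str):
    """Convert a PDL literal to a Python value: a quoted string or a boolean."""
    if text in ("true", "false"):
        return text == "true"
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    raise ValueError(f"unsupported PDL literal: {text!r}")


def parse_simple_pdl(source: str) -> dict:
    """Parse key/value PDL lines into a dict, skipping blank and comment lines."""
    data = {}
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match is None:
            raise ValueError(f"not a key/value line: {line!r}")
        data[match["name"]] = parse_value(match["value"])
    return data
```

Parsing the two files above gives `{"server": "time.ntp.org"}` and `{"iburst": True}` respectively.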

### Extended Syntax ###

The extended syntax allows for values to be specified based upon other 
variables such as facts about the node.

Each conditional assignment (`assignment where condition`) is written on a single line, which avoids the need for block markers.

If $operatingsystem equals "darwin", set $server to "time.apple.com", else if 
$operatingsystem equals "debian", set $server to "time.ntp.org", else set 
$server to "time2.ntp.org".

    # <modulepath>/ntp/data/init.pdl
    $server = "time.apple.com" where $operatingsystem == "darwin"
    $server = "time.ntp.org" where $operatingsystem == "debian" 
    $server = "time2.ntp.org"
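The example reads naturally if lines are tried top to bottom and the first line whose condition holds wins, with an unconditional line placed last acting as the default. The following Python sketch assumes exactly that; it is hypothetical, supports only double-quoted string values and a single `==` test, and compares fact values exactly (case-sensitively).

```python
import re

# One conditional assignment per line:  $name = "value" [where $fact == "x"]
# Hypothetical grammar covering only string values and a single == test.
LINE = re.compile(
    r'^\$(?P<name>\w+)\s*=\s*"(?P<value>[^"]*)"'
    r'(?:\s+where\s+\$(?P<fact>\w+)\s*==\s*"(?P<expected>[^"]*)")?\s*$'
)


def resolve(source: str, name: str, facts: dict):
    """Return the value from the first line for `name` whose condition holds."""
    for raw in source.splitlines():
        match = LINE.match(raw.strip())
        if match is None or match["name"] != name:
            continue  # comments, blank lines, and other parameters
        fact = match["fact"]
        # A line without a where-clause always matches, so placed last it
        # acts as the default value.
        if fact is None or facts.get(fact) == match["expected"]:
            return match["value"]
    return None  # this PDL has no answer; lookup moves to the next source
```

With the init.pdl above, a node whose `operatingsystem` fact is `"redhat"` falls through both conditional lines and gets `"time2.ntp.org"`.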

If $hardware_type equals "laptop", set $iburst to false; otherwise, set $iburst to true.

    # <modulepath>/ntp/data/client.pdl
    $iburst = false where $hardware_type == "laptop"
    $iburst = true

If $hardware_type equals "laptop" and $domain equals "puppetlabs.lan", set $server to "time.puppetlabs.lan":

    # <modulepath>/ntp/data/init.pdl
    $server = "time.puppetlabs.lan" where ($hardware_type == "laptop" and $domain == "puppetlabs.lan")
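Compound conditions need a small evaluator for the where-clause. This hypothetical Python sketch handles optional outer parentheses and `==` tests joined by `and`; `or`, nesting, and quoted values that themselves contain the word `and` are out of scope, and how the real grammar treats them is left open by the proposal.

```python
import re

# A single comparison:  $fact == "value"
COMPARISON = re.compile(r'^\$(?P<fact>\w+)\s*==\s*"(?P<expected>[^"]*)"$')


def condition_holds(condition: str, facts: dict) -> bool:
    """Evaluate a where-clause made of == tests joined by `and` (hypothetical subset)."""
    condition = condition.strip()
    # Strip one pair of optional outer parentheses.
    if condition.startswith("(") and condition.endswith(")"):
        condition = condition[1:-1].strip()
    # Naive split: assumes "and" never appears inside a quoted value.
    for part in re.split(r"\s+and\s+", condition):
        match = COMPARISON.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported comparison: {part!r}")
        # A missing fact never equals the expected string, so the test fails.
        if facts.get(match["fact"]) != match["expected"]:
            return False
    return True
```

Every comparison must succeed, so a desktop in puppetlabs.lan, or a laptop with no `domain` fact, does not match.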

If an Author wishes to have class ntp::client share a parameter value with 
class ntp, they specify this when they Define class ntp::client. 

    # <modulepath>/ntp/manifests/client.pp 
    class ntp::client($version=$ntp::version) { ... }

## Using Modules ##

The primary aim is to solve Problem Two.

We add a new Puppet configuration parameter, "datalocation", that can be set in any configuration block (see the earlier note about bike-shedding).

    # /etc/puppet/puppet.conf
    [agent]
    datalocation = /var/lib/puppet/data
    ...

    [master]
    datalocation = /var/lib/puppet/data
    ...

    [development]
    datalocation = /var/lib/puppet/environments/development/data
    ...

    [testing]
    datalocation = /var/lib/puppet/environments/testing/data
    ...

    [production]
    datalocation = /var/lib/puppet/environments/production/data
    ...
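Which `datalocation` applies depends on the order in which sections are searched. The sketch below assumes, without the proposal saying so, that an environment's section is consulted before the run-mode section (`[master]` or `[agent]`), which is consulted before `[main]`. The helper `datalocation_for` is hypothetical and uses Python's standard `configparser`; the `...` placeholder lines from the example above must be removed before parsing.

```python
import configparser


def datalocation_for(conf_text: str, run_mode: str, environment: str):
    """Pick the datalocation that applies to a run (hypothetical helper).

    Assumed search order: environment section, run-mode section, [main].
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in (environment, run_mode, "main"):
        # has_option returns False when the section itself is absent.
        if parser.has_option(section, "datalocation"):
            return parser.get(section, "datalocation")
    return None
```

Under this assumption a master compiling for production uses `/var/lib/puppet/environments/production/data`, while an environment with no section of its own falls back to the `[master]` value.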

Users can create PDLs in these locations that take precedence over the Author-specified PDLs.

The autoloader expects a very similar structure as specified in Authoring 
Modules, just rooted at $datalocation rather than $modulepath, and without the 
redundant "data" sub-directory. [Particularly interested in feedback on this 
point]

    # <datalocation>/ntp/init.pdl
    $server = "time.mydomain.com"

    # <datalocation>/ntp/client.pdl
    $iburst = true

Exactly the same extended syntax can be used as specified in Authoring Modules.

Just as the Puppet configuration file has "manifest", which refers to the entry-point manifest and defaults to "$manifestdir/site.pp", we now add "datalibrary", which defaults to "$datalocation/site.pdl" (with a final reminder about bike-shedding actual names).

In the following example, the value of $server is still retrieved from the module-specific PDL as specified above, but the value of $admin_group, which that PDL does not set, is retrieved from the Site PDL.

    # <datalocation>/ntp/init.pdl
    $server = "time.mydomain.com"

    # <datalocation>/site.pdl
    $admin_group = "corp_dev"

The User can also choose to maintain a single Site PDL with no module-specific PDLs, if they wish, by fully qualifying parameters as follows:

    # <datalocation>/site.pdl
    $ntp::server = "time.mydomain.com"
    $admin_group = "corp_dev"
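The example mixes a fully qualified `$ntp::server` with an unqualified `$admin_group`. One possible reading, and it is only an assumption, is that a qualified key applies to a single class and wins, while an unqualified key applies to any class with a parameter of that name. A hypothetical Python sketch of that reading, over an already-parsed Site PDL:

```python
def site_lookup(site_data: dict, class_name: str, param: str):
    """Look up a class parameter in a parsed Site PDL (hypothetical helper).

    Assumes "ntp::server" applies only to class ntp and is preferred over
    an unqualified key such as "admin_group".
    """
    qualified = f"{class_name}::{param}"
    if qualified in site_data:
        return site_data[qualified]
    return site_data.get(param)  # None when the Site PDL has no answer
```

Under this reading, `site_lookup(site, "ntp::client", "server")` finds nothing, because `ntp::server` names a parameter of class ntp, not of ntp::client.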


## Precedence Order ##

As soon as a value for a parameter is discovered, the lookup process stops for 
that parameter.

Precedence is as follows for our class ntp example:

  1. Manually specified parameter values when Declaring a Parameterized Class  
  2. `<datalocation>/ntp/init.pdl`
  3. `<datalocation>/site.pdl`
  4. `<modulepath>/ntp/data/init.pdl`
  5. Manually specified parameter values when Defining a Parameterized Class.
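The first-match-wins rule can be sketched as a chain of sources tried in the order listed above: declared values, `<datalocation>/ntp/init.pdl`, `<datalocation>/site.pdl`, `<modulepath>/ntp/data/init.pdl`, then the Author's defaults. In this hypothetical Python sketch each source returns a sentinel when it has no answer, so that a legitimate `false` (such as `$iburst = false`) still stops the search; the names `lookup`, `from_dict`, and `MISSING` are illustrative only.

```python
from typing import Any, Callable, Sequence

# Sentinel meaning "this source has no value", distinct from None or False.
MISSING = object()

# A source maps a parameter name to its value, or to MISSING.
Source = Callable[[str], Any]


def from_dict(data: dict) -> Source:
    """Wrap declared parameters or a parsed PDL as a lookup source."""
    return lambda param: data.get(param, MISSING)


def lookup(param: str, sources: Sequence[Source]) -> Any:
    """Try sources in precedence order; stop at the first that answers."""
    for source in sources:
        value = source(param)
        if value is not MISSING:
            return value
    raise LookupError(f"no value found for parameter {param!r}")


# Usage for class ntp, highest precedence first (hypothetical data):
ntp_sources = [
    from_dict({}),                               # 1. values given when Declaring
    from_dict({"server": "time.mydomain.com"}),  # 2. <datalocation>/ntp/init.pdl
    from_dict({"admin_group": "corp_dev"}),      # 3. <datalocation>/site.pdl
    from_dict({"server": "time.ntp.org"}),       # 4. <modulepath>/ntp/data/init.pdl
    from_dict({"admin_group": "wheel"}),         # 5. defaults when Defining
]
```

Here $server resolves to "time.mydomain.com" at step 2 and $admin_group to "corp_dev" at step 3; the module's own PDL and the defaults are never consulted for them.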


## Rich Data Types ##

We have several ways to encode rich data types such as arrays and hashes: we could use Ruby-style or JSON-style syntax. JSON-style will make it easier for people to write tools that populate data files. Arrays look the same in both styles; only Hashes differ.

Again, each of these should be assumed to be on a single line.

### Arrays ###

    $packages = [ "one", "two", "three" ] where $operatingsystem == "debian"

### Ruby-style Array of Hashes ###

    $packages = [ { "name" => "puppet", "ensure" => "installed" }, { "name" => "puppetmaster", "ensure" => "installed" }, ] where $operatingsystem == "debian"

### JSON-style Array of Hashes ###

    $packages = [ { "name":"puppet", "ensure":"installed" }, { "name":"puppetmaster", "ensure":"installed" } ] where $operatingsystem == "debian"
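One practical benefit of JSON-style values is that they can be handed to an ordinary JSON parser; note that strict JSON rejects a trailing comma before `]`, so a value written that way would fail to parse. A hypothetical Python sketch that splits a line at ` where ` and parses the value with the standard `json` module:

```python
import json


def parse_json_style(line: str):
    """Split a JSON-style PDL line into (name, value, condition or None)."""
    # Assumes " where " occurs only as the keyword, never inside the value.
    assignment, _, condition = line.partition(" where ")
    # Split at the first "=": the name is on the left, the JSON value on the right.
    name, _, raw_value = assignment.partition("=")
    # Strict JSON: double-quoted strings, no trailing commas.
    value = json.loads(raw_value)
    return name.strip().lstrip("$"), value, condition.strip() or None
```

The same helper also accepts the plain array form, such as `$packages = [ "one", "two", "three" ]`.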

Unless someone can come up with a clean and simple design, we are not going to support setting individual rich data elements on the left-hand side, like this:

    $packages[0] = "foo" where $operatingsystem == "debian"    # not implementing
    $packages[1] = "bar" where $operatingsystem == "debian"    # not implementing

# Other Notes #

We envisage PDL formats other than this plain-text one, such as a Ruby PDL that could invoke blocks to query other data sources and/or provide more complex conditional logic. This plain-text format should remain simple.

Under this proposal, if I reference $ntp::version from anywhere outside class ntp itself, it should resolve to the same value that the data lookup process supplies to class ntp.


# Potential Problems #

 * How do users easily debug the process?
   * How do they check ahead of time that parameters are getting the values 
they expect on a given node?
   * Will we eventually need a command line tool to validate?
 * Security: do we care whether all nodes, or only some nodes, can use the data contained in modules?
   * Is some form of granular security control needed, and if so, how would such controls be implemented?


# References #

 * [http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php](http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php)
 * [http://www.lab42.it/presentations/puppetmodules/puppetmodules.html](http://www.lab42.it/presentations/puppetmodules/puppetmodules.html)
 * [http://bodepd.com/wordpress/?p=64](http://bodepd.com/wordpress/?p=64)
 * [https://github.com/ohadlevy/puppet-lookup](https://github.com/ohadlevy/puppet-lookup)

