Issue #6079 has been updated by Nigel Kersten.

For everyone watching this feature, it's well worth looking at the work RI has done: [http://www.devco.net/archives/2011/06/05/hiera_a_pluggable_hierarchical_data_store.php](http://www.devco.net/archives/2011/06/05/hiera_a_pluggable_hierarchical_data_store.php)

----------------------------------------
Feature #6079: Data/Model Separation - Data in (and out of) Modules
https://projects.puppetlabs.com/issues/6079

Author: Nigel Kersten
Status: Investigating
Priority: Normal
Assignee: Nigel Kersten
Category:
Target version:
Keywords:
Branch:

# Overview #

There are (at least) two problems related to data in Puppet.

1. The data is mixed up with the model in our manifests.
2. Puppet modules cannot be reused because site-specific settings cannot be specified consistently and easily outside of the module contents.

# Glossary #

These terms should not be taken as authoritative yet, particularly define/declare/evaluate. They are here so that this document is at least internally self-consistent.

* **Author** - The person who writes Puppet modules.
* **User** - The person who is responsible for the infrastructure and consumes modules written by an Author. A person may be both an Author and a User.
* **Define** - When you describe what a Class is in Puppet and what parameters, if any, it accepts.
* **Declare** - When you instantiate a singleton instance of a Puppet Class and specify values for its parameters.
* **Evaluate** - When the agent applies the catalog.
* **Parameterized Class** - A Puppet Class that has been Defined with one or more parameters.

All snippets of text have a single header line indicating their filesystem path, as follows:

    # /etc/puppet/puppet.conf
    [master]
    ...

# Constraints #

1. We must attempt to solve both problems the same way where possible.
2. It must be easy for an Author to separate model and data.
3. It must be easy for an Author to specify data according to Author-defined criteria.
4. It must be easy for a User to specify site-specific data according to User-defined criteria.
5. For both Users and Authors, the implementation must be as transparent as possible in the DSLs; the introduction of new keywords, functions and syntax is discouraged, but may be necessary.
6. The solution must work with as many different ways of invoking Puppet as possible.
7. Author and User specification of data must be possible as simple key/value pairs in a text file.
8. Authors and Users should be able to export data to the data format reasonably easily.
9. The solution must not preclude User data specification in storage mechanisms other than text files.
10. We must adhere to existing Puppet conventions, such as the autoload mechanism, where possible.

# Proposal #

## Authoring Modules ##

The primary aim is to solve Problem One.

Modules gain a new meaningful subdirectory, "data", and Puppet gains a new meaningful file extension, ".pdl", for Puppet Data Library (PDL). (Please don't [bike-shed](http://en.wikipedia.org/wiki/Parkinson's_Law_of_Triviality) on this name or the extension. It is not important yet.)

In exactly the same way that the autoloader expects class ntp to be found in:

    <modulepath>/ntp/manifests/init.pp

data for class ntp is found in:

    <modulepath>/ntp/data/init.pdl

In exactly the same way that the autoloader expects class ntp::client to be found in:

    <modulepath>/ntp/manifests/client.pp

data for class ntp::client is found in:

    <modulepath>/ntp/data/client.pdl

When you Define a Parameterized Class, if any parameter does not have a default specified in the manifest, Puppet consults the PDL. When a User Declares a Parameterized Class without specifying parameter values, Puppet consults the PDL first. It falls back to any default values the Author has specified only if the PDL does not provide an answer.

All of the following examples retrieve data for these classes:

    # <modulepath>/ntp/manifests/init.pp
    class ntp($server, $admin_group) {
      ...
    }

    # <modulepath>/ntp/manifests/client.pp
    class ntp::client($iburst) {
      ...
    }

## Puppet Data Library Format ##

### Key/Value pairs ###

$server will be set to "time.ntp.org":

    # <modulepath>/ntp/data/init.pdl
    $server = "time.ntp.org"

$iburst will be set to true:

    # <modulepath>/ntp/data/client.pdl
    $iburst = true

### Extended Syntax ###

The extended syntax allows values to be specified based on other variables, such as facts about the node. Each conditional assignment (`$variable = value where condition`) sits on a single line, which avoids the need for block markers. As the examples suggest, the lines for one variable read as an if / else if / else chain. They are tried in order, and a line without a `where` clause acts as the final fallback.

If $operatingsystem equals "darwin", set $server to "time.apple.com"; else if $operatingsystem equals "debian", set $server to "time.ntp.org"; else set $server to "time2.ntp.org".

    # <modulepath>/ntp/data/init.pdl
    $server = "time.apple.com" where $operatingsystem == "darwin"
    $server = "time.ntp.org" where $operatingsystem == "debian"
    $server = "time2.ntp.org"

If $hardware_type equals "laptop", set $iburst to false; otherwise, set $iburst to true.

    # <modulepath>/ntp/data/client.pdl
    $iburst = false where $hardware_type == "laptop"
    $iburst = true

If $hardware_type equals "laptop" and $domain equals "puppetlabs.lan", set $server to "time.puppetlabs.lan". Because $server is a parameter of class ntp, this line belongs in init.pdl:

    # <modulepath>/ntp/data/init.pdl
    $server = "time.puppetlabs.lan" where ($hardware_type == "laptop" and $domain == "puppetlabs.lan")

If an Author wants class ntp::client to share a parameter value with class ntp, they specify this when they Define class ntp::client:

    # <modulepath>/ntp/manifests/client.pp
    class ntp::client($version=$ntp::version) {
      ...
    }

## Using Modules ##

The primary aim is to solve Problem Two.

We add a new Puppet configuration parameter, "datalocation", that can be set in any configuration block (see the earlier note about bike-shedding).

    # /etc/puppet/puppet.conf
    [agent]
    datalocation = /var/lib/puppet/data
    ...

    [master]
    datalocation = /var/lib/puppet/data
    ...

    [development]
    datalocation = /var/lib/puppet/environments/development/data
    ...

    [testing]
    datalocation = /var/lib/puppet/environments/testing/data
    ...

    [production]
    datalocation = /var/lib/puppet/environments/production/data
    ...

Users can create PDLs in these locations that override the Author-specified PDLs. The autoloader expects a structure very similar to the one described in Authoring Modules. The differences are that it is rooted at $datalocation rather than $modulepath, and it has no redundant "data" subdirectory. [Particularly interested in feedback on this point]

    # <datalocation>/ntp/init.pdl
    $server = "time.mydomain.com"

    # <datalocation>/ntp/client.pdl
    $iburst = true

Exactly the same extended syntax can be used as described in Authoring Modules.

The Puppet configuration file already has "manifest", which refers to the entry-point manifest and defaults to "$manifestdir/site.pp". In the same way, we now add "datalibrary", which defaults to "$datalocation/site.pdl" (with a final reminder about bike-shedding actual names).

In the following example, the value of $server is still retrieved from the module-specific PDL as described above, but the value of $admin_group is retrieved from the Site PDL.

    # <datalocation>/ntp/init.pdl
    $server = "time.mydomain.com"

    # <datalocation>/site.pdl
    $admin_group = "corp_dev"

The User can also choose to maintain a single Site PDL with no module-specific PDLs by fully qualifying parameters, as follows:

    # <datalocation>/site.pdl
    $ntp::server = "time.mydomain.com"
    $admin_group = "corp_dev"

## Precedence Order ##

As soon as a value for a parameter is found, the lookup stops for that parameter. For our class ntp example, precedence is as follows:

1. Parameter values manually specified when Declaring a Parameterized Class
2. `<datalocation>/ntp/init.pdl`
3. `<datalocation>/site.pdl`
4. `<modulepath>/ntp/data/init.pdl`
5. Default parameter values specified in the manifest when Defining a Parameterized Class

## Rich Data Types ##

We have several ways to encode rich data. We could use Ruby-style or JSON-style.
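To make the comparison concrete, here is a minimal Python sketch of what tooling around a JSON-style right-hand side might involve. It is not part of the proposal. The helpers `pdl_line` and `parse_value` are hypothetical, and they rely only on Python's standard `json` module to write a single-line PDL assignment and to read a value back. A strict JSON parser rejects a Ruby-style `=>` hash and an array with a trailing comma, so those would need a custom parser.

```python
import json


def pdl_line(variable, value, condition=None):
    """Render a single-line PDL assignment with a JSON-style right-hand side.

    `variable` is given without its leading "$"; `condition` is the text that
    would follow "where". The output follows the syntax sketched in this issue.
    """
    line = f"${variable} = {json.dumps(value)}"
    if condition:
        line += f" where {condition}"
    return line


def parse_value(text):
    """Parse a right-hand side with a stock JSON parser; None if it isn't JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


packages = [
    {"name": "puppet", "ensure": "installed"},
    {"name": "puppetmaster", "ensure": "installed"},
]
# Generate a data file line from structured data.
print(pdl_line("packages", packages, '$operatingsystem == "debian"'))

# A JSON-style value parses with the standard library...
print(parse_value('[ { "name":"puppet", "ensure":"installed" } ]'))
# ...but Ruby-style "=>" hashes and trailing commas do not (both print None).
print(parse_value('[ { "name" => "puppet", "ensure" => "installed" } ]'))
print(parse_value('[ "one", "two", "three", ]'))
```

This only shows that JSON-style values can reuse existing parsers and serializers. It says nothing about how Puppet itself would parse PDL files.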
JSON-style will make it easier for people to write tools that populate data files. Arrays are the same in both cases; only Hashes differ. Again, each of these should be assumed to be on a single line.

### Arrays ###

    $packages = [ "one", "two", "three" ] where $operatingsystem == "debian"

### Ruby-style Array of Hashes ###

    $packages = [
      { "name" => "puppet", "ensure" => "installed" },
      { "name" => "puppetmaster", "ensure" => "installed" },
    ] where $operatingsystem == "debian"

### JSON-style Array of Hashes ###

JSON does not allow a trailing comma after the last element:

    $packages = [
      { "name":"puppet", "ensure":"installed" },
      { "name":"puppetmaster", "ensure":"installed" }
    ] where $operatingsystem == "debian"

We are not going to support setting individual rich data elements on the left-hand side like this:

    $packages[0] = "foo" where $operatingsystem == "debian" # not implementing
    $packages[1] = "bar" where $operatingsystem == "debian" # not implementing

unless someone can come up with a clean and simple design.

# Other Notes #

We envisage PDL formats other than this plain-text one, such as a Ruby PDL that could invoke blocks to query other data sources and/or provide more complex conditional logic. This plain-text format should remain simple.

Under this proposal, if I reference $ntp::version from anywhere outside class ntp itself, it should resolve to the same value that the data lookup process produces.

# Potential Problems #

* How do users easily debug the process?
* How do they check ahead of time that parameters are getting the values they expect on a given node?
* Will we eventually need a command-line tool to validate?
* Security - do we care whether all nodes, or only some nodes, can use the data contained in modules?
* Is some form of granular security control needed, and if so, how would such controls be implemented?
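The Precedence Order and the first two questions above can be pictured with a minimal Python sketch, which is not part of the proposal. It makes three simplifying assumptions:

* Each PDL has already been parsed into an ordered list of `(variable, value, condition)` entries.
* Each `where` clause is modelled as a Python predicate over a dict of facts instead of being parsed.
* The first line in a file whose condition holds is that file's answer.

`first_match`, `lookup`, the source labels and the sample data are all hypothetical. Returning the source alongside the value is one way a debugging aid could show a User where each parameter came from.

```python
from typing import Any, Callable, Optional

Facts = dict[str, Any]
# One parsed PDL line: (variable, value, condition). A condition of None
# stands for a line without a "where" clause.
Entry = tuple[str, Any, Optional[Callable[[Facts], bool]]]


def first_match(entries: list[Entry], names: set[str], facts: Facts):
    """Return (True, value) for the first line that assigns one of `names`
    and whose condition holds for `facts`; otherwise (False, None)."""
    for variable, value, condition in entries:
        if variable in names and (condition is None or condition(facts)):
            return True, value
    return False, None


def lookup(cls: str, param: str, facts: Facts, declared: dict[str, Any],
           sources: list[tuple[str, list[Entry]]], defaults: dict[str, Any]):
    """Resolve one class parameter and report which source supplied it."""
    # Accept both the bare name and the fully qualified one, as in the
    # single-Site-PDL example ("$ntp::server").
    names = {param, f"{cls}::{param}"}
    if param in declared:              # 1. given when Declaring the class
        return declared[param], "declaration"
    for label, entries in sources:     # 2-4. PDLs, highest precedence first
        found, value = first_match(entries, names, facts)
        if found:
            return value, label
    if param in defaults:              # 5. default given when Defining the class
        return defaults[param], "manifest default"
    raise KeyError(f"no value found for ${cls}::{param}")


# Sample data mirroring the class ntp examples in this issue.
facts = {"operatingsystem": "debian", "hardware_type": "laptop"}
sources = [
    ("<datalocation>/ntp/init.pdl", []),
    ("<datalocation>/site.pdl", [("admin_group", "corp_dev", None)]),
    ("<modulepath>/ntp/data/init.pdl", [
        ("server", "time.apple.com", lambda f: f.get("operatingsystem") == "darwin"),
        ("server", "time.ntp.org", lambda f: f.get("operatingsystem") == "debian"),
        ("server", "time2.ntp.org", None),
    ]),
]

print(lookup("ntp", "server", facts, {}, sources, {}))
# ('time.ntp.org', '<modulepath>/ntp/data/init.pdl')
print(lookup("ntp", "admin_group", facts, {}, sources, {}))
# ('corp_dev', '<datalocation>/site.pdl')
```

Running the same lookup against several fact sets before any catalog is compiled is roughly what a validation tool of the kind asked about above might do.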
# References #

* [http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php](http://www.devco.net/archives/2009/08/31/complex_data_and_puppet.php)
* [http://www.lab42.it/presentations/puppetmodules/puppetmodules.html](http://www.lab42.it/presentations/puppetmodules/puppetmodules.html)
* [http://bodepd.com/wordpress/?p=64](http://bodepd.com/wordpress/?p=64)
* [https://github.com/ohadlevy/puppet-lookup](https://github.com/ohadlevy/puppet-lookup)

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here: http://projects.puppetlabs.com/my/account

--
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/puppet-bugs?hl=en.
