Re: [Puppet-dev] Re: Data loading

Trevor Vaughan Wed, 02 Jun 2010 18:54:51 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've been following this thread with interest and I think that Donavan
is hitting upon something that I've also been wanting.

However, the way I was looking at it was as a set of atomic, optionally
blocking, semaphores in a set of parallel threads.

If you look at each puppet client as an individual thread and realize
that you need things to happen cross-client that depend on the state of
one or more clients, then you've deconstructed this into a classic
parallel programming application (with all the constituent nonsense).

I was toying with the idea of a, for lack of a better term, registry
where you could have your modules place *any* type of data and other
systems would retrieve that data/data sets when necessary.

I would *not* build this into the puppetmaster, for scaling reasons, but
would instead make it a separate data service, optionally backed by a
distributed database.

This approach covers both the OpenLDAP situation below and the
situation where you can't start a service, or don't want to apply part
of a manifest, until something has happened on a different system.
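
As a rough sketch of the "wait until something has happened elsewhere"
case, here is an in-process Python stand-in for such a registry. The
Registry class, its method names and the key used in the usage note are
all made up for illustration; a real service would be networked and
persistent, which this is not.

```python
import threading


class Registry:
    """In-process stand-in for the separate data service (hypothetical API)."""

    def __init__(self):
        self._data = {}
        self._changed = threading.Condition()

    def put(self, key, value):
        # Publish a value and wake every client waiting on the registry.
        with self._changed:
            self._data[key] = value
            self._changed.notify_all()

    def get(self, key, default=None):
        # Plain read: returns immediately, never waits for the key to appear.
        with self._changed:
            return self._data.get(key, default)

    def wait_for(self, key, timeout=None):
        # Block until `key` has been published; return None on timeout.
        with self._changed:
            found = self._changed.wait_for(lambda: key in self._data, timeout)
            return self._data[key] if found else None
```

A client that must not start a service until another node is ready
would call something like registry.wait_for("ldap-master/ready",
timeout=300) and skip that part of the run if it gets None back.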

OpenLDAP Example:

Node A -
  - Request Lock with data broker
  - Obtain lock
Node B -
  - Request Lock with data broker
  - Sleep or skip on returned block
Node A -
  - Obtain RID list
  - Pick next RID (highest existing RID + 1)
  - Register new RID with data broker
  - Release Lock
Node B -
  - If it did not skip, get notified that the lock is available
  - Continue with processing ....
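
The steps above can be sketched in Python as follows. This is only an
illustration under assumptions: DataBroker, allocate_rid and the lock
name "syncrepl-rid" are invented here, the broker lives in a single
process rather than behind a network service, and "next RID" is taken
to mean the highest registered RID plus one.

```python
import threading


class DataBroker:
    """Toy in-process broker: named locks plus a RID table (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owners = {}  # lock name -> node currently holding it
        self._rids = {}    # node name -> registered RID

    def acquire(self, name, node, block=True, timeout=None):
        # True if `node` now holds the lock; False if it skipped or timed out.
        with self._cond:
            if not block and name in self._owners:
                return False  # "skip on returned block"
            if not self._cond.wait_for(lambda: name not in self._owners, timeout):
                return False
            self._owners[name] = node
            return True

    def release(self, name, node):
        with self._cond:
            if self._owners.get(name) != node:
                raise RuntimeError(f"{node} does not hold lock {name!r}")
            del self._owners[name]
            self._cond.notify_all()  # wake nodes sleeping on the lock

    def rid_list(self):
        with self._cond:
            return dict(self._rids)

    def register_rid(self, node, rid):
        with self._cond:
            if rid in self._rids.values():
                raise ValueError(f"RID {rid} is already registered")
            self._rids[node] = rid


def allocate_rid(broker, node, block=True, timeout=None):
    """Return this node's RID, or None if the node skipped or timed out."""
    if not broker.acquire("syncrepl-rid", node, block, timeout):
        return None
    try:
        existing = broker.rid_list()
        if node in existing:
            return existing[node]  # a RID persists for the replica's life
        rid = max(existing.values(), default=0) + 1
        if rid > 999:
            raise ValueError("RIDs are limited to three digits")
        broker.register_rid(node, rid)
        return rid
    finally:
        broker.release("syncrepl-rid", node)
```

With block=False a node that finds the lock taken simply skips that
part of the run and retries on its next run; with block=True it sleeps
until the holder releases the lock.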

It's a complex situation, but I'm not entirely sure how else to do it
without a kludgy web "service" or the like. If you could use one of the
recent NoSQL stores as the backend, this could be a reasonably fast
operation.

Of course, not all data would need to be locked; if you're just reading
data, there's no need to lock at all. But, in my opinion, this is
fundamentally a cross-system parallel programming problem, and I think
that the existing techniques for dealing with such problems would be
best suited to the task.

I do see a growing trend in this thread and others that, no matter what
you choose, someone is going to need something else. Such is the nature
of the vast array of data that we have to pull from.

For example, Person A with 500 servers will be happy with some GUI
wrapped around YAML. However, Person B with 5000 servers will find the
same solution to be tedious and slow and will want a full-on database.

Hope this helps and doesn't just make the whole thing more complicated.

Thanks,

Trevor

On 05/28/2010 03:41 PM, donavan wrote:
> On May 28, 3:00 am, Luke Kanies <[email protected]> wrote:
>> External data (that is, data specified outside of Puppet manifests)  
>> seems to keep coming up.  This is a relatively long description of  
>> where it seems we are and where we should go from here.
> 
> I'd like to +1 this discussion in general. My personal #1 wishlist
> item is the 'data from other nodes' problem that Daniel mentions.
> 
> It seems to me that more separation between logic and data is needed
> in Puppet manifests. It's one of the main problems I see with module
> redistribution. Apache module A is written for Debian, Apache module B
> is written for RHEL, etc. Even if that was cleaned up you still see
> $adminpassword variables, and not everyone wants the same list of
> modules loaded/installed.
> 
>> * Alessandro's presentation caused someone to point out to me  
>> afterward that case statements of this ilk:
>>
>> case $operatingsystem {
>> debian: { ... }
>> redhat: { ... }
>>
>> }
> 
> I echo Jonathan's sentiments on this. I think a better alternative to
> the above is something like this:
> class apache {
>  include apache::$operatingsystem
> }
> 
> To add solaris support I add solaris.pp into the load path.
> Class[apache::solaris] can then include or inherit
> Class[apache::base], as needed. Then I extend the module and limit
> conflict with upstream manifests. It may not be ideal, but it works
> today.
> 
>> * Users should probably be able to put their external data in a  
>> database, preferably in their external node tool
> 
> I'd like to second Daniel's comments regarding data of other nodes.
> This is pretty much required when configuring distributed services.
> Today I can get most of the way there with storeconfigs & clever
> defines, but it's not ideal. I posted a question about hacking faux
> distributed key value storage using storeconfigs, but got crickets.
> 
> A common use case for me is OpenLDAP replication (syncrepl). Each
> slave is identified by a 3 digit integer (rid), which must be unique
> in the replication group, that needs to persist for the life of that
> replica. So each new slave needs to know the rid of every existing
> slave, and then pick the next available rid. The same pattern applies
> to MySQL replication, for example.
> 
>> I also don't like the idea of just relying on a function - I'd like a  
>> class to be able to declare that it relies on external data, so that  
>> users know what they can configure in their class.
> 
> Isn't this part of parameterized classes? Today I'd probably do 'if
> $value == "" { fail("must define \$value") }' where required.
> 

- -- 
Trevor Vaughan
 Vice President, Onyx Point, Inc.
 email: [email protected]
 phone: 410-541-ONYX (6699)
 pgp: 0x6C701E94

- -- This account not approved for unencrypted sensitive information --
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkwHC1sACgkQyWMIJmxwHpRK1QCfR2quLSuugQjJymBllxGV4PDU
LrQAoJpizbcZz3BpzzTE2WmsBKs4OM7/
=b0gf
-----END PGP SIGNATURE-----
