I've spent the last few days digesting the results of Luke's queueing
experiments (see e-mails "Asynchronous catalog compiles", "Asynchrony, take
2", and "Asynchrony, take 3"), and reviewing them with Luke, Jesse W, Jacob
H, and Teyo T in an effort to figure out a good way to move forward from
experiment to concrete implementation.

To start things off, I'd like to try to list the most salient
customer-visible features that have been motivating our foray into asynchrony
and queueing.  Once we've done that, it should be easier to choose a subset of
functionality to target for 2.7 that is valuable to a lot of people without
requiring years to implement.  I suspect we may discover that some of the
features that motivated the investigation into queueing and asynchrony could
be built today, without needing an architectural change, and if so that
might be a big win in terms of implementation effort.

Here are the salient features that I've culled from the e-mail discussions
and meetings with Luke and others, in no particular order.  Please feel free
to make comments and corrections, talk about what's most important to you
personally, and especially to add your own items if you think I'm missing
something important.



Features:

1. Make low-end scaling easier: currently when a customer's deployment gets
too large to be handled by a single master running Webrick, they have to
install Apache/Passenger or Mongrel.  This can be difficult to do,
especially since Passenger has limited package support on some OSes (notably
RHEL/CentOS 5).  It would be nice to give people a less painful way of
scaling up beyond what Webrick is capable of handling.

2. Make medium-to-high-end scaling easier: currently when a customer's
deployment gets too large to be handled by a single physical machine, they
have to set up a load balancing infrastructure to distribute HTTPS requests
from client machines to a suite of puppet masters.  It would be nice to give
people a way of adding CPUs to the problem (essentially creating a "catalog
compiler farm") without forcing them to add a layer of infrastructure.

3. Allow customers who already have a queueing system as part of their
infrastructure to use it to scale Puppet, so they don't have to implement a
special Puppet-specific piece of infrastructure.

4. Make Puppet handle load spikes more robustly.  Currently I understand
from Luke that there is an avalanche effect once the master reaches 100%
capacity, wherein the machine starts thrashing and actually loses
throughput, causing further load increases.  It would be nice if we could
guarantee that Puppet didn't try to serve more simultaneous requests than
its processors/memory could handle, so that performance would degrade more
gracefully in times of high load.
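
One way to get that guarantee (this is just an illustrative sketch, not
anything in Puppet today -- the names like `compile_catalog` are made up) is a
fixed-size worker pool pulling requests off a queue, so the master never runs
more simultaneous compiles than we've budgeted for, and excess requests simply
wait rather than spawning work that thrashes the box:

```ruby
require 'thread'

# Cap simultaneous catalog compiles at POOL_SIZE; extra requests queue up.
POOL_SIZE = 4
requests = Queue.new
compiled = Queue.new   # collect results so we can see nothing was dropped

workers = POOL_SIZE.times.map do
  Thread.new do
    while (node = requests.pop)
      # compile_catalog(node) would do the real work here; we just record it
      compiled.push(node)
    end
  end
end

100.times { |i| requests.push("node#{i}") }
POOL_SIZE.times { requests.push(nil) }   # one sentinel per worker to shut down
workers.each(&:join)
```

Under this scheme a load spike lengthens the queue (and hence latency) but
never the number of in-flight compiles, which is exactly the graceful
degradation described above.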

5. Allow customers to prioritize compilations for some client machines over
others, so that mission critical updates aren't delayed.
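
As a sketch of what prioritization could mean concretely (hypothetical
design, not existing Puppet behavior): keep two queues and have the
dispatcher always drain the critical one first, so a mission-critical node
jumps ahead of whatever routine work is already waiting:

```ruby
require 'thread'

critical = []
normal   = []
lock     = Mutex.new

# Enqueue a node at one of two priority levels.
enqueue = lambda do |node, priority|
  lock.synchronize { (priority == :critical ? critical : normal) << node }
end

# The dispatcher always prefers the critical queue.
next_node = lambda do
  lock.synchronize { critical.shift || normal.shift }
end

enqueue.call("webcache01", :normal)
enqueue.call("db-master",  :critical)
enqueue.call("webcache02", :normal)

order = []
while (n = next_node.call); order << n; end
# db-master is dispatched first even though it was enqueued second
```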

6. Allow a "push model", where changing the manifest causes catalogs to be
recompiled, and those catalogs are then pushed out to client machines rather
than waiting for the client machines to contact the master and request
catalogs.

7. Allow inter-machine dependencies to be updated faster (i.e. if machine
A's configuration depends on the stored configuration of machine B, then
when a new catalog gets sent to machine B, push an update to machine A ASAP
rather than waiting for it to contact the master and request a catalog).

8. Allow the fundamental building blocks of Puppet to be decomposed more
easily by advanced customers so that they can build in their own
functionality, especially with respect to caching, request routing, and
reporting.  For example, a customer might decide that instead of building a
brand new catalog in response to every catalog request, they might want to
send a standard pre-built catalog to some clients.  Customers should be able
to do things like this (and make other extensions to puppet that we cannot
anticipate) by putting together the building blocks of puppet in their own
unique ways.

9. Allow for staged rollouts--a customer may want to update a manifest on
the master but have the change propagate to client machines in a controlled
fashion over several days, rather than automatically deploying each machine
whenever it happens to contact the puppet master next.

10. Allow for faster file serving by allowing a client machine to request
multiple files in parallel rather than making a separate REST request for
every single file.
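
To make the idea concrete, here is a rough sketch (the endpoint shape and
payload format are invented for illustration; nothing like this exists in the
current REST API): the client lists all the paths it wants in one request
body, and the server answers with a single payload mapping path to content,
replacing N round trips with one:

```ruby
require 'json'

# Stand-in for the master's file content store.
SERVED_FILES = {
  "/modules/ntp/ntp.conf"    => "server 0.pool.ntp.org\n",
  "/modules/ssh/sshd_config" => "PermitRootLogin no\n",
}

# Hypothetical batch endpoint: take a JSON list of paths, return all the
# matching files in one JSON response.
def serve_batch(request_json)
  paths = JSON.parse(request_json)["paths"]
  found = paths.each_with_object({}) do |p, h|
    h[p] = SERVED_FILES[p] if SERVED_FILES.key?(p)
  end
  JSON.generate("files" => found)
end

request  = JSON.generate("paths" => SERVED_FILES.keys)
response = JSON.parse(serve_batch(request))
```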

11. Allow for fail-over: if one puppet master crashes, allow other puppet
masters to transparently take over the work it was doing.



Note that these features come with a number of caveats:

A. We don't want to take a big performance hit in order to add these
features, especially since many of the features concern scalability and
hence are performance critical.

B. We don't want to break existing features or introduce new dependencies
(e.g. customers whose deployment is small enough that they don't have major
scalability problems should be able to continue using HTTPS/Webrick).

C. We don't want to unnecessarily duplicate effort in the code base (e.g. we
wouldn't want to write a complete queueing infrastructure independent of the
indirector that served much the same purpose).

D. We don't want to break compatibility with older (2.6 and possibly 0.25)
clients.

E. We don't want to sacrifice error handling, and we want the system to be
at least as robust as 0.25 and 2.6 if a puppet master crashes.

F. We don't want to lose (or waste a lot of time re-implementing) the
features that we get "for free" from HTTPS/REST, such as: synchronous file
delivery, the ability to tunnel/proxy through firewalls, and the security of
SSL.

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.