Re: [Puppet-dev] Re: Draft for new type and provider API

David Schmitt Thu, 02 Mar 2017 05:29:25 -0800

On 28 February 2017 at 17:07, John Bollinger <john.bollin...@stjude.org>
wrote:

>
>
> On Monday, February 27, 2017 at 8:59:46 AM UTC-6, David Schmitt wrote:
>>
>> Hi folks,
>>
>> I've written down the design in README format now. This should make for a
>> easier to ingest from a module developer's perspective. Meanwhile I've also
>> had a shot at a hardcoded implementation of this, and it is technically
>> possible to implement this too.
>>
>
>
> Thanks for your efforts on this documentation.  It helps clarify your
> view.  I think you have some good ideas here, but I also think that what
> you've presented, as you've presented it, is not a viable replacement for
> the type & provider system we have now.  Moreover, there remain important
> areas that I would dearly love to see documented, and one or two longtime
> sore spots that this proposal seems to be missing the opportunity to
> address.
>

This is only a v1. It is presented with the intent to build a simpler, more
approachable foundation for future work.

> *Big picture items*
>
> *Multiple implementations / implementation selection*
>
> In splitting resources into "definition" and "implementation", the
> proposal adheres to a *form* similar to the current system's, but it
> seems to have fundamentally different design goals.  I've always
> interpreted the present type / provider system's separation of resource
> interface from implementation first and foremost as a means to accommodate
> multiple implementations. The one most appropriate to the target system is
> chosen from among those available.  I think that's a sound design approach;
> I like it, and it has served Puppet well.  As far as I can determine,
> however, the proposal loses that completely -- I see no means to support
> multiple implementations at all, much less means to match an implementation
> to the target system.
>

Looking at which types provide multiple implementations, I mostly found
types from core puppet. Even looking at those (e.g. package, user,
service), they have many attributes that are provider specific, making the
abstractions very leaky. Other examples in the wider software world  (e.g.
SQL mappers, vendor-independent configuration schemes) also highlight the
issues with one-size-fits-all approaches: to get the most out of a specific
system, users require full access to the lower layer's capabilities. Having
said that, it does not invalidate the use-cases where a broad applicability
trumps those special cases. Puppet already has a perfectly reasonable DSL
to deal with those cases. For example, a hypothetical successor to the
package type could look like this:

define package (
  Ensure $ensure,
  Enum[apt, rpm] $provider, # have a hiera 5 dynamic binding to a function
choosing a sensible default for the current system
  Optional[String] $source  = undef,
  Optional[String] $version = undef,
  Optional[Hash] $options   = { },
) {
  case $provider {
    apt: {
      package_apt { $title:
        ensure          => $ensure,
        source          => $source,
        version         => $version,
        *               => $options,
      }
    }
    rpm: {
      package_rpm { $title:
        ensure => $ensure,
        source => $source,
        *      => $options,
      }
      if defined($version) { fail("RPM doesn't support \$version") }
      # ...
    }
  }
}

This would provide a best of both worlds approach that exposes the
subtleties of particular systems, and the full use of those features,
without overloading a common interface, while still providing a quick and
simple top-level entry point, if you don't need that level of detail.

> *Inadequate interface documentation*
>
> As I said earlier, one of the biggest problems with the current system is
> inadequate documentation.  As it now stands, the proposal's documentation
> does about as well as the docs of the current system.
>

Thank you for this praise, seeing as I'm up against a team of world-class
tech writers, and legends like Dan Bode :-D

>   Missing is information about how the runtime environment is intended to
> use the defined objects and methods -- for example, under what
> circumstances it will instantiate / invoke them, how many times per
> transaction (partially addressed in the doc), at what point(s) in the
> transaction.  What may the environment do with the object returned by
> get() and its contents (i.e., must implementations assume that the
> environment may modify them)?  Is there anything the environment is
> required to do with them?  What may an implementation do with the object
> passed to set() and its contents?  Is there any expectation about
> relationships between the objects provided by get() and received by put()?
>

Very good questions all of them. I haven't gotten into those details yet.
At a high level I'd expect the data passed around from get() and to set()
will be passed "by-value." That is, no references that are handed out be
get() will be valid within set(), and nothing is depending on a specific
behaviour of the implementation around those references. You are completely
right in that these details need to be written out in the docs.

>
> Similarly, are the "utilities" provided by the runtime environment
> supposed to be the only mechanism by which implementations interact with
> the host Ruby environment?
>

The "host Ruby environment" is a large scope. For example interaction with
the current catalog during evaluation is definitely on the list of things I
currently don't want to address (although there are good reasons for some
to do so, e.g. concat). On the other hand, I don't want to keep a
implementation from using ruby's general capabilities. Do you have specific
things in mind?

>   The only mechanism by which implementations interact with the target
> system at all?
>

Not at all. Implementations are free to use ruby in any way they see fit.
There is no value in e.g. prescribing a way for talking to an HTTP/REST
API. Looking at the most interesting examples (AWS, azure, etc) those
projects already bring their own SDKs, so there is no way anyways to
improve on that.

> *Pushing new responsibilities onto resource implementations*
>
> The minimalistic approach to defining the interface for resource
> implementations has the inevitable effect of making the host environment
> incapable of some of the tasks that it presently performs, leaving the onus
> on resource implementations to fill the gaps.  For example, since there is
> no way for the environment to retrieve the state of a specified individual
> resource, and because some resource types cannot feasibly be enumerated,
> the environment cannot, in general, determine whether resources are in
> sync.  Only resource implementations can be relied upon to do that, and
> then only in the context of their put() methods.
>

Yes. This is a conscious decision to keep the v1 simple. Evolving the
interface later by adding more capabilities is easier than fixing a rushed
design. For many resources getting all of them is as costly as getting a
single one, e.g. the apt-key thing.

> Similarly, although I definitely like that a logging utility is made
> available to resource implementations, it seems that most responsibility
> for logging now falls on resource implementations as well.  Currently, the
> environment handles a fair amount of that.  It seems likely that moving the
> responsibility to resource implementations will not only make them more
> complicated to write, but will also reduce the consistency of log messages.
>

The current reporting from the puppet agent is entirely aspirational. Log
messages are emitted when values are passed to the provider, not when
changes are effected. As the new resources will have to run alongside
existing resources for a long while, this will not go away. Logging from
the implementation will allow more accurate reporting about what actually
happens. I can totally see simple implementations skipping logging
altogether, although I would not recommend that. To support people new to
provider development, the API beyond the basic messaging functions is
intended to provide guidance on how to log, especially around error
handling.

>
> *Terminology Changes*
>
> The proposal seems to gratuitously change established terminology.  What
> we now call "properties" are to be referred to as "attributes", which
> itself is currently used in a broader sense.  "Types" become "definitions";
> "providers" become "implementations".  Why?  However well or poorly you
> like the established terminology, it *is* established.  The proposal
> seems to be offering variations on these existing things, not entirely new
> things, and changing all the terminology is likely to be more harmful than
> helpful.
>

Partially, this was because those are not the same things. Partially to get
a feel of people's reactions. Your's speaks clearly.

>
> *Specific features*
>
> *Attribute and parameter validation*
>
> Validation ought to be performed during catalog building, thus it needs to
> be adequately addressed by type "definitions".  I am dissatisfied with the
> proposal's provision for that.  The Puppet type system is very expressive,
> but it can also be very arcane.  I maintain that it is a mistake to rely
> exclusively on the type system for validating "attributes" or "operational
> parameters".  Moreover, I think the proposal is missing an opportunity to
> provide for multiple-attribute, multiple-parameter covalidation.  This has
> been requested on and off for a long time, and this proposal is a great
> opportunity to finally put it in place.
>

Full validation cannot be performed during catalog building. Accepting
that, the conversation turns to where to put the different pieces. One of
the design constraints is to minimize ruby code on the master for
environment isolation. That said, multiple-attribute, multiple-parameter
covalidation definitely is on my list of things I want to support
server-side in the future. One possibility would be to lift the definition
part into the puppet DSL, and do validation there. If we get all the
infrastructure for that in place it'll be an automatable upgrade from the
definitions described here, and it will require even more tooling for
backwards compatibility. I'm not comfortable to delay this effort further
for features that can be seamlessly retrofitted later.

>
> *Transaction progress and state*
>
> Resource implementations are afforded only indirect information about
> transaction context.  For example, the docs say that "The implementation
> should not try to cache state beyond the transaction", but how, exactly, is
> the implementation supposed to recognize the end of a transaction?
>
  It would be highly desirable for implementations to have means to at
> least opt in to messages about transaction lifecycle and events.
>

This has come up now multiple times, and I'll add a optional
end-of-transaction lifecycle method to the implementation. I'll need to
figure out how to hook that up to puppet though. Any suggestions on naming?

> For some purposes, it would be useful to have a way for type
> implementations to be able to communicate with each other within the scope
> of a transaction.
>

Cooperating implementations can use regular ruby mechanisms, like
singletons in a ruby module or class, to communicate. In a polyglot world
where some or all providers are running outside the main agent process,
that'll cease to be a viable way to solve these problems. Maybe it'll turn
out that we need to keep the old API around indefinitely, or maybe we'll
find better ways to solve these problems. For example, native concat could
very well be implemented as a catalog transformation, that runs on the
server, after compilation, collects all fragments and *replaces* them in
the catalog with the final file. The fact that all these nifty things
currently run as types and providers does not prevent us from exploring
better solutions for the problems they are solving. And the old APIs will
remain in place for quite a while.

> *Details*
>
> *"Read-only" vs. "init-only"*
>
> I understand -- I think -- the distinction you're drawing here, but it's
> confusing.  Surely if a resource does not yet exist, then *all* its
> attributes are logically written when it is created / initialized.  I could
> understand that is a plausible exception to read-onlyness if not for the
> explicit provision for init-only attributes.  If you want to maintain both
> of these cases among the attribute metadata then please rename one of
> them.  For example, perhaps "derived" or "internal" would describe what you
> mean by "read-only".
>

I've added more explanation around read_only and init_only:

  * `init_only`: this attribute can only be set during creation of the
resource. Its value will be reported going forward, but trying to change it
later will lead to an error. For example, the base image for a VM, or the
UID of a user.
  * `read_only`: values for this attribute will be returned by `get()`, but
`set()` is not able to change them. Values for this should never be
specified in a manifest. For example the checksum of a file, or the MAC
address of a network interface.

I hope this makes the distinction clearer.

> *Compound resource names*
>
> I suppose that compound namevars are a topic that you hoped to avoid in
> your push for simplification, though it looks like they might not be hard
> to support via the interface you have already described.  This is a weak
> area for the present system, and one that I think any replacement should
> support well and and fully.
>

Composite namevars are a topic dear to my heart (see
https://groups.google.com/d/msg/puppet-dev/xQN5nTHrIc0/bUplrjQj7sAJ for
historical context,
https://github.com/puppetlabs/puppet-specifications/blob/master/language/resource_types.md#title-patterns
for the current (incomplete) spec). Exactly the fact that they'll be easy
to support, is the reason why they're not in the v1. I've started a "Known
Limitations" section in the README to collect all these points and
arguments. Also provides a good starting point for anyone who wants to get
involved.

>
> *Attribute / parameter clashes*
>
> By specifying attributes and operational parameters via separate hashes,
> the proposed system affords the opportunity for name collisions between
> these.  That also makes resource definitions less cohesive with Puppet
> overall, because in the Puppet language, attributes and operational
> parameters are specified together, in the same syntactic construct.  I
> fully understand the differences between these, but I suggest they not be
> reflected by a complete separation between their specifications, such as
> the proposal currently provides.
>

Excellent point, I didn't think of the uniqueness requirements here. I'll
fold them into a single structure.

>
> *Resource implementations that become usable during a run*
>
> It is only relatively recent that Puppet implemented the long-desired
> feature of supporting resource providers that become usable as a result of
> other resources being managed earlier in the same run.  It is not clear
> whether or how that would be supported under the proposal as currently
> presented, but this is a feature that I do not think we want to lose.
>

To quote from the readme:

To allow for a transaction to set up the prerequisites for an
> implementation, and use it immediately, the provider is instantiated as
> late as possible.
>

Together with autoloading, this should address the situation. It might be
necessary to put `require` statements into the body of the registered
block, instead of the beginning of the file.

> *Environment for commands*
>
> If resource implementations are expected to use the 'commands' utility for
> running system commands, then that utility needs to provide a mechanism for
> specifying environment variables that must be set in the environment of
> those commands.
>

Good call on the environment support. I've added it like this:

To pass additional environment variables through to the command, pass a hash
of them as `env:`:

```ruby
apt_key 'del', key_id, env: { 'LC_ALL': 'C' }
```

>   It is furthermore highly desirable that it be possible for the values
> and maybe even the names of those variables to be drawn from the values of
> a resource's operational parameters.
>

The ruby namespace for those methods is shared between all resource
instances. An alternative would be a way to access the commands
functionality without having to pre-specify the command name, e.g.:

`run_command '/usr/bin/apt-key', 'del', key_id`

That can be used with the get/set methods and access dynamic values.

>
> *Host object*
>
> The doc refers in a couple of places to a "host object".  What is this?
> What are its methods and properties?  How is it to be used?
>

The ruby object hosting the get and set methods. I've changed the wording
to make that clearer: "The object instance hosting the `get` and `set`
methods can be used to cache ephemeral state during..."

>
> *Overall*
>
> I can see that a lot of thought and effort has gone into this proposal.
> I've focused mostly on criticisms, but I want to acknowledge that there's
> some good work here.
>
> Nevertheless, I regret to say that I dislike the overall proposal in its
> current form.  It could be improved to some extent by more complete
> documentation, but documentation notwithstanding, I think it focuses too
> much on simplification and on the resource itself, and not enough on how
> resources will be used by and interact with the larger system.  I do not
> doubt that the proposed API can be implemented, and that's indeed a mark in
> its favor.  If it is intended only as a simpler *alternative* to the
> present system, then many of my more significant concerns are irrelevant.
> To the extent that this is being positioned to ultimately replace the
> current system, however, it doesn't provide everything it needs to do, it
> misses providing some things it should do, and I'm dissatisfied with the
> way it provides some of the things it does.
>

This API is not a full replacement for the power of 3.x style types and
providers. In the end, the goal of the new Resource API is not to be a
complete replacement of prior art, but a cleaner way to get good results
for the majority of simple cases.

Thank you for the time and work you're investing in reviewing this! It
already has made the thing better.

Regards, David

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/CALF7fHZ3E4chqF4DPvMZmPGum2C_Vo0idsWpwAPkB22ZbUqNmQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Puppet-dev] Re: Draft for new type and provider API

Reply via email to