On 2014-07-12 04:50, Henrik Lindberg wrote:
On 2014-11-07 10:55, David Schmitt wrote:
I really dig this idea. Reading it sparked a crazy idea in the
language-designer part of my brain: What about going even further and
making the RHS also an Expression?

In the grammar basically everything would become a function call or just
a sequence of expressions. For the expressiveness of the language it
might do wonders:

Yes, that is how everything else works, but this cannot, because of the
ambiguity between a hash and resource body/bodies (in the two different
shapes for regular and override/defaults expressions).

   $ref = File[id]
   $select = File<|title == id|>
   $ref == $select # true
   $type = File

Yes, except we will have issues with the query (it is lazy now). We
either need to make it evaluate immediately, or make the return value
a Future.

   $values = { id => { mode => 0664, owner => root } }
   # equivalent hash shortcut notation for backwards compat and
   # keystroke reduction
   $values = { id: mode => 0664, owner => root }

Ah, neat idea: make { x: y => z } mean the same as { x => { y => z } }!

   $defaults = { owner => def, group => def }
   $overrides = { mode => 0 }

   $final = hash_merge($values, { default: $defaults })

Did you mean?
     hash_merge($defaults, $overrides)

(which btw is the same as $defaults + $overrides).
Or was that an example of a special instruction to hash_merge to
treat a key of literal 'default' as values that do not override (i.e.
all other keys override, but 'default' defines what to pick if nothing
else is defined)? If so, this is easily expressed directly in the
language like this:

     $final = $defaults + $values + $overrides

Ah, nice! Actually I was thinking of something completely different, but this is better.
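
For example, a minimal sketch of that precedence, assuming hash '+'
merges with the right operand winning on duplicate keys:

   $defaults  = { owner => root, mode => 0644 }
   $values    = { mode => 0600 }
   $overrides = { owner => admin }
   # later operands win, so overrides beat values beat defaults
   $final = $defaults + $values + $overrides
   # => { owner => admin, mode => 0600 }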

   # old style
   create_resources($type, $values, $defaults)
   # basic resource statement
   $type $final
This is problematic: we cannot make any sequence a function call without
requiring that every expression is terminated with punctuation (e.g.
';') - and that would then have to be applied everywhere.


You are right, having that in the grammar makes no sense. I still think it is a neat detail to keep in mind when thinking about the underlying structure of what we're building.

   # interpreted as function call
   $type($final)

This is problematic because:
- selecting what to call via a general expression has proven in several
languages to be a source of thorny bugs in user code.

Conceded.

   # override chaining
   $ref $overrides
   $select $overrides

   # if create_resources would return the created resources:
It should.

   $created = create_resources($type, $values, $defaults)
   $created $overrides

   # replace create_resources
   File hiera('some_files')

   # different nesting
   file { "/tmp/foo": $value_hash }

   # extreme override chaining
   File['/tmp/bar']
   { mode => 0644 }
   { owner => root }
   { group => root }

   # inverse defaulting
   file { [ '/tmp/1', '/tmp/2' ]: } { mode => 0664, owner => root }

   # define defined()
   defined(File['/tmp/bar']) == !empty(File<|title == '/tmp/bar'|>)

This would require unifying the attribute overriding semantics, as almost
everything would become an override.

It would also lift the set-of-resources, as currently used in simple
collect-and-override statements, to an important language element, as
almost everything touching resources would "return" such a set.

Formalizing this a little bit:

   * 'type' is a type reference.
   * 'Type' is the list of resources of type 'type' in the current
     catalog (compilation).
This is actually a reference to the set that includes all instances of
that type (irrespective of whether they actually exist anywhere). Something
needs to narrow that set down to "in the catalog" (which is actually several
sets: realized, virtual, exported from here, imported to here, ...). To
get such a set, there should be a query operator (it operates on a
container and takes a type and predicates for that type).

   * 'Type[expr]' is the resource of type 'type' with its title equal
     to the result of evaluating 'expr'
yes
   * 'Type<| expr |>' is the list of local resources of type 'type' in
     the current compilation where 'expr' evaluates true. As a
     side-effect, it realizes all matched virtual resources.[1]
Some sort of query operator. If we keep the <| |>, it could mean
selection of the "virtual here" container/subset; currently it is
"everything defined here".

If functions can return sets of resources that can be manipulated, the special query operator syntax can be abolished - or at least de-emphasised. See Eric's puppetdbquery for an example.
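
For instance, purely hypothetically (the function name is made up, this
is not an existing API):

   # a function that returns a resset, instead of special operator syntax
   $repos = resources_matching('Yumrepo', { enabled => true })
   # the result could then be chained or overridden like any other resset
   $repos -> Package['git']
   $repos { priority => 10 }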

   * 'Type<<| expr |>>' is the list of local and exported resources of
     type 'type' where 'expr' evaluates true. As a side-effect,
     it realizes all matched exported resources.[2]
Same comment as above, but using a different container.

   * '{ key => value, }' is a simple hash ('hash')
   * '{ title: key => value, }' is a hash-of-hashes. Let's call this a
     untyped resource ('ur') due to its special syntax[3].
   * 'type ur' now syntactically matches what puppet3 has and evaluates
     to the set of resources ('resset') created by
     create_resources('type', 'ur').
   * '[Type1[expr1], Type2[expr2]]' is the resset containing
     'Type1[expr1]' and 'Type2[expr2]'.
That is what you get now (or rather, you get a set of references to the
resource instances, not the instances themselves).

Is there a distinguishable difference for the language user?

   * 'resset hash' (e.g. 'File { mode => 0 }') is an override expression.
     It sets all values from 'hash' on all resources in 'resset'.
   * 'resset -> resset' (and friends) define resource relationships
     between sets of resources.
     'Yumrepo -> Package' would be a nice example, also avoiding
     premature realization.
The relations are recorded as being between references.

   * 'create_resource(type, ur)' returns a resset containing resources
     of type 'type' with the values from 'ur'. Written differently,
     'create_resource' becomes a cast-and-realize operator.[4]
     - This allows things like 'create_resource(...) -> resset' and
       'create_resource(...) hash'
   * 'include someclass' returns the resset of all resources included in
     'someclass'. Note that 'included' is a very weakly defined concept
     in puppet, see Anchor Pattern.
Hm, intriguing idea.
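
Under the proposed semantics that could, hypothetically, allow:

   # if 'include' returned the resset of someclass, an override could
   # be applied to everything it declared (sketch, not existing syntax)
   $logging = include someclass
   $logging { noop => true }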

   * Instances of user-defined types might also be seen as heterogeneous
     ressets.

Yes.


[1] It might be worthwhile to start requiring 'realize(Type<| expr |>)'
to always be written for this side-effect. This looks annoying.

it could be

   Type <| expr |>.realize

Ugh.

[2] Unintentionally realized exported resources seem to be a much less
frequent problem than the same side-effect causes for virtual resources.
It might make sense to avoid [1] and instead introduce something like
'Type[|expr|]' and 'Type[[|expr|]]' to select without realizing.

I like to go in the other direction with fewer special operators.

As said above, functions returning resource sets might be the way to go then. It's not like we need to design the next APL ;-)

[3] Note that this is really only syntactic. { title => { key => value
}} would evaluate to the equivalent untyped resource.
[4] I'm beginning to get an uncanny XPath/PuppetSQL vibe here.

:-)

Up until now, this is MOSTLY syntactic sugar to massively improve the
flexibility of the language. To avoid the most egregious abuses and
traps of this flexibility we have to take a good look at the underlying
datamodel, how evaluating puppet manifests changes this model and what
the result should be.

The result is very simple: the compiled catalog is a heterogeneous set
of resources. In an ideal world, the contents of this resset are
independent of the evaluation order of the source files (and also of the
order of the statements within them).

Yes. A Catalog is basically Array[CatalogEntry]

Unifying all kinds of overrides, defaults and "normal" parameter setting
into a single basic operation opens the way to discuss this on a
different level: for an evaluation-order-independent result, it is not
important how or when a value is set, only that it is set at most once.
That is a condition that is easily checked and enforced if we accept
that the evaluator may reject some complex manifests that could
theoretically be evaluated, but not with a given implementation.

yes.

The alert reader rightly complains that defaults and overrides have
different precedences. To make a strict evaluation possible I'd suggest
creating multiple "value slots" on a property: a default, a normal and an
override slot. The property's value is the highest-priority value
available.
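
A sketch of how the three slots might map onto today's constructs, under
the proposed model (today, the reference-based override below would only
be legal from an inheriting class):

   # default slot: resource default
   File { mode => 0644 }
   # normal slot: the declaration itself
   file { '/tmp/demo': mode => 0600 }
   # override slot: wins over the other two
   File['/tmp/demo'] { mode => 0640 }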

That is one way, yes.

To avoid write/write conflicts in the evaluation, each slot may be
changed only once. This follows directly from the eval-order
independence requirement: when two places try to set the same property
to different values with the same precedence, it cannot work. The
argument is the same as for currently disallowing duplicate resources.
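
For instance, the analogous situation that is already rejected today:

   # two writes with the same precedence to the same property cannot
   # work; the duplicate declaration is an error in current Puppet
   file { '/tmp/demo': mode => 0600 }
   file { '/tmp/demo': mode => 0644 }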

I think this may just move the problem to dueling defaults, dueling
values, and dueling overrides. This problem occurs in the binder, where
it is solved by the following rules, expressed in the terms we use here
(except for the term 'layer', which I will come back to):
- if two defaults are in conflict, a set value wins
- if two values are in conflict, an override wins
- if two overrides are in conflict, the one made in the highest
layer wins
- a layer must be conflict free

Don't you mean "the highest layer with a value must be conflict free"?

The highest (most important) layer is "the environment", followed by "all
modules" - this means that conflicts bubble to the top, where a user
must resolve the conflict by making the final decision.

The environment level can be thought of as what is expressed in
"site.pp", globally, or expressed for a "node" (if we forget for a while
about all the crazy things puppet allows you to do with global scope:
open and redefine code, etc.).

If you mean what I think you mean, I think I like it.

Another example to try to understand this:

  class somemodule { package { "git": ensure => installed } }

  class othermodule { package { "git": ensure => '2.0' } }

  node 'developer-workstation' {
    # force conflict on Package[git]#ensure here: installed != '2.0'
    include somemodule
    include othermodule

    # conflict resolved: higher layer saves the day
    Package[git] { ensure => '2.1' }
  }

How would the parser/grammar/evaluator understand which manifests are part of what layer?

To avoid read/write conflicts in the evaluation, each property may be
sealed to the currently available value(s) when reading from it. This
allows detecting write-after-read situations. At this point the
evaluator has enough information to decide whether the write is safe
(the value doesn't change) or not (the eval-order independence is
violated). In a future version, the evaluator could be changed to return
promises instead of values and to evaluate those promises lazily. That
way it would be possible to evaluate all manifests that have an
eval-order-independent result (that is, all that are reference-loop-free).

yes, and now, basically, the catalog is produced using a production
system that was populated by the puppet logic.

The case of +>: the write/write conflict is irrelevant up to the order
of the resulting list. The read/write conflict can be checked like any
other case.

A more subtle problem with this approach are resset-based assignments.
Some examples:

   File { mode => 0644 } # wrong precedence
   file { '/tmp/foo': mode => 0600 }

   File['/tmp/foo'] { mode => 0644 }
   file { '/tmp/foo': mode => 0600 }

   File<| title == '/tmp/foo' |> { mode => 0644 }
   file { '/tmp/foo': mode => 0600 }

   File <| owner == root |> { mode => 0644 }
   file { '/tmp/foo': mode => 0600 }

The solution to this lies in deferring evaluation of all dynamic (Type
and Type<||>) ressets to the end of the compilation. While that would
not influence write/write conflicts, it would force most read/write
conflicts to always happen.

Another ugly thing would be detecting this nonsense:

   File <| mode == 0600 |> { mode => 0644 }


The same read/write conflict detection logic could be re-used for
variables, finally making it possible to detect the use of not-yet-defined
variables.

Here we have another problem: variables defined in classes are very
different from those defined elsewhere - they are really
attributes/parameters of the class. All other variables follow the
imperative flow. That has always bothered me and causes leakage from
classes (all the temporary variables, those used for internal purposes,
etc.). This is also the source of "immutable variables"; they really do
not have to be immutable (except in this case).

Yeah, not being able to calculate and reset values in parameters (or class vars) is a PITA, leading to all sorts of $managed_ and $real_ variables for little gain. Having proper futures, or at least r/w conflict detection, might fix that instead of making everything immutable.

If we make variables be part of the lazy logic you would be able to write:

   $a = $b + 2
   $b = 2

I think this will confuse people greatly.

Hehe, I can imagine that. When accessing variables across files/classes I do not see that as a big problem, though. Within a single file/scope it can be forbidden, or at least warned/linted.

My own main issue with the idea is that it makes code backwards
incompatible; you cannot write a manifest that uses defaults and
overrides in a way that works both in 3.x and 4.x. (Or, I have not
figured out a way yet at least).

Even if you skip the resources-as-hashes idea, I think most of the
defaults and overrides precedence and eval-order confusion can be
mitigated by a multi-slot implementation for properties as described
above.

And finally, an alternative regarding overrides: if we want to keep the
left side resource-instance specific (i.e. no title), we could
simply change it to an assignment of a hash. I.e. instead of

Notify[hi] { message => 'overridden message' }

you write:

Notify[hi] = { message => 'overridden message' }

And now, the right hand side is simply a hash. The evaluator gets a
specific reference to a resource instance, and knows what to do.
(We could also allow both; the type + title in body way, and the
assignment way).

This is what actually triggered my first idea. Also because I really
dislike the assignment there.


The crux here is what it means to just have one expression followed by
another - e.g.:

   1 1

is this a call to 1 with 1 as an argument, or the production of one
value 1, followed by another?

Are the parser and the evaluator so intertwined that this cannot be interpreted in context? "1" is not callable, therefore it cannot be a function call.

This general problem is solved by stating that for this to be a call,
the first expression must be special: a NAME token that is followed by a
general expression (or a list of expressions, e.g. NAME e,e,e).

We cannot turn a hash into an operator since that would make it close to
impossible to write a literal hash.

Hence... for the resource expressions we need an operator that operates
on three things: type, id, and named arguments (plus, via the operator
or through other means, the extra information about whether each value is
a default, a value, or an override, and whether it is an addition or a
subtraction).

We can solve this by making the data structure special (the {: }),
using an operator, or using a more complex but generic data structure
(a hash with particular keys). If we use : in hashes to mean hash of hash,
then we make it easier to encode things like defaults, values and
overrides, but we lack type and id.
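
The generic-data-structure variant might look roughly like this (the key
names are made up for illustration):

   # a hash with well-known keys carrying type, id and the three slots
   $r = {
     type      => 'file',
     title     => '/tmp/demo',
     defaults  => { owner => root },
     values    => { mode => 0600 },
     overrides => {},
   }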

You could read the
    notify { hi: message => hello }
as:
    Notify.new(hi, {message=>hello})

As I see it, the main grammar problem is that there is no "new
operator". Hence my attempt:

   Notify[hi] = {message => hello}


Now I have typed too much already...

Me too ;-)

Dueling ramblers?

I think "Ideating" is the proper jargon here ;-)

To Summarize

I think it will be hard to change the core expression that creates
resources - i.e.

    notify { hi : ...}

and then we are back at where I started:
- we can play tricks with the title (using a literal default there)
- we can generalize the LHS, since {: is an operator (i.e. differentiate
between the LHS being a name, and a type (notify vs Notify), or being a
resource-set, say from a query like Notify <| |>, or indeed any
expression such as a variable reference). The main problem here is being
able to infer the correct type (when that is not possible we end up with
late evaluation errors if there are mistakes, and they are hard to deal
with), so we may want to restrict the type of expression to those where
the type is easily inferred.

:-/


Regards, David
