A while ago we removed support for puppet to *send* YAML on the network. At
the same time we converted to using safe_yaml for receiving YAML in order
to keep compatibility with existing agents. Instead of YAML all of the
communication was done with PSON, which is a variant of JSON that has been
in use in puppet since at least 2010. As far as I understand, PSON started
out as simply a vendored version of json_pure. The name PSON was apparently
chosen because rails would try to patch anything named JSON, and so they needed to
name it something different to stop that from happening (that is all
hearsay, so I don't know how truthful it is).

Over time PSON started to evolve. Little changes were made to it here and
there. The largest change came about because of
http://projects.puppetlabs.com/issues/5261. The changes for that ticket
removed the restriction that only valid UTF-8 could be sent in PSON, which
opened the door to a) binary data as file contents and b) absolutely no
control over what encodings puppet was using. Since then, a large number of
issues have come from not keeping track of what encoding puppet is dealing
with.

I'd like to move us away from PSON and onto a standard format. YAML is out
of the question because it is either slow and unsafe (all of the YAML
vulnerabilities) or extremely slow and safe (safe_yaml). MessagePack might
be nice. It is pretty well specified and has a fairly large number of
libraries written for it, but it doesn't do much to help us solve the wild
west of encoding in puppet. MessagePack doesn't really enforce string
encodings; everything is treated as an array of bytes.

In order to keep consistency across various puppet projects we'll be going
with JSON. JSON requires that everything is valid UTF-8, which gives us a
nice deliberateness to handling data. JSON is pretty fast (not as fast as
MessagePack) and there are a lot of libraries if it turns out that the
built-in json isn't fast enough (puppet-server could use jrjackson, for
instance).
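
To make the UTF-8 requirement concrete, here is a minimal sketch in Python
(used only for illustration; puppet itself is Ruby). The helper name
to_json_string is hypothetical. It shows that bytes which are not valid
UTF-8 cannot become a JSON string until someone decides what to do with
them, which is the deliberateness mentioned above.

```python
import json


def to_json_string(raw: bytes) -> str:
    """Serialize raw bytes as a JSON string, insisting on valid UTF-8.

    Hypothetical helper, for illustration only.
    """
    # decode() raises UnicodeDecodeError on any invalid UTF-8 sequence,
    # so bad data is rejected up front instead of being passed along.
    text = raw.decode("utf-8")
    return json.dumps(text)


# Valid UTF-8 is accepted.
print(to_json_string("caf\u00e9".encode("utf-8")))

# Arbitrary binary data is rejected with an explicit error.
try:
    to_json_string(b"\xff\xfe binary")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```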

So what all would be changing?

  1. Network communication that is using PSON would move to JSON
  2. YAML files that the master and agent write would move to JSON (node,
facts, last_run_summary, state, etc.).
  3. A new exec node terminus would be written to handle JSON, or the
existing one would be updated (check the first byte for '{').
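
The first-byte check in item 3 could look roughly like the sketch below
(Python for illustration only). The function name detect_format and the
sample node data are assumptions, and skipping leading whitespace is my
addition to the plain "first byte" check. Note that a YAML document written
in flow style can also begin with '{', so this is a heuristic rather than a
guarantee.

```python
import json


def detect_format(output: bytes) -> str:
    """Guess whether exec terminus output is JSON or YAML.

    Assumption: JSON output is always an object, so its first
    non-whitespace byte is '{'. Anything else is treated as YAML.
    """
    stripped = output.lstrip()
    return "json" if stripped.startswith(b"{") else "yaml"


# Hypothetical output from two node classifier scripts.
node_json = b'{"classes": ["ntp"], "environment": "production"}'
node_yaml = b"---\nclasses:\n  - ntp\nenvironment: production\n"

print(detect_format(node_json))
print(detect_format(node_yaml))

# Only the JSON branch is parsed here; the YAML branch would go to the
# existing YAML handling.
if detect_format(node_json) == "json":
    print(json.loads(node_json.decode("utf-8"))["classes"])
```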

Those are just some of the changes that will need to happen. There will be a
ripple of other changes based on the fact that JSON has to be UTF-8.

  1. A new "encoding" parameter on File and a base64() function. This will
allow transferring non-UTF-8 data as file content until we can get a new
catalog structure that allows tracking data types and more changes to the
language to differentiate Strings from Blobs.
  2. Reports will have to strip invalid UTF-8 sequences. Nothing would be
worse than a single byte stopping a report from being sent. This is what
PuppetDB does right now with facts, catalogs, and reports.
  3. Facts can't contain non-UTF-8 data. Facter 2 already enforces this.
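
The first two items can be sketched as follows (Python for illustration
only). The function names and the "encoding"/"content" keys are
hypothetical and do not reflect the real catalog format. The scrubbing
helper replaces each invalid sequence with U+FFFD, which is one
interpretation of "strip"; passing errors="ignore" instead would drop the
bytes entirely.

```python
import base64
import json


def encode_file_content(raw: bytes) -> dict:
    """Wrap binary file content so it survives a UTF-8-only format."""
    # Base64 output is pure ASCII, so it is always valid UTF-8.
    return {
        "encoding": "base64",
        "content": base64.b64encode(raw).decode("ascii"),
    }


def decode_file_content(resource: dict) -> bytes:
    """Recover the original bytes on the receiving side."""
    if resource.get("encoding") == "base64":
        return base64.b64decode(resource["content"])
    return resource["content"].encode("utf-8")


def scrub_utf8(raw: bytes) -> str:
    """Replace invalid UTF-8 sequences so a report can always be sent."""
    return raw.decode("utf-8", errors="replace")


# Binary content makes the round trip through JSON unchanged.
blob = b"\x00\xff\x10 not text"
wire = json.dumps(encode_file_content(blob))
print(decode_file_content(json.loads(wire)) == blob)

# A single bad byte in a log line no longer blocks serialization.
print(json.dumps({"message": scrub_utf8(b"exit status \xff from exec")}))
```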

As I start entering tickets and we work through them, there are probably
other things that will come up. I've created
https://tickets.puppetlabs.com/browse/PUP-3524 to track this work. Tickets
will be created as children of that epic.

What I don't know right now is how much of an impact this change will
actually have. It isn't really clear how often non-UTF-8 data is actually
placed in the catalog. Is it really common, so that making this change
without better support in the language would be a huge burden? Or is it
pretty uncommon, showing up only in a few specific situations? How can we
find out?

-- 
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42
Software Developer
