+1 on hiera-file. I suspect this is where you are going to find non-UTF-8 
data: edge cases around create_resources and other places where people don’t 
realize they’re serializing data into another format.

On the plus side, the people doing this are also the ones most likely to 
understand the impact and be able to work around it. I’m not (consciously) using 
anything that isn’t UTF-8, and if I find it, I expect the hassle of going 
through a transform or trying to get the data another way to be much lower than 
the silliness of putting that much binary data into the catalog anyway.

Overall, very +1 on this change.

-Eric

-- 
Eric Shamow

On October 23, 2014 at 6:01:41 PM, Spencer Krum (krum.spen...@gmail.com) wrote:

Awesome work Andy. I will be pleased to not see any encoding bugs any more. I 
also did not know that anecdote about PSON, good stuff.

As to your questions about user use, I use the hiera-file type pretty 
frequently, and so some of my catalogs have binary data in the 'content' 
parameter of the file resource. Or at least I think that's what's going on. Can 
you describe for me how to check my catalogs for the things you're asking 
about? I'd be happy to generate some results and share them with you.

Thanks,
Spencer

On Thu, Oct 23, 2014 at 5:04 PM, Andy Parker <a...@puppetlabs.com> wrote:
A while ago we removed support for puppet to *send* YAML on the network. At the 
same time we converted to using safe_yaml for receiving YAML in order to keep 
compatibility with existing agents. Instead of YAML all of the communication 
was done with PSON, which is a variant of JSON that has been in use in puppet 
since at least 2010. As far as I understand, PSON started out as simply a 
vendored version of json_pure. The name PSON was apparently chosen because Rails 
would try to patch anything named JSON, and so they needed to name it something 
different to stop that from happening (that is all hearsay, so I don't know how 
truthful it is).

Over time PSON started to evolve. Little changes were made to it here and 
there. The largest change came about because of 
http://projects.puppetlabs.com/issues/5261. The changes for that ticket removed 
the restriction that only valid UTF-8 could be sent in PSON, which opened the 
door to a) binary data as file contents and b) absolutely no control over what 
encodings puppet was using. Over time there have been a large number of issues 
that have been related to not keeping track of what encoding puppet is dealing 
with.

I'd like to move us away from PSON and onto a standard format. YAML is out of 
the question because it is either slow and unsafe (all of the YAML 
vulnerabilities) or extremely slow and safe (safe_yaml). MessagePack might be 
nice. It is pretty well specified, has a fairly large number of libraries 
written for it, but it doesn't do much to help us solve the wild west of 
encoding in puppet. In MessagePack there aren't really any enforcements of 
string encodings and everything is treated as an array of bytes.

In order to keep consistency across various puppet projects we'll be going with 
JSON. JSON requires that everything is valid UTF-8, which gives us a nice 
deliberateness to handling data. JSON is pretty fast (not as fast as 
MessagePack) and there are a lot of libraries if it turns out that the built-in 
json isn't fast enough (puppet-server could use jrjackson, for instance).

So what all would be changing?

  1. Network communication that is using PSON would move to JSON
  2. YAML files that the master and agent write would move to JSON (node, 
facts, last_run_summary, state, etc.).
  3. A new exec node terminus would be written to handle JSON, or the existing 
one would be updated (check the first byte for '{').
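
For item 3, the first-byte check might look roughly like the sketch below. The 
method name parse_node_output is hypothetical and this is not the existing exec 
terminus code; it only illustrates dispatching on a leading '{'.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper: decide how to parse the output of an external node
# classifier by looking at its first byte.
def parse_node_output(output)
  if output.start_with?('{')
    JSON.parse(output)       # a JSON document from a classifier would be an object
  else
    YAML.safe_load(output)   # otherwise fall back to the legacy YAML format
  end
end

parse_node_output('{"classes": ["ntp"]}')
parse_node_output("---\nclasses:\n  - ntp\n")
```

Both calls return the same Hash. A YAML flow mapping can also begin with '{', so 
this is a heuristic rather than a strict format test.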

Those are just some of the changes that will need to happen. There will be a 
ripple of other changes based on the fact that JSON has to be UTF-8.

  1. A new "encoding" parameter on File and a base64() function. This will 
allow transferring non-UTF-8 data as file content until we can get a new 
catalog structure that allows tracking data types and more changes to the 
language to differentiate Strings from Blobs.
  2. Reports will have to strip invalid UTF-8 sequences. Nothing would be worse 
than a single byte stopping a report from being sent. This is what PuppetDB 
does right now with facts, catalogs, and reports.
  3. Facts can't contain non-UTF-8 data. Facter 2 already enforces this.
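
As a rough illustration of items 1 and 2, here is a minimal Ruby sketch. The 
Hash layout and the 'base64' value for the encoding key are assumptions made for 
the example, not the final design of the File parameter or the base64() 
function.

```ruby
require 'json'

# Bytes that are not valid UTF-8 (for example, part of a binary file).
binary = "\xFF\xFE\x00\x01".b

# Item 1: Base64 turns arbitrary bytes into plain ASCII, which is valid UTF-8
# and therefore safe to place in a JSON document.
encoded = [binary].pack('m0')            # strict Base64, no line breaks
payload = JSON.generate({ 'content' => encoded, 'encoding' => 'base64' })

# The receiving side reverses the transformation to recover the bytes.
decoded = JSON.parse(payload)['content'].unpack1('m0')

# Item 2: for report text, invalid sequences can be replaced rather than
# letting a single bad byte block the whole report.
log_line = "exit status 1: \xFF oops"
clean    = log_line.scrub('?')
```

Here decoded holds the same bytes as binary, and clean is valid UTF-8 with the 
bad byte replaced by '?'. String#scrub requires Ruby 2.1 or later.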

As I start entering tickets and we work through them, there are probably other 
things that will come up. I've created 
https://tickets.puppetlabs.com/browse/PUP-3524 to track this work. Tickets will 
be created as children of that epic.

What I don't know right now is how much of an impact this change will actually 
have. It isn't really clear how often non-UTF-8 data is actually placed in the 
catalog. Is it really common, so that making this change without better support 
in the language is going to be a huge burden? Or is it pretty uncommon, showing 
up only in a few specific situations? How can we find out?
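
One rough way to look, sketched in Ruby: walk a catalog that has already been 
loaded into nested Hashes and Arrays and list the strings that are not valid 
UTF-8. The helper name non_utf8_paths and the catalog fragment are made up for 
illustration, and how the catalog gets loaded is left open.

```ruby
# Hypothetical helper: walk a nested Hash/Array structure (such as a parsed
# catalog) and collect the paths of string values that are not valid UTF-8.
def non_utf8_paths(data, path = [], found = [])
  case data
  when Hash
    data.each { |key, value| non_utf8_paths(value, path + [key], found) }
  when Array
    data.each_with_index { |value, i| non_utf8_paths(value, path + [i], found) }
  when String
    # Re-tag a copy as UTF-8 so binary-tagged strings are judged by their bytes.
    found << path unless data.dup.force_encoding('UTF-8').valid_encoding?
  end
  found
end

# Made-up catalog fragment for illustration only.
catalog = {
  'resources' => [
    { 'type' => 'File', 'title' => '/etc/motd',
      'parameters' => { 'content' => "hello\n" } },
    { 'type' => 'File', 'title' => '/opt/app.bin',
      'parameters' => { 'content' => "\x89PNG\xFF".b } }
  ]
}

non_utf8_paths(catalog).each { |p| puts p.join('/') }
```

For this fragment only the second resource's content is reported. Hash keys are 
not checked, which seems acceptable for a first survey.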

--
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42
Software Developer

--
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/CANhgQXtegN-WVmJfcb_kQekWg25iVKC1w4P7tJ_rB%2BqzQY3owg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Spencer Krum
(619)-980-7820
