+1 on hiera-file - I suspect this is where you are going to find this - edge cases around create_resources and other places where people don’t realize they’re serializing data into another format.
On the plus side, the people doing this are also the ones most likely to understand the impact and be able to work around it. I’m not (consciously) using anything that isn’t UTF-8, and if I find it, I expect the hassle of going through a transform or getting the data another way to be much lower than the silliness of putting that much binary data into the catalog anyway.

Overall, very +1 on this change.

-Eric

--
Eric Shamow

On October 23, 2014 at 6:01:41 PM, Spencer Krum (krum.spen...@gmail.com) wrote:

Awesome work Andy. I will be pleased to not see any encoding bugs any more. I also did not know that anecdote about PSON, good stuff.

As to your questions about user use, I use the hiera-file type pretty frequently, and so some of my catalogs have binary data in the 'content' parameter of the file resource. Or at least I think that's what's going on. Can you describe for me how to check my catalogs for the things you're asking about? I'd be happy to generate some results and share them with you.

Thanks,
Spencer

On Thu, Oct 23, 2014 at 5:04 PM, Andy Parker <a...@puppetlabs.com> wrote:

A while ago we removed support for puppet to *send* YAML on the network. At the same time we converted to using safe_yaml for receiving YAML in order to keep compatibility with existing agents. Instead of YAML, all of the communication was done with PSON, which is a variant of JSON that has been in use in puppet since at least 2010.

As far as I understand, PSON started out as simply a vendored version of json_pure. The name PSON was apparently chosen because rails would try to patch anything named JSON, and so they needed to name it something different to stop that from happening (that is all hearsay, so I don't know how truthful it is).

Over time PSON started to evolve. Little changes were made to it here and there. The largest change came about because of http://projects.puppetlabs.com/issues/5261.
The changes for that ticket removed the restriction that only valid UTF-8 could be sent in PSON, which opened the door to a) binary data as file contents and b) absolutely no control over what encodings puppet was using. Over time there have been a large number of issues related to not keeping track of what encoding puppet is dealing with.

I'd like to move us away from PSON and onto a standard format. YAML is out of the question because it is either slow and unsafe (all of the YAML vulnerabilities) or extremely slow and safe (safe_yaml). MessagePack might be nice. It is pretty well specified and has a fairly large number of libraries written for it, but it doesn't do much to help us solve the wild west of encoding in puppet. In MessagePack there aren't really any enforcements of string encodings and everything is treated as an array of bytes.

In order to keep consistency across various puppet projects we'll be going with JSON. JSON requires that everything is valid UTF-8, which gives us a nice deliberateness to handling data. JSON is pretty fast (not as fast as MessagePack) and there are a lot of libraries if it turns out that the built in json isn't fast enough (puppet-server could use jrjackson, for instance).

So what all would be changing?

1. Network communication that is using PSON would move to JSON.
2. YAML files that the master and agent write would move to JSON (node, facts, last_run_summary, state, etc.).
3. A new exec node terminus would be written to handle JSON, or the existing one would be updated (check the first byte for '{').

Those are just some of the changes that will need to happen. There will be a ripple of other changes based on the fact that JSON has to be UTF-8.

1. A new "encoding" parameter on File and a base64() function. This will allow transferring non-UTF-8 data as file content until we can get a new catalog structure that allows tracking data types and more changes to the language to differentiate Strings from Blobs.
2. Reports will have to strip invalid UTF-8 sequences. Nothing would be worse than a single byte stopping a report from being sent. This is what PuppetDB does right now with facts, catalogs, and reports.
3. Facts can't contain non-UTF-8 data. Facter 2 already enforces this.

As I start entering tickets and we work through them, there are probably other things that will come up. I've created https://tickets.puppetlabs.com/browse/PUP-3524 to track this work. Tickets will be created as children of that epic.

What I don't know right now is how much of an impact this change will actually have. It isn't really clear how often non-UTF-8 data is actually placed in the catalog. Is it really common, so that making this change without better support in the language would be a huge burden? Or is it pretty uncommon and only shows up in a few specific situations? How can we find out?

--
Andrew Parker
a...@puppetlabs.com
Freenode: zaphod42
Twitter: @aparker42
Software Developer

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CANhgQXtegN-WVmJfcb_kQekWg25iVKC1w4P7tJ_rB%2BqzQY3owg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
Spencer Krum
(619)-980-7820
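
To make the base64 and report-stripping ideas from Andy's second list a bit more concrete, here is a rough sketch in Python (not puppet code). It only illustrates the general technique: content that is valid UTF-8 travels as text, anything else is carried as base64 so the JSON stays valid, and report text has invalid sequences dropped. The key names "encoding" and "content", the values "plain" and "base64", and all three function names are assumptions made up for this illustration; the thread does not specify what the real File parameter or base64() function would look like.

```python
import base64
import json


def encode_file_content(raw: bytes) -> dict:
    """Represent file content in a form that is always valid JSON text."""
    try:
        # Valid UTF-8 can travel as an ordinary JSON string.
        return {"encoding": "plain", "content": raw.decode("utf-8")}
    except UnicodeDecodeError:
        # Anything else is carried as base64, with a marker saying so.
        encoded = base64.b64encode(raw).decode("ascii")
        return {"encoding": "base64", "content": encoded}


def decode_file_content(resource: dict) -> bytes:
    """Recover the original bytes on the receiving side."""
    if resource["encoding"] == "base64":
        return base64.b64decode(resource["content"])
    return resource["content"].encode("utf-8")


def scrub_for_report(raw: bytes) -> str:
    """Drop invalid UTF-8 sequences so one stray byte cannot block a report."""
    return raw.decode("utf-8", errors="ignore")


# Round trip some bytes that are not valid UTF-8 through JSON.
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])
wire = json.dumps(encode_file_content(binary))
restored = decode_file_content(json.loads(wire))
print(restored == binary)
```

Passing errors="replace" instead of errors="ignore" would substitute U+FFFD for each bad sequence rather than dropping it, which keeps a visible trace of where data was lost.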
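
On the question of how to check a catalog for non-UTF-8 data, one possible approach is to scan a serialized catalog on disk as raw bytes and report every sequence that does not decode as UTF-8. The sketch below is in Python and is only a starting point under some assumptions: that you have a catalog saved to a file, and that binary content appears in it as raw bytes. The function names find_invalid_utf8 and check_catalog are hypothetical and not part of puppet.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for each invalid UTF-8 sequence."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is clean
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice that was decoded.
            start, end = pos + err.start, pos + err.end
            bad.append((start, data[start:end]))
            pos = end  # resume scanning just past the bad sequence
    return bad


def check_catalog(path: str) -> List[Tuple[int, bytes]]:
    """Read a serialized catalog file as raw bytes and scan it."""
    with open(path, "rb") as handle:
        return find_invalid_utf8(handle.read())


# In-memory demonstration; a real run would call check_catalog(path) instead.
sample = b'{"content": "caf\xc3\xa9 \xff\xfe"}'
for offset, raw in find_invalid_utf8(sample):
    print("invalid UTF-8 at byte %d: %r" % (offset, raw))
```

An empty result means the whole file is valid UTF-8. The offsets point into the raw file, so finding which resource or parameter they belong to still means looking at the surrounding text by hand.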