Issue #10291 has been updated by Jeff McCune.
# Additional Information #

This is a more general encoding issue with Strings in Ruby 1.9 and later. We'll need to try to detect the encoding of each file we load and switch the encoding of the resulting string object on the fly.

In relation to the paying customer support ticket (535), we specifically need to make this work with templates and the template() function.

A great description of the context and surrounding issues is located at: <http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings>

<blockquote>
I suspect early contact with the new m17n engine is going to come to Rubyists in the form of this error message:

  invalid multibyte char (US-ASCII)

Ruby 1.8 didn't care what you stuck in a random String literal, but 1.9 is a touch pickier. I think you'll see that the change is for the better, but we do need to spend some time learning to play by Ruby's new rules.

That takes us to the first of Ruby's three default Encodings.

The Source Encoding

In Ruby's new grown up world of all encoded data, each and every String needs an Encoding. That means an Encoding must be selected for a String as soon as it is created. One way that a String can be created is for Ruby to execute some code with a String literal in it, like this:

  str = "A new String"

That's a pretty simple String, but what if I use a literal like the following instead?

  str = "Résumé"

What Encoding is that in? That fundamental question is probably the main reason we all struggle a bit with character encodings. You can't tell just from looking at that data what Encoding it is in. Now, if I showed you the bytes you may be able to make an educated guess, but the data just isn't wearing an Encoding name tag.

That's true of a frightening lot of data we deal with every day. A plain text file doesn't generally say what Encoding the data inside is in. When you think about that, it's a miracle we can successfully read a lot of things.

When we're talking about program code, the problem gets worse.
I may want to write my code in UTF-8, but some Japanese programmer may want to write his code in Shift JIS. Ruby should support that and, in fact, 1.9 does. Let's complicate things a bit more though: imagine that I bundle up that UTF-8 code I wrote in a gem and the Japanese programmer later uses it to help with his Shift JIS code. How do we make that work seamlessly?

The Ruby 1.8 strategy of one global variable won't survive a test like this, so it was time to switch strategies. Ruby 1.9's answer to this problem is the source Encoding.

All Ruby source code now has some Encoding. When you create a String literal in your code, it is assigned the Encoding of your source. That simple rule solves all the problems I just described pretty nicely. As long as my source Encoding is UTF-8 and the Japanese programmer's source Encoding is Shift JIS, my literals will work as I expect and his will work as he expects.

Obviously if we share any data, we will need to establish some rules about our shared formats using documentation or code that can adapt to different Encodings, but we should have been doing that all along anyway.

Thus the only question becomes, what's my source Encoding and how do I change it?
</blockquote>

----------------------------------------
Bug #10291: UTF8 non-breaking space in a manifest breaks the parser
https://projects.puppetlabs.com/issues/10291

Author: Oliver Hookins
Status: Accepted
Priority: Normal
Assignee: Jeff McCune
Category: ruby19
Target version:
Affected Puppet version: 2.6.7
Keywords:
Branch:

<code>
err: Could not parse for environment production: Could not match Yum::Repo at /home/ohookins/svn/redacted/repo.pp:4
</code>

The actual code is unremarkable, but the problem is here:

<code>
00000020 20 7b 0a 20 c2 a0 59 75 6d 3a 3a 52 65 70 6f 20 | {. ..Yum::Repo |
00000030 7b 0a 20 c2 a0 c2 a0 c2 a0 6d 65 74 61 64 61 74 |{. ......metadat|
</code>

Somehow we've ended up with a UTF8 "nbsp" in our manifest (the 0xc2a0).
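To make the two bytes in the dump concrete, here is a small Ruby sketch (1.9 or later) that rebuilds `c2 a0` and inspects it under two encoding labels. The method name `describe_nbsp_bytes` and the hash keys are made up for illustration; this is not Puppet's parser code, and the regexp checks are observations about Ruby itself, not a diagnosis of where the Puppet lexer fails.

```ruby
# Illustrative only: inspects the two bytes from the hex dump.
def describe_nbsp_bytes
  raw      = [0xc2, 0xa0].pack("C*")              # raw bytes, tagged ASCII-8BIT
  as_utf8  = raw.dup.force_encoding("UTF-8")      # same bytes, relabelled as UTF-8
  as_ascii = raw.dup.force_encoding("US-ASCII")   # same bytes, relabelled as US-ASCII

  {
    :utf8_valid          => as_utf8.valid_encoding?,   # true: one well-formed character
    :utf8_chars          => as_utf8.length,            # 1
    :utf8_codepoint      => as_utf8.ord,               # 0xA0, NO-BREAK SPACE
    :ascii_valid         => as_ascii.valid_encoding?,  # false: bytes above 0x7f
    :matches_backslash_s => !(as_utf8 =~ /\s/).nil?,           # false: \s is ASCII-only
    :matches_posix_space => !(as_utf8 =~ /[[:space:]]/).nil?,  # true: Unicode-aware class
  }
end

p describe_nbsp_bytes
```

The same two bytes are one valid character when labelled UTF-8 and an invalid sequence when labelled US-ASCII, which is the situation the quoted article describes. Even with the right label, a plain `\s` does not treat U+00A0 as whitespace.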
Sure, I can just remove these characters but it suggests to me that perhaps the Unicode support in the parser is incomplete, which is a larger problem for internationalisation.
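The idea of switching the encoding of a loaded string on the fly, mentioned in the update at the top of this message, could look roughly like the sketch below. It is a simplification under stated assumptions: `read_as_utf8` is a hypothetical helper, not an existing Puppet function, and it assumes files are meant to be UTF-8 instead of detecting the encoding. Templates and the template() function are not covered.

```ruby
require "tempfile"

# Hypothetical helper, not Puppet code. Assumes the file is meant to be UTF-8;
# it does not attempt real encoding detection.
def read_as_utf8(path)
  # Binary mode returns the bytes untouched, ignoring Encoding.default_external.
  data = File.open(path, "rb") { |f| f.read }
  # force_encoding changes only the encoding label; the bytes stay the same.
  data.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "#{path} is not valid UTF-8" unless data.valid_encoding?
  data
end

# Usage: write a manifest-like snippet containing c2 a0, then load it.
manifest = Tempfile.new("repo.pp")
manifest.binmode
manifest.write([0x20, 0xc2, 0xa0, 0x59, 0x75, 0x6d].pack("C*")) # " <nbsp>Yum"
manifest.close

text = read_as_utf8(manifest.path)
puts text.encoding   # UTF-8
puts text.length     # 5 characters from 6 bytes
manifest.unlink
```

Validating right after relabelling means a file in some other encoding fails at load time with a clear message instead of surfacing later as a confusing parse error.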
