On Monday, October 14, 2013 2:40:14 AM UTC-5, Erik Dalén wrote:
> I checked this a bit further, and it seems like the policy is to always
> have UTF-8 encoding in RPM descriptions etc, but that RPM will happily
> build packages with text in other encodings.

Whose policy, exactly? As far as I can tell, the RPM file format specifications do not define the character encoding to be used for textual data, so it is dangerous (dare I even say "wrong"?) for code that consumes that data to make any assumption whatever about its encoding. Absent a means to determine the correct encoding, the "strings" from RPM headers ought to be handled as byte arrays (since that's what they actually are).

> But it seems like a decent workaround to force encoding here as RPM seems
> to print the original text out to the console without any charset
> translation to the system locale.

Forcing UTF-8 would rescue only the case where the RPM text is actually encoded in UTF-8 (which includes pure, 7-bit ASCII). If the actual encoding were anything else, and the text contained non-ASCII characters, then you would again get an encoding error. If it is important to decode the bytes to characters, then it would be better to assume Latin-1, which admits no invalid code sequences. On the other hand, if it is essential to decode header text to the logical character sequence from which it was encoded, then there is no substitute for a reliable method of determining the encoding.

John
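To make the UTF-8 versus Latin-1 point concrete, here is a minimal sketch in Python (not Puppet's Ruby, and not code from the provider under discussion). The byte string `raw` and the helper names are hypothetical, made up for illustration; the only assumption is that header text reaches the consumer as raw bytes of unknown encoding.

```python
# Hypothetical header bytes: "Erik Dalén" encoded as Latin-1,
# where "é" is the single byte 0xE9.
raw = b"Erik Dal\xe9n"


def decode_forced_utf8(data: bytes) -> str:
    """Decode as UTF-8; raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    return data.decode("utf-8")


def decode_latin1(data: bytes) -> str:
    """Decode as Latin-1; every byte value maps to a character, so this never raises."""
    return data.decode("latin-1")


# Forcing UTF-8 fails here, because 0xE9 followed by "n" is not a valid UTF-8 sequence.
try:
    decode_forced_utf8(raw)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 decoding always succeeds and round-trips to the original bytes.
as_latin1 = decode_latin1(raw)
round_trip = as_latin1.encode("latin-1")

# The price: text that really was UTF-8 decodes without error, but to the wrong characters.
mojibake = decode_latin1("Dalén".encode("utf-8"))

print(utf8_ok)
print(round_trip == raw)
print(ascii(mojibake))
```

The last case is why assuming Latin-1 only avoids the error: it preserves the bytes, but it recovers the intended characters only if the text really was Latin-1.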
