Chris Hall wrote on 2008-03-11 21:09 (+0000):
> OK. In the meantime IMHO chr(n) should be handling utf8 and has no
> business worrying about things which UTF-8 or UCS think aren't
> characters.

It should do Unicode, not any specific byte encoding like UTF-8.
Internally, a byte encoding is needed, but as a programmer I don't want
to be bothered with such implementation details.

> Note that chr(n) is whingeing about 0xFFFE, which Encode::en/decode
> (UTF-8) are happy with. Unicode defines 0xFFFE and 0xFFFF as
> non-characters, not just 0xFFFF (which Encode::en/decode do deem
> invalid).

Personally, I think Perl should accept these characters without warning,
unless the strict UTF-8 encoding is requested (which differs from the
non-strict UTF8 encoding).

> >>In any case, is chr(n) supposed to be utf8 or UTF-8 ? AFAIKS, it's
> >>neither.
> >It's supposed to be neither on the outside. Internally, it's utf8.
> One can turn off the warnings and then chr(n) will happily take any +ve
> integer and give you the equivalent character -- so the result is utf8,

The result is Unicode. The difference between Unicode and UTF8 is not
always clear, but in this case it is: the character is Unicode, a single
code point; the internal implementation is UTF8.

  Unicode: U+20AC     (one character: €)
  UTF-8:   E2 82 AC   (three bytes)

I am under the impression that you know the difference and made an
honest mistake. My detailed expansion is also for lurkers and archives.

> [replacement character]
> So we'll have to differ on this :-)

Yes, although my opinion on this is not strong. undef or replacement
character - both are good options. One argument in favor of the
replacement character would be backwards compatibility.
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <[EMAIL PROTECTED]>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <[EMAIL PROTECTED]>
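
P.S. For the archives, a minimal sketch of the character-versus-bytes
distinction in the U+20AC example, assuming a Perl that ships the core
Encode module. The variable names $char and $bytes are only
illustrative, and this shows just the explicit encoding step, not how
perl stores the string internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# chr() takes a code point and returns one Unicode character.
my $char = chr(0x20AC);    # U+20AC EURO SIGN

# Bytes only appear once an encoding is asked for explicitly.
my $bytes = encode('UTF-8', $char);

printf "characters: %d (U+%04X)\n", length($char), ord($char);
printf "bytes:      %d (%s)\n", length($bytes),
    join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
```

This should report one character, U+20AC, and three bytes, E2 82 AC.
The program only ever handles a code point until it asks for a byte
encoding itself.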