Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-13 Thread Chris Hall
On Wed, 12 Mar 2008 Juerd Waalboer wrote Chris Hall skribis 2008-03-12 20:49 (+): a. are you saying that characters in Perl are Unicode? Yes. They are called Unicode, at least. This has my preference for explanation and documentation. b. or are you agreeing that characters in

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-12 Thread Juerd Waalboer
Chris Hall skribis 2008-03-12 13:20 (+): OK. In the meantime IMHO chr(n) should be handling utf8 and has no business worrying about things which UTF-8 or UCS think aren't characters. It should do Unicode, not any specific byte encoding, like UTF-8. IMHO chr(n) should do characters,

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-12 Thread Chris Hall
On Wed, 12 Mar 2008 Juerd Waalboer wrote Chris Hall skribis 2008-03-12 13:20 (+): String literals are represented by UCS code points. Which reinforces the feeling that characters in Perl are Unicode. Yes! OK. For the avoidance of doubt: a. are you saying that

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-11 Thread Chris Hall
On Tue, 11 Mar 2008 you wrote Chris Hall skribis 2008-03-11 18:48 (+): I'm comfortable with the notion that perl characters are unsigned integers that overlap UCS, and happen to be held internally as a superset of UTF-8. I wonder if perl is completely comfortable. It isn't. There are

Re: utf8::valid and \x14_000 - \x1F_0000

2008-03-11 Thread Juerd Waalboer
Chris Hall skribis 2008-03-11 21:09 (+): OK. In the meantime IMHO chr(n) should be handling utf8 and has no business worrying about things which UTF-8 or UCS think aren't characters. It should do Unicode, not any specific byte encoding, like UTF-8. Internally, a byte encoding is