Perl 6 / Rakudo unicode code point ranges

David Warring Mon, 18 Feb 2013 20:29:51 -0800

Hi Guys,
A quick question.

I'm trying to interpret unicode code-point ranges from the CSS 3 spec -
http://www.w3.org/TR/css3-syntax/#CHARSETS


The rule in question is

nonascii ::= #x80-#xD7FF #xE000-#xFFFD #x10000-#x10FFFF

Where (I think) these are unicode code-point ranges.
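
Read that way, the rule amounts to three inclusive integer ranges over code
point values. A minimal sketch of the membership test as plain comparisons
follows. It assumes the ranges really are code-point ranges. The name
is-nonascii is made up for illustration and comes from neither the CSS spec nor
Rakudo. It sidesteps the character-class syntax rather than showing how to
write the rule as a regex.

```perl6
# Sketch only: is-nonascii is a hypothetical helper name.
# Returns True when $cp lies in one of the three ranges of the nonascii rule.
sub is-nonascii(Int $cp) {
    0x80 <= $cp <= 0xD7FF
        || 0xE000 <= $cp <= 0xFFFD
        || 0x10000 <= $cp <= 0x10FFFF;
}

say is-nonascii('a'.ord);    # U+0061 is below the first range
say is-nonascii(0xE9);       # U+00E9 is inside #x80-#xD7FF
say is-nonascii(0xD800);     # U+D800 falls in the gap between the first two ranges
```

The gap between #xD7FF and #xE000 that the rule skips is the surrogate block,
U+D800 to U+DFFF.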

The latest rakudo build is fine with:


% perl6 -e '/<[\c[0x80]..\c[0xD7FF]]>/'


...but doesn't like the second (or third) range:


% perl6 -e '/<[\c[0xE000]..\c[0xFFFD]]>/'
===SORRY!===
Invalid character for UTF-8 encoding


...the individual code points are ok:


% perl6 -e '/<[\c[0xE000]]>/'
% perl6 -e '/<[\c[0xFFFD]]>/'


I think I'm getting the above error because not all Unicode code points
are defined in the range xE000 to xFFFD - see
http://www.utf8-chartable.de/unicode-utf8-table.pl

I'm just having a problem implementing a concise regex/grammar rule for the
above. Looking for advice.

Cheers,
David Warring
