Hi,

For my regexp-ordering module (Regexp::Compare, available from CPAN), I'm looking at the bytecode that perl 5.8.8 generates from \p{...} constructs (see perldoc perlre), and there's a funny thing: no matter what I put inside the braces (e.g. '\\p{IsUpper}' versus '\\p{IsLower}'), the resulting bytecode is always the same...

I took the XS code for compiling regexps (basically just a call to pregcomp) from http://perl.plover.com/Rx/ and it generally seems to work, but \p{...} seems to trigger some special case I'm missing. As far as I can tell, pregcomp reserves space for Unicode character classes but doesn't fill it, so the bytecode doesn't really represent the whole input regexp.

Is there some special invocation telling pregcomp to also do Unicode, or some place apart from the bytecode I should look at when interpreting it? Matches against \p{...} work correctly (or at least differently for '\\p{IsUpper}' and '\\p{IsLower}'), but trying to read the implementation didn't get me very far - I'm probably jinxed by the curse at the start of regexec.c...

Bye

Vasek