Regexp bytecode

Vaclav Barta Sun, 09 Sep 2007 07:01:00 -0700

Hi,

For my regexp-ordering module (Regexp::Compare, available from CPAN), I'm 
looking at the bytecode generated by perl-5.8.8 from \p{...} constructs 
(see perldoc perlre), and there's a funny thing: no matter what I put inside 
the braces (e.g. '\\p{IsUpper}' or '\\p{IsLower}'), the resulting 
bytecode is always the same...


I took the XS code for compiling regexps (basically just a call to pregcomp) 
from http://perl.plover.com/Rx/ and it generally seems to work, but \p{...} 
seems to trigger some special case I'm missing - as far as I can tell, 
pregcomp reserves space for Unicode character classes but doesn't fill it, 
so the bytecode doesn't really represent the whole input regexp... Is there 
some special invocation telling pregcomp to also handle Unicode, or some place 
apart from the bytecode I should look at when interpreting it? Matches 
against \p{...} work correctly (or at least differently for '\\p{IsUpper}' 
and '\\p{IsLower}'), but trying to read the implementation didn't get me very 
far - I'm probably jinxed by the curse at the start of regexec.c...
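A minimal check of the matching side of this (a sketch, assuming only a stock perl with Unicode property support; it does not look at the bytecode itself, only at how the two compiled patterns behave):

```perl
use strict;
use warnings;

# Two property classes that, per the observation above, compile to
# identical-looking bytecode under perl-5.8.8.
my $upper = qr/\p{IsUpper}/;
my $lower = qr/\p{IsLower}/;

# Match each sample string against both patterns and report 1/0.
for my $s ('A', 'a') {
    printf "%s: IsUpper=%d IsLower=%d\n", $s,
        ($s =~ $upper ? 1 : 0), ($s =~ $lower ? 1 : 0);
}
```

This should print `A: IsUpper=1 IsLower=0` followed by `a: IsUpper=0 IsLower=1`. Since the opcodes look the same but the results differ, the property name is presumably kept somewhere outside the opcode stream.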

        Bye
                Vasek
