On Thu, Aug 03, 2000 at 02:49:11AM -0400, Owen Taylor wrote:
>
> The output of -Dr makes it pretty clear what is going on:
>
> Compiling REx `^\C\C(c)'
> size 10 first at 2
> rarest char c at 0
> 1: BOL(2)
> 2: SANY(3)
> 3: SANY(4)
> 4: OPEN1(6)
> 6: EXACT <c>(8)
> 8: CLOSE1(10)
> 10: END(0)
> anchored `c' at 2 (checking anchored) anchored(BOL) minlen 3
>
> [...]
>
> Guessing start of match, REx `^\C\C(c)' against `École'...
> String not equal...
> Match rejected by optimizer
>
> For regexes compiled with 'use utf8' the anchor position
> is in chars, not bytes, and the re optimizer (study_chunk)
> thinks that \C counts as one char.
>
> Fixing this looks decidedly unfun.

I have now submitted a perlbug report on this, so that the bug
(which unfortunately still seems to be there) won't be forgotten.
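
For anyone following along, the bytes-versus-chars mismatch Owen
describes can be seen outside the regex engine. The following is only
a rough sketch in Python, not Perl and not what study_chunk actually
does; the variable names are made up for illustration. It takes the
offset 2 from the "anchored `c' at 2" line above and reads it once as
a byte offset and once as a character offset into the UTF-8 form of
the test string.

```python
# Illustrative only: look at offset 2 of "École", counting first in
# bytes (what \C\C consumes) and then in characters (what the
# optimizer appears to assume for a utf8 regex).
text = "École"
raw = text.encode("utf-8")    # b'\xc3\x89cole': 'É' occupies two bytes

anchor = 2                    # from the -Dr output: anchored `c' at 2

byte_at_anchor = raw[anchor:anchor + 1]    # offset counted in bytes
char_at_anchor = text[anchor:anchor + 1]   # offset counted in characters

print(byte_at_anchor)   # b'c'
print(char_at_anchor)   # o
```

Counted in bytes, the substring at offset 2 is the expected `c'.
Counted in chars it is `o', which is consistent with the
"String not equal..." rejection in the trace.
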
> Regards,
> Owen
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen