On Thu, Jan 31, 2002 at 08:54:21AM -0800, Brent Dax wrote:
> Peter Haworth:
> # On Wed, 30 Jan 2002 17:45:58 +0000, Graham Barr wrote:
> # > On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
> # > > #                 rx_setprops P0, "i", 2
> # > > #                 branch $start0
> # > > #         $advance:
> # > > #                 rx_advance P0, $fail
> # > > #         $start0:
> # > > #                 rx_literal P0, "a", $advance
> # > > #
> # > > # First, we set the rx engine to case-insensitive. Why is that bad? It's
> # > > # setting a runtime property for what should be compile-time
> # > > # unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
> # > > # details of unicode in the first place just feels wrong, but I digress.
> # > >
> # > > That "i" does a once-off case-folding operation on the target string.
> # > > All other input to the engine MUST already be case-folded for speed.
> # >
> # > Hm, is that going to work ? What about a rx like /^a(?i:b)C/ where the
> # > case insensitivity only applies to part of the pattern ?
> #
> # Or worse, in /^a(b)c/i, where you want to capture the original character,
> # not the case-folded version?
> 
> Parentheses just record a pair of indices, not a string.

Yes, I was assuming that. However, what is to be gained by case-folding
the input string?

Because parts of an rx can be case-insensitive while other parts
are case-sensitive, we will probably need two sorts of ops anyway
(or a way to tell an op to be case-insensitive). And you will only
be able to fold the input up front when the whole rx is case-insensitive.
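
A minimal sketch of that problem, written in Python rather than Parrot ops
because its `re` module accepts the same scoped `(?i:...)` syntax. The helper
names `match_scoped` and `match_prefolded` are hypothetical, and the
"fold once, then match a folded pattern" shortcut is only an assumption about
how a pre-folding engine might behave, not the actual rx implementation.

```python
import re

# Only the "b" is case-insensitive; "a" and "C" must match exactly.
SCOPED = re.compile(r"^a(?i:b)C")

# Hypothetical shortcut: fold the pattern once, and fold the input once.
PREFOLDED = re.compile(r"^abc")


def match_scoped(s):
    """Match with per-group case sensitivity, as the pattern asks."""
    return SCOPED.match(s) is not None


def match_prefolded(s):
    """Match after folding the whole input, losing the case distinction."""
    return PREFOLDED.match(s.lower()) is not None


if __name__ == "__main__":
    for s in ["abC", "aBC", "ABC", "abc"]:
        print(s, match_scoped(s), match_prefolded(s))
```

Folding the whole input accepts "ABC" and "abc", which the scoped pattern
rejects, so the one-off fold is only safe when the entire rx is
case-insensitive.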

It also means creating a copy of the input string, which is something
the current rx engine in perl5 tries to avoid. And while I will agree
that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i,
that is normally only the case for small-ish strings.
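
To make the trade-off concrete, here is a rough Python sketch of matching
against a folded copy while capturing by index into the original string, as
in the /^a(b)c/i case. The function name `capture_via_folded_copy` is
hypothetical, and the sketch assumes folding preserves string length, which
does not hold for every Unicode character.

```python
import re


def capture_via_folded_copy(folded_pattern, text):
    """Match on a lowercased copy, then slice the ORIGINAL by the group's indices."""
    folded = text.lower()  # this is the extra copy of the input string
    if len(folded) != len(text):
        # Assumption violated: indices into the copy no longer line up.
        raise ValueError("case folding changed the string length")
    m = re.match(folded_pattern, folded)
    if m is None:
        return None
    start, end = m.span(1)  # the capture is just a pair of indices
    return text[start:end]


if __name__ == "__main__":
    print(capture_via_folded_copy(r"^a(b)c", "aBc"))
    # Same capture without a copy, using a case-insensitive match:
    print(re.match(r"^a(b)c", "aBc", re.IGNORECASE).group(1))
```

Both approaches return the original-case character; the difference is the
second string allocated by the folded version, which grows with the input.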

Graham.
