> Because parts of an rx can be case-insensitive while other parts
> are case-sensitive, we will probably need two sorts of ops anyway
> (or a way to tell the op to be case-insensitive).  And you will
> only be able to do the case folding when the whole rx is 
> case-insensitive.

I don't like that suggestion. I think we should have one set of
ops but two input strings: the original and a case-folded copy.
The rx engine picks the right one depending on the current
case-sensitivity. Two regex opcodes would be used for this purpose,
op-case-sensitive-start and op-case-insensitive-start. Each opcode
switches the engine's string begin, end, current position, etc. to
the corresponding string.
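
To make the idea concrete, here is a rough sketch in C of what I have
in mind. It is only an illustration, not a proposed implementation:
the names (rx_op, rx_run, OP_CI_START, OP_CS_START, example_prog) are
made up, the match is anchored at position 0, and folding is ASCII-only
so that both strings have the same length and one position works for
both. With full Unicode folding the lengths can differ (for example
"ß" folds to "ss"), which is why the switch opcodes would have to remap
begin/end/position rather than just swap a pointer. Literals inside a
case-insensitive region are assumed to be folded at compile time.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcode set: one literal op plus the two switch ops. */
enum { OP_CHAR, OP_CI_START, OP_CS_START, OP_MATCH };

typedef struct { int op; char ch; } rx_op;

/* ASCII-only folding (an assumption): keeps both buffers the same length. */
void fold_ascii(const char *src, char *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
}

/* Anchored match at position 0. Returns 1 on match, 0 on failure. */
int rx_run(const rx_op *prog, const char *orig, const char *folded, size_t len) {
    const char *cur = orig;   /* string the ops currently read from */
    size_t pos = 0;
    for (const rx_op *pc = prog; ; pc++) {
        switch (pc->op) {
        case OP_CI_START: cur = folded; break;  /* switch to folded copy */
        case OP_CS_START: cur = orig;   break;  /* switch back to original */
        case OP_CHAR:
            /* The same op serves both modes; only the input differs. */
            if (pos >= len || cur[pos] != pc->ch) return 0;
            pos++;
            break;
        case OP_MATCH: return 1;
        default: return 0;
        }
    }
}

/* Convenience wrapper: builds the folded copy, runs, frees. -1 on OOM. */
int rx_match(const rx_op *prog, const char *input) {
    size_t len = strlen(input);
    char *folded = malloc(len + 1);
    if (!folded) return -1;
    fold_ascii(input, folded, len);
    folded[len] = '\0';
    int r = rx_run(prog, input, folded, len);
    free(folded);
    return r;
}

/* Roughly /^F(?i:oo)B/: 'F' and 'B' case-sensitive, "oo" not. */
const rx_op example_prog[] = {
    { OP_CHAR, 'F' },
    { OP_CI_START, 0 }, { OP_CHAR, 'o' }, { OP_CHAR, 'o' },
    { OP_CS_START, 0 }, { OP_CHAR, 'B' },
    { OP_MATCH, 0 },
};
```

With this, rx_match(example_prog, "FoOB") succeeds while "fOOB" and
"FOOb" fail, and OP_CHAR itself never needs to know which mode it is in.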

> It also means creating a copy of the input string, which is something
> the current rx engine in perl5 tries to avoid. And while I will agree
> that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> that is normally only the case for small-ish strings.

I don't think the perl5 approach is the best choice. Unicode case folding
is much, much more expensive than malloc/free. And we can always use a
per-thread free list: unless the regex is nested or the string is very
big, we don't need to allocate any memory.
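
A minimal sketch of such a per-thread free list, again in C and again
only as an illustration. The names (fold_buf, fold_buf_acquire,
fold_buf_release) and the FOLD_BUF_CAP threshold for "very big" are
assumptions of mine, and it relies on C11 _Thread_local. Buffers of the
standard size are recycled per thread; a nested regex that needs a
second buffer, or a string above the threshold, falls back to malloc.

```c
#include <stddef.h>
#include <stdlib.h>

#define FOLD_BUF_CAP 4096   /* hypothetical cutoff for "very big" strings */

typedef struct fold_buf {
    struct fold_buf *next;  /* link used while the buffer sits on the list */
    size_t cap;             /* capacity of data[] in bytes */
    char data[];            /* storage for the case-folded copy */
} fold_buf;

/* One list per thread, so no locking is needed. */
static _Thread_local fold_buf *free_list;

/* Returns a buffer of at least n bytes, or NULL if malloc fails. */
char *fold_buf_acquire(size_t n) {
    if (n <= FOLD_BUF_CAP && free_list) {
        fold_buf *b = free_list;        /* common case: reuse, no malloc */
        free_list = b->next;
        return b->data;
    }
    /* List empty (first use or nested regex) or oversized request. */
    size_t cap = n <= FOLD_BUF_CAP ? FOLD_BUF_CAP : n;
    fold_buf *b = malloc(sizeof *b + cap);
    if (!b) return NULL;
    b->next = NULL;
    b->cap = cap;
    return b->data;
}

/* Standard-size buffers go back on the list; oversized ones are freed. */
void fold_buf_release(char *p) {
    if (!p) return;
    fold_buf *b = (fold_buf *)(p - offsetof(fold_buf, data));
    if (b->cap == FOLD_BUF_CAP) {
        b->next = free_list;
        free_list = b;
    } else {
        free(b);
    }
}
```

After the first match on a thread, a non-nested match on a string of up
to FOLD_BUF_CAP bytes reuses the cached buffer and never touches malloc.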

Hong
