On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote:
> > Because parts of an rx can be case-insensitive while other parts
> > are case-sensitive, we will probably need two sorts of ops anyway
> > (or a way to tell the op to be case-insensitive).  And you will
> > only be able to do the case folding when the whole rx is 
> > case-insensitive.
> 
> I don't like your suggestion. I think we should have one set of
> ops, but two input strings: one is the original, the other is case-
> folded. Rx chooses the right one depending on the current 
> case-sensitivity. Two regex opcodes will be used for this purpose,
> op-case-sensitive-start and op-case-insensitive-start. Each opcode
> will switch the string's begin, end, position, etc.
> 
> > It also means creating a copy of the input string, which is something
> > the current rx engine in perl5 tries to avoid. And while I will agree
> > that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> > that is normally only the case for small-ish strings.
> 
> I don't think the perl5 approach is the best choice. Unicode case folding
> is much, much more expensive than malloc/free. And we can always use a
> per-thread free list; unless the regex is nested or the string is very
> big, we don't need to allocate any memory.

But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against an rx
with any case-insensitive part, however small that part is.

The case folding should be done in the rx itself, at compile time if possible.
Then it is only done once, which will save a lot of time if the rx is
used repeatedly, for example in a loop.
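
To make that concrete, here is a minimal sketch in Python. It is not how the
perl5 or Parrot engines work; every name in it (Literal, compile_literals,
match_at) is hypothetical, and it handles only a sequence of literal pieces.
The pattern's case-insensitive literals are folded once at compile time, and
the input is folded only character by character, as far as the match actually
looks, so no folded copy of the whole input string is ever made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str          # already case-folded when ignore_case is True
    ignore_case: bool


def compile_literals(parts):
    """Fold the case-insensitive literals once, at compile time.

    parts is an iterable of (text, ignore_case) pairs (hypothetical format).
    """
    return [Literal(t.casefold() if ic else t, ic) for t, ic in parts]


def _match_folded(folded, s, pos):
    """Match pre-folded text against s at pos, folding input chars lazily."""
    acc = ""
    while len(acc) < len(folded):
        if pos >= len(s):
            return None
        # Fold only the character being examined; one input character
        # may fold to several (for example "ß" folds to "ss").
        acc += s[pos].casefold()
        pos += 1
        if not folded.startswith(acc):
            return None
    return pos if acc == folded else None


def match_at(compiled, s, pos=0):
    """Return the end offset if the literal sequence matches at pos, else None."""
    for lit in compiled:
        if lit.ignore_case:
            pos = _match_folded(lit.text, s, pos)
            if pos is None:
                return None
        elif s.startswith(lit.text, pos):
            pos += len(lit.text)
        else:
            return None
    return pos


# Roughly /(?i:Straße)-ID/: the first piece ignores case, the second does not.
rx = compile_literals([("Straße", True), ("-ID", False)])
```

Because rx is built once, calling match_at(rx, line) inside a loop never
re-folds the pattern, and a failed match stops folding the input at the first
mismatching character.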

Graham.
