On Thu, Jan 31, 2002 at 11:18:58AM -0800, Hong Zhang wrote:
> > Because parts of an rx can be case-insensitive while other parts
> > are case-sensitive, we will probably need two sorts of ops anyway
> > (or a way to tell the op to be case-insensitive). And you will
> > only be able to do the case folding when the whole rx is
> > case-insensitive.
>
> I don't like your suggestion. I think we should have one set of
> ops, but two input strings: one is the original, the other is
> case-folded. Rx chooses the right one depending on the current
> case-sensitivity. 2 regex opcodes will be used for this purpose,
> op-case-sensitive-start and op-case-insensitive-start. The opcode
> will switch strings begins, ends, positions etc.
>
> > It also means creating a copy of the input string, which is something
> > the current rx engine in perl5 tries to avoid. And while I will agree
> > that it is often faster to do lc($str) =~ /.../ than $str =~ /.../i
> > that is normally only the case for small-ish strings.
>
> I don't think the perl5 approach is the best choice. Unicode case folding
> is much much more expensive than malloc/free. And we can always use
> per-thread free list, unless the regex is nested or the string is very
> big, we don't need to allocate any memory.

But as you say, case folding is expensive. And with this approach you
are going to case-fold every string that is matched against any rx that
has a case-insensitive part, even if that part is small.

The case folding should be done in the rx itself, at compile time where
possible. Then it is done only once, which will save a lot of time if
the rx is used in a loop.

Graham.
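
To make the compile-time idea concrete, here is a rough Python sketch.
It is only an illustration, not how the perl5 engine or any proposed rx
op works. The names compile_rx, match_at and search, and the
representation of an rx as a list of (literal, case_insensitive)
segments, are all made up for the example. The case-insensitive
literals are folded once when the rx is compiled; at match time the
input is never copied, and only the characters actually compared
against a case-insensitive segment are folded.

```python
from typing import List, Optional, Tuple

# Hypothetical rx representation: (literal text, case_insensitive?)
Segment = Tuple[str, bool]


def compile_rx(segments: List[Segment]) -> List[Segment]:
    """Fold the case-insensitive literals once, at compile time."""
    return [(lit.casefold() if ci else lit, ci) for lit, ci in segments]


def _match_folded(folded_lit: str, text: str, pos: int) -> Optional[int]:
    """Match an already-folded literal at text[pos:].

    Input characters are folded one at a time, so no folded copy of
    the whole input is made. Returns the end position or None.
    """
    i = 0  # index into folded_lit
    while i < len(folded_lit):
        if pos >= len(text):
            return None
        f = text[pos].casefold()  # may be longer than one character
        if not folded_lit.startswith(f, i):
            return None
        i += len(f)
        pos += 1
    return pos


def match_at(compiled: List[Segment], text: str, start: int = 0) -> Optional[int]:
    """Try to match all segments in order, beginning at text[start]."""
    pos: Optional[int] = start
    for lit, ci in compiled:
        if ci:
            pos = _match_folded(lit, text, pos)
            if pos is None:
                return None
        else:
            if not text.startswith(lit, pos):
                return None
            pos += len(lit)
    return pos


def search(compiled: List[Segment], text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first match, or None."""
    for start in range(len(text) + 1):
        end = match_at(compiled, text, start)
        if end is not None:
            return (start, end)
    return None


# Roughly /Foo(?i)bar/: "Foo" is case-sensitive, "bar" is not.
rx = compile_rx([("Foo", False), ("bar", True)])
for line in ["xxFooBAR", "xxfooBAR", "FooBaR!"]:
    print(line, search(rx, line))
```

The rx is compiled once outside the loop, so the cost of folding the
pattern does not grow with the number of strings matched. The per-character
folding at match time is still not free, so whether this beats folding a
copy of the input depends on how much of the rx is case-insensitive.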