--- In [email protected], "Sheri" <sheri...@...> wrote:
> > Wouldn't it be simpler just to keep with the current case flags?
> > Would just be a simple extra test if such a flag found ("is
> > unicode present? then go thataway).
>
> Yes, I just doubted that the process knows if the "utf8" option is in effect
> when processing $u0, etc.

I checked that out: it is possible to propagate the "utf8" option through to replacement string parsing. I've already made the change.

> Thought adding extra triggers might be cleaner. Also that changing $u0 to
> always test for utf8 penalizes the usual non-utf8 situation for every match
> in a multiple match situation where a case-modifier is in the format string.
> Am I too conservative?

I either have to test for the new option letters (extending a switch statement) or test a simple binary flag within each branch of the existing switch. There's not much in it either way.

> This regex feature will likely get little usage. Yes, when/if ever needed, it
> would be very convenient if the format string handled it. When processing
> utf8 regex at that level, you can be confident that the unicode plugin is
> running. If existing unicode methods for effecting case changes in utf8 prove
> to be too inefficient, don't you think the logical place for adding utf8
> convenience functions would be the unicode plugin?

Maybe; it's a thought. I'd need to extend the unicode interplugin API. There's still the overhead of converting UTF-8 to Unicode and back to UTF-8, no matter where it's done.

Here's a thing. The existing regex case-conversion code takes advantage of the fact that conversion doesn't change the string's length, so there's no need to allocate space for a result that might be bigger than the string you started with. Suppose I have a lowercase UTF-8 string. Is there any way to know whether the result of converting it to upper case will be the same length? In other words, is there any pattern in UTF-8 that says the upper- and lower-case forms of the same letter take the same number of bytes? I'll have a look myself, but if you find any rule on the subject, it would be useful to know.
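One way to probe the byte-length question empirically is a brute-force scan over every code point, comparing the UTF-8 length of each character with that of its uppercase form. This is a sketch in Python, not the plugin's code: it relies on Python's built-in `str.upper()`, which applies Unicode's full case mappings, and the helper names `utf8_len`, `length_changing_uppercases` and `upper_utf8` are made up for illustration. A C implementation using only the simple one-to-one mappings from UnicodeData.txt would behave a little differently (it would not expand "ß", for instance), but it would still run into length changes.

```python
import sys


def utf8_len(s: str) -> int:
    """Number of bytes needed to encode s as UTF-8."""
    return len(s.encode("utf-8"))


def length_changing_uppercases(limit: int = sys.maxunicode):
    """Yield (char, upper) pairs where str.upper() changes the UTF-8 byte length."""
    for cp in range(limit + 1):
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points cannot be encoded as UTF-8; skip them.
            continue
        ch = chr(cp)
        up = ch.upper()  # full Unicode mapping: may return more than one char
        if up != ch and utf8_len(up) != utf8_len(ch):
            yield ch, up


def upper_utf8(data: bytes) -> bytes:
    """Uppercase a UTF-8 byte string via decode / upper / encode.

    The result can be shorter or longer than the input, so a caller
    working in a fixed-size buffer must be prepared to reallocate.
    """
    return data.decode("utf-8").upper().encode("utf-8")


if __name__ == "__main__":
    changed = dict(length_changing_uppercases())
    print(f"{len(changed)} code points change UTF-8 length when uppercased")
    for ch in ("\u0131", "\u017f", "\ufb01", "\u00df"):
        up = ch.upper()
        print(f"U+{ord(ch):04X} {utf8_len(ch)} bytes -> {up!r} {utf8_len(up)} bytes")
```

In that scan, for example, U+0131 "ı" (2 bytes) uppercases to "I" (1 byte), U+017F "ſ" (2 bytes) to "S" (1 byte), and the ligature U+FB01 "ﬁ" (3 bytes) to "FI" (2 bytes), while "ß" becomes "SS" with the byte count unchanged even though the character count grows. So there doesn't appear to be any UTF-8 rule guaranteeing equal lengths; an in-place fast path would probably need a fallback that allocates a new buffer whenever the sizes differ.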
