--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "Sheri" <sherip99@> wrote:
> >
> > I suppose it would be possible (if you want) to implement a
> > second set of signals in the format string such as x, y, z
> > instead of l, u, t (lower, upper, title). I'm just trying to avoid
> > impacting the performance of the case mods for non-utf8 stuff. So
> > the user would include, e.g. $x0 for lower case $0 in utf8.
> > Behind the scene you'd need to convert the backreference from
> > utf8 to unicode, modify the case, and convert back to utf8.
> > Hopefully nothing would be added or lost in translation.
> 
> Wouldn't it be simpler just to keep with the current case flags?
> Would just be a simple extra test if such a flag is found ("is
> unicode present?" then go thataway).

Yes, I just doubted that the process knows whether the "utf8" option is in
effect when processing $u0, etc., so I thought adding extra triggers might be
cleaner. Also, changing $u0 to always test for utf8 penalizes the usual
non-utf8 situation on every match in a multiple-match operation where a
case modifier is in the format string. Am I too conservative?

> 
> Seems there ought to be some more direct way to convert case in
> UTF8, but a quick search suggested I'd probably have to import a
> big chunk of code to do it. See e.g.
> 
> http://bytes.com/topic/c/answers/469334-how-convert-characters-upper-case-utf8-env

This regex feature will likely get little usage. Yes, when/if ever needed, it 
would be very convenient if the format string handled it. When processing utf8 
regex at that level, you can be confident that the unicode plugin is running. 
If existing unicode methods for effecting case changes in utf8 prove to be too 
inefficient, don't you think the logical place for adding utf8 convenience 
functions would be the unicode plugin?

Regards,
Sheri
