[power-pro] Re: Unicode bugs?

entropyreduction Sat, 15 Aug 2009 12:48:17 -0700

--- In [email protected], "Sheri" <sheri...@...> wrote:
> > Wouldn't it be simpler just to keep with the current case flags?
> > Would just be a simple extra test if such a flag is found ("is
> > unicode present?" then go thataway).
> 
> Yes, I just doubted that the process knows if the "utf8" option is in effect 
> when processing $u0, etc.


I checked that out: it's possible to propagate the "utf8" option through to 
replacement string parsing.  Already made the change.

> Thought adding extra triggers might be cleaner. Also that changing $u0 to 
> always test for utf8 penalizes the usual non-utf8 situation for every match 
> in a multiple match situation where a case-modifier is in the format string. 
> Am I too conservative?

I've either got to test for the new option letters (extending a switch 
statement) or test a simple binary flag within each branch of the existing 
switch.  Not much in it.
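
To make the second alternative concrete, here is a rough sketch in Python rather 
than the plugin's own code.  The flag letters "u" and "l", the function name 
convert_case and the utf8 parameter are all made up for illustration, and the 
non-utf8 path is simplified to ASCII-only conversion.

```python
def convert_case(flag: str, data: bytes, utf8: bool) -> bytes:
    """Apply a case modifier to matched bytes.

    Hypothetical sketch: the flag letters and the utf8 parameter are
    illustrative assumptions, not the real format-string syntax.
    """
    if flag == "u":
        if utf8:
            # Extra test inside the existing branch: decode, convert, re-encode.
            return data.decode("utf-8").upper().encode("utf-8")
        # bytes.upper() only changes ASCII a-z; other bytes pass through.
        return data.upper()
    if flag == "l":
        if utf8:
            return data.decode("utf-8").lower().encode("utf-8")
        return data.lower()
    # Unknown flag: leave the data untouched.
    return data
```

The non-utf8 path pays for one boolean test per match, which is the whole 
extra cost in the usual case.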

> This regex feature will likely get little usage. Yes, when/if ever needed, it 
> would be very convenient if the format string handled it. When processing 
> utf8 regex at that level, you can be confident that the unicode plugin is 
> running. If existing unicode methods for effecting case changes in utf8 prove 
> to be too inefficient, don't you think the logical place for adding utf8 
> convenience functions would be the unicode plugin?
 
Maybe, it's a thought.  I'd need to extend the unicode interplugin api.  
There's still the overhead of UTF-8/unicode/UTF-8 conversion, no matter where 
it's done.

Here's a thing.  The existing regex case conversion stuff takes advantage of 
conversion not affecting string length: no need to allocate space to accept a 
string possibly bigger than the one you started with.

Suppose I have a lower case UTF-8 string.  Is there any way to know if the 
result of conversion to upper case will be the same length?  In other words, is 
there any pattern in UTF-8 that says upper/lower case forms of the same letter 
take the same number of bytes?  I'll have a look myself, but if you find any 
rule on the subject, it'd be useful to know.
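
One way to look for such a rule empirically is to scan every code point and 
compare byte counts before and after conversion.  The following is a minimal 
sketch in Python, not plugin code; the function names are made up, and it 
relies on Python's built-in str.upper(), whose mappings depend on the Unicode 
version the interpreter ships with.

```python
import sys


def upper_len_delta(ch: str) -> int:
    """Change in UTF-8 byte count when ch is upper-cased (0 means same length)."""
    return len(ch.upper().encode("utf-8")) - len(ch.encode("utf-8"))


def length_changing_code_points() -> list:
    """Collect code points whose upper-case form has a different UTF-8 length."""
    changed = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates cannot be encoded as UTF-8, so skip them.
            continue
        if upper_len_delta(chr(cp)) != 0:
            changed.append(cp)
    return changed


if __name__ == "__main__":
    for cp in length_changing_code_points():
        print(f"U+{cp:04X} {upper_len_delta(chr(cp)):+d}")
```

At least one counterexample is certain: dotless i (U+0131) takes two bytes in 
UTF-8, while its upper case form, plain "I", takes one.  So equal length cannot 
be assumed in general, although it does hold for plain ASCII letters.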
