[power-pro] Re: Unicode bugs? Bruce: re ++)

entropyreduction Tue, 11 Aug 2009 08:54:08 -0700

--- In [email protected], "Sheri" <sheri...@...> wrote:
>
> Unicode.from_num is giving an error that 2 or more arguments are required.
> 
> e.g.,
> local patternu=unicode.from_num(0x2153)



Fixed; see unicodePlugin0.73_090811.zip in
http://tech.groups.yahoo.com/group/power-pro/files/0_TEMP_/AlansPluginProvisional/

> > 
> > If you've got unicode subject why wouldn't one want to search it
> > with a unicode pattern?
> 
> Regex metacharacters, etc., don't rely on high value characters, no need for 
> special encoding. If you needed to include some literal utf8 text in the 
> pattern, I expect it would be easier to do it with pcre's \x{....} than with 
> unicode handles or utf8 strings. If using a literal utf8 string, anything 
> conflicting with PCRE metacharacters would need escaping. The literal would 
> also need concatenating with the rest of the pattern text. All of that would 
> be done more easily with utf8 strings than with unicode handles.

Okay, understood.
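
To make the escaping point concrete, here is a rough sketch in Python rather than PowerPro script. Python's re module stands in for PCRE (its \uXXXX escape plays the role of \x{....}), and the function names and sample strings are made up for illustration, so treat it as an analogy, not as how the regex plugin behaves.

```python
import re

# Subject with the same code points as the test script in this thread:
# U+00BC, space, U+2153, U+00A0, U+2154
SUBJECT = "\u00bc \u2153\u00a0\u2154"


def replace_by_escape(subject: str) -> str:
    """Name the code point with an escape inside the pattern itself.

    No concatenation and no metacharacter escaping is needed.
    """
    return re.sub(r"\u2153", "1/3", subject)


def build_literal_pattern(literal: str) -> str:
    """Embed literal text in a larger pattern.

    The literal must be escaped first, then concatenated with the
    rest of the pattern text.
    """
    return r"^" + re.escape(literal) + r"$"


def matches(pattern: str, text: str) -> bool:
    """Return True when the pattern is found in the text."""
    return re.search(pattern, text) is not None
```

With a literal such as "\u2153 (approx.)", the unescaped form fails to match itself because the parentheses are read as a group, which is the conflict with metacharacters described above.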
 
> [snip] 
>  
> > unicode plugin doesn't know it's being used by another plugin,
> > and can't prevent being so used.
> 
> > 
> > Could be a regex service:
> > 
> > regex.allow_unicode_handles(0/1)
> 
> My thought was perhaps user could use unicode to set and unset a global 
> variable that regex could read. If regex sets the variable, it doesn't mean 
> unicode is loaded.

OK, how about this: your worry seems to be that the regex plugin always drags in 
the unicode plugin, even though it's seldom needed.

So I've compiled into regex the few lines of code from the unicode source that 
are needed to recognise a unicode handle.  The unicode plugin isn't loaded 
unless a unicode handle is identified within regex: just-in-time loading of 
the unicode DLL.  Will that do?

If so, there's no need for regex.allow_unicode_handles; I'll take it out.

BTW, I seem to be parsing the config ini file for

defaultmatchseparator
defaultutf8matchseparator

but not using them for anything, or remembering the result.  Redundant code?

> > to override config ini setting -- but again, don't understand why
> > it should be necessary. If for some reason you can't use unicode
> > because regex is interfering, there's an error of some kind in my
> > code.
> 
> I'm not using unicode handles in any regex services (except recent testing). 
> I prefer therefore that regex not be doing extra work looking for unicode 
> handles that aren't there. :D
> 
> For test purposes I just tried putting a unicode handle into pattern and 
> subject (both worked). I also tried putting a unicode handle in the 
> replacement string, but I got the error: regex.pcreReplace: PCRE exec failed 
> Matching error -3
> Programminng Error: PCRE_ERROR_BADOPTION

This seemed to work in regexPlugin207_090811.zip
 
> The option was "utf8". Also tried "u". Worked fine as long as the replacement 
> string was not a unicode handle (including if the replacement string was 
> decoded from a unicode handle to a utf8 string).
> 
> Is it safe to chain the utf8 operations as done below?
> 
> local subjectstring=unicode.from_nums(0x00BC,0x0020,0x2153,0x00A0,0x2154).to_utf8
> ;local replaceu=unicode.from_num(0x2154);; fails
> local replaceu=unicode.new(" ")
> unicode.default_get_set_type("numeric")
> replaceu[0]=0x2154
> local replacestring=unicode.to_utf8(replaceu)
> ;local test=regex.pcrereplace(?"\x{2153}", subjectstring, replaceu, "utf8") ;;fails
> local test=regex.pcrereplace(?"\x{2153}", subjectstring, replacestring, "utf8")
> unicode.messagebox("OK", unicode.from_utf8(test))
> local test=regex.pcrereplace(?"\x{2154}", test, ?"2/3", "utf8")
> win.debug(unicode.from_utf8(test).to_ascii)

Um, that last one's weird.  unicode.from_utf8(test) returns a string.
I'm surprised it worked, because handle syntax shouldn't work when the object 
(left side) isn't a unicode handle.
