--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "Sheri" <sherip99@> wrote:
> >
> > --- In [email protected], "entropyreduction" 
> > <alancampbelllists+yahoo@> wrote:
> 
> > > Another that checks for unicode handles, but only if the utf
> > > option has been specified (I'm right in thinking a unicode string
> > > can only work if the "utf8" option is specified?) 

> > 
> > Great, thanks. Yes, only valid with the utf8 option.
> 
> Okay, try regexPluginVariants207_090813.zip, which contains
> 
> regexNoUnicodeDllLoaded.dll
> regexUnicodeOnlyLoadedOnUTFoption.dll
> 
> but _serious_ health warning: I changed code and compiled, but
> have no time at all to test. The changes were tiny, but that
> doesn't preclude me screwing things up, as you know. 
>
> I didn't realise (remember?) that you were keen on very efficient
> code. At some time in the autumn I could have a look at the code
> to see if it could be speeded up: making very fast code wasn't my
> first object when I wrote the thing to begin with. 

Well, in this case it seemed possible that a bottleneck might be removed. 
However, it doesn't seem to make much difference (see the timings below). 
Thanks for letting me see that.

Seems like driving it off the "utf8" option is the most correct approach. 
However, utf8 is (for pcre) a compile-time only option. Unicode handles don't 
currently work as arguments to the services of a compiled regex pattern handle, 
because it's an error to set the utf8 option on those services. If you include 
it on one, you get an error, e.g., 

ERROR: regex.pcreReplace: Option incomprehensible: utf8

Error occurred near line 103 of script badunicoderegex:
local test=rxpat.pcrereplace(subj2u, repl2u, "utf8")

Maybe the plugin could be changed to observe but discard the option in the 
handle form of a pcre service?
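To make the suggestion concrete, here is a rough Python model of how a service called on a compiled-pattern handle might treat its option string: "utf8" is noted (so unicode-handle arguments can be accepted) but is not forwarded to the matcher, since UTF-8 mode was already fixed when the pattern was compiled. This is only a sketch, not the plugin's code: the function name, the whitespace-separated option format, and the other option names (borrowed from PCRE's exec-time NOTBOL/NOTEOL/NOTEMPTY flags) are all assumptions.

```python
from __future__ import annotations

# Hypothetical sketch, not the plugin's real code. Option names other
# than "utf8" are assumed, borrowed from PCRE's exec-time flags.
RUNTIME_OPTIONS = {"notbol", "noteol", "notempty"}


def parse_handle_service_options(option_string: str) -> tuple[set[str], bool]:
    """Parse options for a service called on a compiled-pattern handle.

    Returns (runtime_options, allow_unicode_handles). "utf8" is observed,
    so unicode-handle arguments can be accepted, but it is discarded
    rather than passed to the matcher: UTF-8 mode was already fixed when
    the pattern was compiled.
    """
    runtime: set[str] = set()
    allow_unicode = False
    for opt in option_string.lower().split():
        if opt == "utf8":
            allow_unicode = True  # observe the option...
            continue              # ...but do not forward it to the matcher
        if opt not in RUNTIME_OPTIONS:
            raise ValueError(f"Option incomprehensible: {opt}")
        runtime.add(opt)
    return runtime, allow_unicode
```

With this behaviour a call like `rxpat.pcrereplace(subj2u, repl2u, "utf8")` would be accepted instead of producing the "Option incomprehensible" error. One design choice is left open: whether "utf8" on a handle whose pattern was compiled without utf8 should be ignored or rejected; this sketch simply ignores the mismatch.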

This currently works fine for that precompiled utf8 pattern:

local test=rxpat.pcrereplace(subj2u.to_utf8, repl2u.to_utf8)

Another place they don't currently work is when the newish internal utf8 option 
is used, but I don't see a problem with that: if the user wants to pass unicode 
handles, the external "utf8" option should be used.
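A similarly hypothetical sketch of the rule as described: a unicode-handle argument is converted to UTF-8 only when the external "utf8" option was given, and the pattern text itself (including, assuming that is what the "internal" option refers to, a leading `(*UTF8)`) is never consulted. `UnicodeHandle` and `prepare_subject` are illustrative names, and the UTF-8 encoding stands in for whatever `.to_utf8` does inside the plugin.

```python
from __future__ import annotations


class UnicodeHandle:
    """Hypothetical stand-in for a unicode handle wrapping some text."""

    def __init__(self, text: str) -> None:
        self.text = text


def prepare_subject(arg: bytes | UnicodeHandle, options: set[str]) -> bytes:
    """Return the bytes to hand to the matcher (illustrative only).

    Only the external option set is checked. A pattern that enables
    UTF-8 internally (e.g. with a leading (*UTF8), if that is the
    "internal" option meant) does not unlock unicode-handle arguments.
    """
    if isinstance(arg, UnicodeHandle):
        if "utf8" not in options:
            raise TypeError("unicode handle argument requires the utf8 option")
        # Roughly what a script does by hand with .to_utf8
        return arg.text.encode("utf-8")
    return arg  # ordinary strings are passed through unchanged


if __name__ == "__main__":
    print(prepare_subject(UnicodeHandle("café"), {"utf8"}))  # b'caf\xc3\xa9'
    try:
        prepare_subject(UnicodeHandle("café"), set())
    except TypeError as err:
        print("rejected:", err)
```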

Regards,
Sheri

Here are some test runs. I revised the test to remove most use of the debug 
window (which cut the time significantly).

206 NO unicode.dll regexPluginTest elapsed time: 9.83164
206 NO unicode.dll regexPluginTest elapsed time: 9.93363
206 NO unicode.dll regexPluginTest elapsed time: 10.0021

207 All Unicode = yesterday's build, no unicode.dll available
207 All Unicode regexPluginTest elapsed time: 10.0402
207 All Unicode regexPluginTest elapsed time: 10.2889
207 All Unicode regexPluginTest elapsed time: 9.80878

207 NO unicode regexPluginTest elapsed time: 10.2943
207 NO unicode regexPluginTest elapsed time: 9.8729
207 NO unicode regexPluginTest elapsed time: 9.91535

207 NO unicode (but plugin available) regexPluginTest elapsed time: 9.88163
207 NO unicode (but plugin available) regexPluginTest elapsed time: 9.91519

207 U on UTF OPT regexPluginTest elapsed time: 9.83704
207 U on UTF OPT regexPluginTest elapsed time: 10.0283
207 U on UTF OPT regexPluginTest elapsed time: 10.2531
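To put a number on "doesn't seem to make much difference", the small Python script below averages the elapsed times listed above for each build. It assumes nothing beyond those figures; the labels are copied from the runs.

```python
from statistics import mean

# Elapsed times (seconds) copied from the regexPluginTest runs above.
RUNS = {
    "206 NO unicode.dll": [9.83164, 9.93363, 10.0021],
    "207 All Unicode": [10.0402, 10.2889, 9.80878],
    "207 NO unicode": [10.2943, 9.8729, 9.91535],
    "207 NO unicode (plugin available)": [9.88163, 9.91519],
    "207 U on UTF OPT": [9.83704, 10.0283, 10.2531],
}


def summarize(runs):
    """Return {build: (mean time, max - min)} for each configuration."""
    return {name: (mean(ts), max(ts) - min(ts)) for name, ts in runs.items()}


if __name__ == "__main__":
    stats = summarize(RUNS)
    for name, (avg, spread) in stats.items():
        print(f"{name:36s} mean {avg:.3f} s   run-to-run spread {spread:.3f} s")
    means = [avg for avg, _ in stats.values()]
    print(f"spread of means across builds: {max(means) - min(means):.3f} s")
```

With these figures the per-build means fall between about 9.90 s and 10.05 s (a spread of roughly 0.15 s), while the three "207 All Unicode" runs alone range over about 0.48 s, so the differences between builds are within run-to-run noise.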

