>The idea is this: the programme that's using the pcre32 API wants to
>use it on some data it has. That data isn't only used for matching
>however, ie it may also be displayed, etc, and the programme has
>therefore stored some flags into the unused-by-UTF-32 high bits of the

Whoa, whoa... stop it right there.  Back in the seventies, when we used such 
techniques, they were already considered IMPOLITE (or shall we say, downright 
wrong).  And in those days, both core (actually real CORE) memory and disk 
(usually tape) space were expensive, so there was some twisted justification for 
that behavior.

>characters. Now it can't just pass that data to pcre32_exec() since
>those high bits make it not-UTF-32. It could a) create a copy of the

So PCRE (and Perl and anybody else who does pattern matching) has to bow down 
to activities that border on criminal behavior?  I would say quite the 
contrary: if the data is NOT UTF-32, then don't pass it as such, or deal with 
it before you pass it to PCRE.
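
To make "deal with it before you pass it" concrete, here is a minimal sketch 
of the copy-and-clean approach.  It is only an illustration under stated 
assumptions: the helper names strip_flag_bits() and is_valid_code_point() are 
made up for this example and are not part of PCRE, and it assumes the 
application keeps its flags strictly above the 21 bits that UTF-32 needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: application flags live only in the bits above the 21 that
   UTF-32 needs (the highest code point is U+10FFFF). */
#define CODEPOINT_MASK UINT32_C(0x001FFFFF)

/* Hypothetical helper, not a PCRE function: returns a malloc'ed copy of
   src with the flag bits cleared, or NULL if allocation fails.
   The caller frees the result. */
uint32_t *strip_flag_bits(const uint32_t *src, size_t len)
{
    /* Allocate at least one element so len == 0 still yields a valid
       pointer on every platform. */
    uint32_t *clean = malloc((len ? len : 1) * sizeof *clean);
    if (clean == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        clean[i] = src[i] & CODEPOINT_MASK;
    return clean;
}

/* Hypothetical helper: 1 if c is a legal UTF-32 code unit, i.e. no
   larger than U+10FFFF and not a surrogate (U+D800..U+DFFF). */
int is_valid_code_point(uint32_t c)
{
    return c <= UINT32_C(0x10FFFF) &&
           !(c >= UINT32_C(0xD800) && c <= UINT32_C(0xDFFF));
}
```

The cleaned buffer, not the flagged original, is what would then be handed to 
pcre32_exec() as the subject.  Note that masking alone does not guarantee 
valid UTF-32: a masked value can still be a surrogate or exceed U+10FFFF, 
which is why the sketch keeps a separate validity check.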

>data, which is costly (allocate + copy), or it could simply instruct
>pcre to ignore those high bits. See the advantage? :-)

I guess, in the scenario you describe (and I assume that it is a widely 
accepted methodology, though clearly misguided and wrong), we should provide 
this masking capability.  But let me warn anybody who wants to use it: the 
risk of passing some garbage as bona fide UTF-32 may outweigh the benefits of 
using it.
 
Ze'ev Atlas
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
