On Sun, 28 Oct 2012, Christian Persch wrote:

> I'm against a compile time switch like this; 

(I knew I should have kept out of this discussion, but now that I 
haven't...)

I too am against a compile-time switch.

> that there is just one bit left in the 32-bit integer for the pcre
> flags, so I'd be reluctant to use it for something like this).

Actually, there are two bits. Also, if this is an exec-time only feature 
(as is the current implementation), you *could* re-use one of the 
existing bits that is currently compile-time only (e.g. PCRE_EXTENDED).

> * If the data isn't pre-checked by the programme itself, just simply
>   must _not_ pass PCRE_NO_UTF32_CHECK; pcre32_exec() will do that
>   check, returning an error for anything that's not UTF-32 (including
>   the case of those high bits being set!).
> 
> * If the data has been pre-checked for UTF-32-ness by the programme,
>   you can save a bit of runtime by passing PCRE_NO_UTF32_CHECK. This
>   applies whether the programme leaves the high bits as 0, or stores
>   internal stuff there. The bit masking simply allows the programme to
>   save making a sanitised copy of the data just to pass it to
>   pcre32_exec().

I am not unhappy with that, when it comes down to it. My only worry is 
the fact that a single switch does two things, but I can probably live 
with that as long as there is good documentation (something along the 
lines of those two paragraphs). I'll make sure there is. :-)
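
As a starting point for that documentation, the two cases might be sketched 
like this. It is only an illustration, not PCRE source: is_valid_utf32(), 
check_subject() and the no_check parameter are hypothetical names standing 
in for the behaviour described in the two quoted paragraphs, and the 
validity rule assumed is the usual one (no value above 0x10ffff, no 
surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: a 32-bit unit is taken to be valid
   UTF-32 when it is at most 0x10ffff and is not a surrogate (0xd800-0xdfff).
   Any unit with one of the top 11 bits set fails the first test. */
int is_valid_utf32(uint32_t unit)
{
    return unit <= 0x10ffffu && (unit < 0xd800u || unit > 0xdfffu);
}

/* Hypothetical model of the two cases. no_check stands in for the caller
   passing PCRE_NO_UTF32_CHECK. Returns -1 if the subject is accepted,
   otherwise the offset of the first offending unit. */
long check_subject(const uint32_t *subject, size_t length, int no_check)
{
    size_t i;

    /* Case 2: the caller has pre-checked the data, so nothing is inspected
       here, whatever is stored in the high bits. */
    if (no_check)
        return -1;

    /* Case 1: every unit is checked, and set high bits count as an error. */
    for (i = 0; i < length; i++) {
        if (!is_valid_utf32(subject[i]))
            return (long)i;
    }
    return -1;
}
```

With the subject { 0x41, 0x10000042, 0x43 }, this sketch rejects the unit 
at offset 1 when no_check is 0 and accepts the whole subject when no_check 
is non-zero.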

> It _is not_ matching the code 0x10000042. The input to pcre32_exec()
> may contain that bit pattern, but it is simply transformed (just like
> the gzip example above) before it reaches the matching stage.

I think I see your point here: by setting PCRE_NO_UTF32_CHECK the caller 
has said "the bottom 21 bits of each 32-bit value are guaranteed by me to 
be valid UTF-32; just use them without checking, and ignore the top 11 
bits". This is a different situation to UTF-8 and UTF-16 where there are 
no "spare" bits in the data values. I suppose a hypothetical analogy 
would be storing UTF-16 in the bottom 16 bits of 32-bit values.
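
To make the bit arithmetic concrete, here is a minimal sketch of the 
masking as I understand it. The names LOW21_MASK, low21() and low16() are 
made up for illustration and are not taken from the PCRE source; low16() is 
just the hypothetical UTF-16 analogy written out.

```c
#include <stdint.h>

/* Hypothetical name: mask selecting the bottom 21 bits (bits 0 to 20),
   which is enough to hold any code point up to 0x10ffff. */
#define LOW21_MASK 0x1fffffu

/* Keep the bottom 21 bits of a 32-bit unit and discard the top 11 bits,
   where the caller may be storing its own data. */
uint32_t low21(uint32_t unit)
{
    return unit & LOW21_MASK;
}

/* The hypothetical analogy: a UTF-16 code unit kept in the bottom 16 bits
   of a 32-bit value, with the top 16 bits ignored. */
uint16_t low16(uint32_t unit)
{
    return (uint16_t)(unit & 0xffffu);
}
```

For the value discussed above, low21(0x10000042) gives 0x42, so it is the 
character 0x42 that reaches the matching stage.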

> In conclusion: I don't think we need this as a compile-time flag, nor
> as a run-time flag.

I'm beginning to agree. :-)

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
