On Sun, 28 Oct 2012, Christian Persch wrote:

> I'm against a compile time switch like this;

(I knew I should have kept out of this discussion, but now that I
haven't...) I too am against a compile-time switch.

> that there is just one bit left in the 32-bit integer for the pcre
> flags, so I'd be reluctant to use it for something like this).

Actually, there are two bits. Also, if this is an exec-time only feature
(as is the current implementation), you *could* re-use one of the
existing bits that is currently compile-time only (e.g. PCRE_EXTENDED).

> * If the data isn't pre-checked by the programme itself, just simply
>   must _not_ pass PCRE_NO_UTF32_CHECK; pcre32_exec() will do that
>   check, returning an error for anything that's not UTF-32 (including
>   the case of those high bits being set!).
>
> * If the data has been pre-checked for UTF-32-ness by the programme,
>   you can save a bit of runtime by passing PCRE_NO_UTF32_CHECK. This
>   applies whether the programme leaves the high bits as 0, or stores
>   internal stuff there. The bit masking simply allows the programme to
>   save making a sanitised copy of the data just to pass it to
>   pcre32_exec().

I am not unhappy with that, when it comes down to it. My only worry is
the fact that a single switch does two things, but I can probably live
with that as long as there is good documentation (something along the
lines of those two paragraphs). I'll make sure there is. :-)

> It _is not_ matching the code 0x10000042. The input to pcre32_exec()
> may contain that bit pattern, but it is simply transformed (just like
> the gzip example above) before it reaches the matching stage.

I think I see your point here: by setting PCRE_NO_UTF32_CHECK the caller
has said "the bottom 21 bits of each 32-bit value are guaranteed by me
to be valid UTF-32; just use them without checking, and ignore the top
11 bits". This is a different situation to UTF-8 and UTF-16 where there
are no "spare" bits in the data values. I suppose a hypothetical analogy
would be storing UTF-16 in the bottom 16 bits of 32-bit values.
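
To make the "bottom 21 bits" reading concrete, here is a minimal sketch
in C. It is not PCRE's code: the names UTF32_MASK, masked_code_point(),
is_valid_utf32() and first_invalid_unit() are hypothetical, and the
sketch only assumes the semantics described above (the checked path
rejects any unit with high bits set; the unchecked path keeps the low
21 bits and ignores the rest).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask: the low 21 bits, enough for U+0000..U+10FFFF. */
#define UTF32_MASK 0x1FFFFFu

/* Unchecked path (assumed): ignore the top 11 bits of the unit. */
static uint32_t masked_code_point(uint32_t unit)
{
    return unit & UTF32_MASK;
}

/* Valid UTF-32: at most 0x10FFFF and not a surrogate (0xD800..0xDFFF). */
static int is_valid_utf32(uint32_t c)
{
    return c <= 0x10FFFFu && !(c >= 0xD800u && c <= 0xDFFFu);
}

/* Checked path (assumed): return the index of the first unit that is
 * not valid UTF-32, or -1 if every unit is valid. A unit with any of
 * the top 11 bits set exceeds 0x10FFFF and is therefore rejected. */
static long first_invalid_unit(const uint32_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!is_valid_utf32(s[i]))
            return (long)i;
    }
    return -1;
}
```

With these helpers, the value 0x10000042 fails the check as it stands,
while masked_code_point(0x10000042) yields 0x42, which is the character
that would actually take part in the match.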

> In conclusion: I don't think we need this as a compile-time flag, nor
> as a run-time flag.

I'm beginning to agree. :-)

Philip

--
Philip Hazel

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
