Re: [pcre-dev] Detecting starting code units

ph10 Wed, 17 Jul 2019 02:01:49 -0700

On Sat, 13 Jul 2019, I wrote:

> > Maybe "[^a]" can use the same algorithm as "[^ab]"?
> 
> [^a] is optimized into a different (faster) opcode; I will see if this
> can easily produce the same starting code units as [^ab] for tidiness. I
> do not expect it will do much for performance.


Having looked at the code, I have decided for the moment just to leave
this on the Wish List. Reasons: (a) I don't think it will give much
performance improvement. (b) It is a surprising amount of work, because
[^a] is handled as a special "not a" and, as with a plain "a", there are
a number of different opcodes for [^a]*, [^a]+, [^a]{1,4}, and so on,
all of which would need handling. (c) It gets complicated in the 16-bit
and 32-bit cases, and is pointless in the UTF-8 case for values greater
than 255 (e.g. [^\x{1234}]), where it would not rule out any starting
bytes.
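
To illustrate point (c), here is a rough brute-force sketch in Python. It is
not how PCRE2 computes its starting code unit bitmap; the function name
utf8_start_bytes is made up for this example. It simply collects the first
UTF-8 byte of every code point that a negated class could match, and then
shows which first bytes the negation rules out.

```python
def utf8_start_bytes(excluded):
    """Return the set of first bytes of the UTF-8 encodings of all code
    points NOT in `excluded`, i.e. those a class like [^...] can match.
    Hypothetical helper for illustration only, not a PCRE2 function."""
    starts = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        if cp in excluded:
            continue
        starts.add(chr(cp).encode("utf-8")[0])
    return starts


all_starts = utf8_start_bytes(set())
not_a = utf8_start_bytes({ord("a")})      # models [^a]
not_x1234 = utf8_start_bytes({0x1234})    # models [^\x{1234}]

# First bytes that each negated class rules out.
print(sorted(all_starts - not_a))       # [97]
print(sorted(all_starts - not_x1234))   # []
```

U+1234 is encoded as the bytes E1 88 B4, but many other characters also
start with E1, so excluding that one character removes nothing from the set
of possible starting bytes. Excluding "a" removes exactly one byte (0x61).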

Regards,
Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
