On Thu, 1 Jan 2009, Geoffrey Sneddon wrote:

> Is there any way of knowing how many UTF-8 codepoints were needed in a
> character class to cause the buffer overflow to happen, or is it
> entirely platform dependent (and not even constant on, say, all 32-bit
> OSes)? Are codepoint ranges affected? What about when the characters
> are specified as hex escapes?
It shouldn't matter how the characters are specified.

It's now almost a year since I fixed that bug, and I'm afraid I cannot
remember the details at all clearly. The ChangeLog does say "very large
number", so it must be several thousand, I would have thought. I think it
would be the same in all environments. However, the limit would have been
a number of bytes, not a number of codepoints, since each codepoint could
be a different number of bytes.

PCRE uses a temporary buffer when it's scanning to see how much store the
compiled pattern needs, and if my memory is right, it wasn't emptying
this buffer often enough when dealing with this kind of character class.

To learn more, one would have to compare the sources.

Philip

--
Philip Hazel

## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
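[A sketch of the flushing pattern Philip describes. This is not PCRE's actual source; the workspace size, the flush check, and the function name are all hypothetical, and it only illustrates why the limit is a byte count rather than a codepoint count: each codepoint contributes a variable number of UTF-8 bytes, so the workspace must be emptied whenever the next (maximal-length) codepoint might not fit. The bug, as described, was essentially a missing or too-infrequent version of this check.]

```c
#include <stddef.h>

#define WORKSPACE_SIZE 4096   /* hypothetical size; PCRE's real constant differs */
#define MAX_UTF8_LEN 6        /* a codepoint could occupy up to 6 bytes in the
                                 original UTF-8 scheme that PCRE supported */

/* Sketch of a pre-compile scan pass: append encoded bytes to a temporary
   workspace, flushing (i.e. emptying it into the running total) before a
   worst-case codepoint could overrun it.  Returns the total byte count,
   which is what the size-estimation pass needs. */
static size_t scan_class(const unsigned char *bytes, size_t n)
{
    unsigned char workspace[WORKSPACE_SIZE];
    size_t used = 0, total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Flush while there might not be room for one more full codepoint.
           Omitting this check is how a several-thousand-byte character
           class could overflow the fixed-size workspace. */
        if (used + MAX_UTF8_LEN > WORKSPACE_SIZE) {
            total += used;      /* account for the flushed bytes */
            used = 0;           /* empty the workspace */
        }
        workspace[used++] = bytes[i];
    }
    return total + used;        /* bytes scanned in total */
}
```

Because the check is in bytes, the number of *codepoints* needed to trigger the original overflow would vary with how many bytes each one encodes to, which matches the point above that the limit was a byte count and roughly the same in all environments.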
