On Thu, 1 Jan 2009, Geoffrey Sneddon wrote:

> Is there any way of knowing how many UTF-8 codepoints were needed in a
> character class to cause the buffer overflow to happen, or is it
> entirely platform dependent (and not even constant on, say, all 32-bit
> OSes)? Are codepoint ranges affected? What about when the characters
> are specified as hex escapes?
It shouldn't matter how the characters are specified.

It's now almost a year since I fixed that bug, and I'm afraid I cannot
remember the details at all clearly. The ChangeLog does say "very large
number", so it must be several thousand, I would have thought. I think it
would be the same in all environments. However, the limit would have been
a number of bytes, not a number of codepoints, since each codepoint could
be a different number of bytes.

PCRE uses a temporary buffer when it's scanning to see how much store the
compiled pattern needs, and if my memory is right, it wasn't emptying
this buffer often enough when dealing with this kind of character class.

To learn more, one would have to compare the sources.

Philip

--
Philip Hazel

## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
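[A sketch of the flushing pattern Philip describes. This is not PCRE's actual source; the workspace size, the flush check, and the function name are all hypothetical, and it only illustrates why the limit is a byte count rather than a codepoint count: each codepoint contributes a variable number of UTF-8 bytes, so the workspace must be emptied whenever the next (maximal-length) codepoint might not fit. The bug, as described, was essentially a missing or too-infrequent version of this check.]

```c
#include <stddef.h>

#define WORKSPACE_SIZE 4096   /* hypothetical size; PCRE's real constant differs */
#define MAX_UTF8_LEN 6        /* a codepoint could occupy up to 6 bytes in the
                                 original UTF-8 scheme that PCRE supported */

/* Sketch of a pre-compile scan pass: append encoded bytes to a temporary
   workspace, flushing (i.e. emptying it into the running total) before a
   worst-case codepoint could overrun it.  Returns the total byte count,
   which is what the size-estimation pass needs. */
static size_t scan_class(const unsigned char *bytes, size_t n)
{
    unsigned char workspace[WORKSPACE_SIZE];
    size_t used = 0, total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Flush while there might not be room for one more full codepoint.
           Omitting this check is how a several-thousand-byte character
           class could overflow the fixed-size workspace. */
        if (used + MAX_UTF8_LEN > WORKSPACE_SIZE) {
            total += used;      /* account for the flushed bytes */
            used = 0;           /* empty the workspace */
        }
        workspace[used++] = bytes[i];
    }
    return total + used;        /* bytes scanned in total */
}
```

Because the check is in bytes, the number of *codepoints* needed to trigger the original overflow would vary with how many bytes each one encodes to, which matches the point above that the limit was a byte count and roughly the same in all environments.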
