While playing with Alexander's pg_trgm regexp patch, I noticed that the regexp library trips an assertion (if assertions are enabled) or crashes when passed a pattern that contains more than 32k different characters:

select 'foo' ~ (select string_agg(chr(x),'') from generate_series(100, 35000) x) as nastyregex;

This is because it uses 'short' as the datatype to identify colors. When the color number overflows, -32768 is used as an index into the colordesc array, and you get a crash. AFAICS this can't reliably be used for anything more sinister than crashing the backend.
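
To illustrate the overflow in isolation, here is a minimal sketch. It is not the regex library's code: next_color_unchecked() is a hypothetical stand-in for whatever hands out the next color number, and only the 'short' typedef is taken from the description above.

```c
#include <limits.h>

typedef short color;    /* colors are identified by a 'short', as described above */

/*
 * Hypothetical stand-in for handing out the next color number
 * without any range check.
 */
static color
next_color_unchecked(color last)
{
    /*
     * last + 1 is evaluated as an int. Converting an out-of-range int
     * back to short is implementation-defined; on the usual
     * two's-complement platforms the value wraps around to SHRT_MIN.
     */
    return (color) (last + 1);
}
```

With a 16-bit short, next_color_unchecked(SHRT_MAX) typically comes back as -32768, and using that value as an array index reads far outside the array.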

A regex with that many different colors is an extreme case, so I think it's enough to turn the assertion in newcolor() into a run-time check and throw a "too many colors in regexp" error. Alternatively, we could widen 'color' from short to int, but that would double the memory usage of sane regexps with fewer distinct characters.
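
Roughly what I mean by a run-time check, as a toy model rather than a patch: the struct, the toy_ names, and the error code below are all made up for illustration and do not match the real colormap code.

```c
#include <limits.h>

typedef short color;                    /* same width as in the report */

#define TOY_MAX_COLOR            SHRT_MAX       /* largest usable color number */
#define TOY_COLORLESS            ((color) -1)   /* hypothetical "no color" marker */
#define TOY_ERR_TOO_MANY_COLORS  1              /* hypothetical error code */

/* Toy colormap: only the fields needed to show the check. */
struct toy_colormap
{
    int         ncolors;    /* colors handed out so far, numbered 0..ncolors-1 */
    int         err;        /* 0 = OK, else an error code */
};

/*
 * Hand out the next color number, or fail cleanly when the next number
 * would no longer fit in a 'color'. The caller is expected to turn
 * TOY_ERR_TOO_MANY_COLORS into a "too many colors in regexp" error.
 */
static color
toy_newcolor(struct toy_colormap *cm)
{
    if (cm->err != 0)
        return TOY_COLORLESS;           /* an earlier error is sticky */

    if (cm->ncolors > TOY_MAX_COLOR)
    {
        cm->err = TOY_ERR_TOO_MANY_COLORS;
        return TOY_COLORLESS;
    }

    return (color) cm->ncolors++;
}
```

The counter is kept as an int so that the comparison itself cannot overflow; only values that fit in a 'color' are ever cast down.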

Thoughts?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
