Karan Sikka <karanssi...@gmail.com> writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient.  What I'd really like to do is get rid of that cache in favor
of having a way to treat a precompiled regexp as a constant.  I think this
is probably possible via inventing a "regexp" datatype, which we make the
declared RHS input type for the ~ operator, and give it an implicit cast
from text so that existing queries don't break.  The compiled regexp tree
structure contains pointers so it could never go to disk, but now that
we have the "expanded datum" infrastructure you could imagine that the
on-disk representation is the same as text but we support adding a
compiled tree to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c.  A cache
like that might be the only way to deal with a query using variable
patterns (e.g, pattern argument coming from a table column).  But it
seems like basically the wrong approach for the common case of a
constant pattern.

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to