On 09/17/2014 07:56 AM, Alexander Hill wrote:
Hello,

I have a table of tree nodes with a tsquery column. To get a subtree's
tsquery, I need to OR all of its nodes' tsqueries together.

I defined a custom aggregate using tsquery_or:

     CREATE AGGREGATE tsquery_or_agg (tsquery)
     (
         sfunc = tsquery_or,
         stype = tsquery
     );

but I've found that

     tsquery_or_agg(query)

is about a hundred times slower than this:

     ('(' || string_agg(query::text, ')|(') || ')')::tsquery

That works perfectly so I'm happy to continue doing it, but I'm curious to
know why the difference is so great and if anything can be done about it?

string_agg's state transition function uses a buffer that's expanded as needed. At every step, the next string is appended to the buffer. Your custom aggregate is less efficient, because it constructs a new tsquery object at every step. In every step, a new tsquery object is allocated and the old result and the next source tsquery are copied to it. That's much more expensive.

If you're not shy of writing C code, you could write a more efficient version of tsquery_or_agg too, using a similar technique.

- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to