>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:
Tom> Nyet ... at least not without you actually making that argument, Tom> with numbers, rather than just handwaving. We use 100 for plpgsql Tom> and suchlike functions. I'd be OK with making it 10 just on Tom> general principles, but claiming that it's as expensive as a Tom> plpgsql function requires evidence. [TL/DR: 10 isn't enough, even 100 may be too low] On a text corpus consisting of ~18 thousand blog comments + ~5% of dead rows, median length 302 bytes, only about 3% long enough to be toasted, and selecting a common word (~22% of the table): explain analyze select * from comments where to_tsvector('english',message) @@ '''one'''::tsquery; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Seq Scan on comments (cost=0.00..2406.18 rows=4140 width=792) (actual time=0.601..3946.589 rows=4056 loops=1) Filter: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery) Rows Removed by Filter: 14310 Planning time: 0.270 ms Execution time: 3954.745 ms (5 rows) QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on comments (cost=204.09..2404.30 rows=4140 width=792) (actual time=2.401..11.564 rows=4056 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery) Heap Blocks: exact=1911 -> Bitmap Index Scan on comments_to_tsvector_idx (cost=0.00..203.05 rows=4140 width=0) (actual time=1.974..1.974 rows=4313 loops=1) Index Cond: (to_tsvector('english'::regconfig, message) @@ '''one'''::tsquery) Planning time: 0.278 ms Execution time: 17.640 ms (7 rows) (strangely, the seqscan plan is picked despite having a cost more than a point higher? what's up with that?) So for two plans with virtually identical cost, we have an execution time difference on the order of 200x. We can rule out the performance of the @@ by using a precalculated tsvector: explain analyze select * from comments where tsv @@ '''one'''::tsquery; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Seq Scan on comments (cost=0.00..2359.31 rows=4140 width=792) (actual time=0.023..47.746 rows=4056 loops=1) Filter: (tsv @@ '''one'''::tsquery) Rows Removed by Filter: 14310 Planning time: 0.262 ms Execution time: 54.220 ms (5 rows) So we're looking at an execution time for to_tsvector on the order of 200us, which is a seriously big deal when looking at a potential seqscan. That's not just _as_ expensive as a plpgsql function, but more than 50 times as expensive as a simple one like this: create function f1(text) returns integer language plpgsql as $f$ begin return length($1); end; $f$; select sum(length(message)) from comments; -- 89ms select sum(f1(message)) from comments; -- 155ms 66ms difference divided by 18366 rows = 3.6us per call Now, obviously the default cost for plpgsql functions is assuming that the function is a whole lot more complex than that, so one wouldn't argue that to_tsvector should cost 5000. But there's a strong case for arguing that it should cost a whole lot more than 100, because even at that value the relative costs for the first two plans in this post only differ by 2x, compared to a 200x runtime difference. A value of 10 would be inadequate in many cases; in this example it leaves the slower plan with a cost only ~15% higher, which is way too close to be comfortable. (As another example, a function with a simple query in it, such as obj_description, can have runtimes on the order of 40us, still 5x faster than to_tsvector.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers