[PERFORM] poor execution plan because column dependence

Václav Ovsík Tue, 12 Apr 2011 16:31:19 -0700

Hi,
I have done migration of the Request Tracker 3.8.9
(http://requesttracker.wikia.com/wiki/HomePage) from Mysql to
PostgreSQL in testing environment.
The RT schema used can be viewed at
https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg.
I have added full text search on table Attachments based on trigrams
(and still experimenting with it), but is is not interesting for the
problem (the problem is not caused by it directly).
The full text search alone works quite good. A user testing a new RT instance
reported a poor performance problem with a bit more complex query (more
conditions resulting in table joins).
Queries are constructed by module DBIx::SearchBuilder.
The problematic query logged:


rt=# EXPLAIN ANALYZE SELECT DISTINCT  main.* FROM Tickets main JOIN 
Transactions Transactions_1  ON ( Transactions_1.ObjectId = main.id ) JOIN 
Attachments Attachments_2  ON ( Attachments_2.TransactionId = Transactions_1.id 
)  WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 
'deleted') AND (main.Status = 'resolved' AND main.LastUpdated > '2008-12-31 
23:00:00' AND main.Created > '2005-12-31 23:00:00' AND main.Queue = '15' AND  ( 
Attachments_2.trigrams @@ text_to_trgm_tsquery('uir') AND Attachments_2.Content 
ILIKE '%uir%' ) ) AND (main.Type = 'ticket') AND (main.EffectiveId = main.id)  
ORDER BY main.id ASC;
                                                                                
                                                                                
           QUERY PLAN                                                           
                                                                                
                                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23928.60..23928.67 rows=1 width=162) (actual 
time=5201.139..5207.965 rows=649 loops=1)
   ->  Sort  (cost=23928.60..23928.61 rows=1 width=162) (actual 
time=5201.137..5201.983 rows=5280 loops=1)
         Sort Key: main.effectiveid, main.issuestatement, main.resolution, 
main.owner, main.subject, main.initialpriority, main.finalpriority, 
main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told, 
main.starts, main.started, main.due, main.resolved, main.lastupdatedby, 
main.lastupdated, main.creator, main.created, main.disabled
         Sort Method:  quicksort  Memory: 1598kB
         ->  Nested Loop  (cost=0.00..23928.59 rows=1 width=162) (actual 
time=10.060..5120.834 rows=5280 loops=1)
               ->  Nested Loop  (cost=0.00..10222.38 rows=1734 width=166) 
(actual time=8.702..1328.970 rows=417711 loops=1)
                     ->  Seq Scan on tickets main  (cost=0.00..5687.88 rows=85 
width=162) (actual time=8.258..94.012 rows=25410 loops=1)
                           Filter: (((status)::text <> 'deleted'::text) AND 
(lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (created 
> '2005-12-31 23:00:00'::timestamp without time zone) AND (effectiveid = id) 
AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 
'resolved'::text))
                     ->  Index Scan using transactions1 on transactions 
transactions_1  (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039 
rows=16 loops=25410)
                           Index Cond: (((transactions_1.objecttype)::text = 
'RT::Ticket'::text) AND (transactions_1.objectid = main.effectiveid))
               ->  Index Scan using attachments2 on attachments attachments_2  
(cost=0.00..7.89 rows=1 width=4) (actual time=0.008..0.009 rows=0 loops=417711)
                     Index Cond: (attachments_2.transactionid = 
transactions_1.id)
                     Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) 
AND (attachments_2.content ~~* '%uir%'::text))
 Total runtime: 5208.149 ms
(14 rows)

The above times are for already cached data (repeated query).
I think the execution plan is poor. Better would be to filter table attachments
at first and then join the rest. The reason is a bad estimate on number of rows
returned from table tickets (85 estimated -> 25410 in the reality).
Eliminating sub-condition...


rt=# explain analyze select * from tickets where effectiveid = id;
                                                  QUERY PLAN                    
                              
--------------------------------------------------------------------------------------------------------------
 Seq Scan on tickets  (cost=0.00..4097.40 rows=530 width=162) (actual 
time=0.019..38.130 rows=101869 loops=1)
   Filter: (effectiveid = id)
 Total runtime: 54.318 ms
(3 rows)

Estimated 530 rows, but reality is 101869 rows.

The problem is the strong dependance between id and effectiveid. The RT
documentation says:

    EffectiveId:
    By default, a ticket's EffectiveId is the same as its ID. RT supports the
    ability to merge tickets together. When you merge a ticket into
    another one, RT sets the first ticket's EffectiveId to the second
    ticket's ID. RT uses this data to quickly look up which ticket
    you're really talking about when you reference a merged ticket.


I googled the page http://wiki.postgresql.org/wiki/Cross_Columns_Stats

Maybe I identified the already documented problem. What I can do with this
situation? Some workaround?

Thanks in advance for any suggestions.
Best Regards
-- 
Zito

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] poor execution plan because column dependence

Reply via email to