So, building the partial index will avoid the table lookup.
Currently answerselectindex only has single-column indexes on memberid and 
answerid, so any query with a predicate on both columns is gonna be forced to 
do an index lookup on one column followed by a table lookup to get the other 
one (which is what the plan shows).
This will be slower than if you can get it to lookup only an index.
I suggested a partial index (and not a two-column index) to keep it small, and 
to reduce the likelihood that it will screw up another query.
Anyway - good luck man.
?


From: Aaron Burnett [mailto:aburn...@bzzagent.com]
Sent: Monday, August 16, 2010 7:20 PM
To: Mark Rostron; pgsql-performance@postgresql.org
Subject: RE: Very poor performance



Thanks Mark,

Yeah, I apologize, I forgot to mention a couple of things.

m.id is the primary key but the biggest problem is that the query loops 626410 
times because at one time people were allowed to delete member.id rows which 
now will break the application if the a.memberid comes out and it doesn't exist 
in the member table.

The version you sent me yields pretty much the same results.

All I really SHOULD have to do is query the a.memberid column to get distinct 
memberid and the query takes less than 2 seconds. The join to the member table 
and subsequnt 600K loops are the killer. The answerselectinstance table has 166 
million rows... so the math is pretty easy on why it's painfully slow.

Other than delting data in the answerselectinstance table to get rid of the 
orphan memberid's I was hoping someone had a better way to do this.


-----Original Message-----
From: Mark Rostron [mailto:mrost...@ql2.com]
Sent: Mon 8/16/2010 9:51 PM
To: Aaron Burnett; pgsql-performance@postgresql.org
Subject: RE: Very poor performance

This is weird - is there a particular combination of memberid/answered in 
answerselectindex that has a very high rowcount?

First change I would suggest looking into would be to try changing sub-query 
logic to check existence and limit the result set of the sub-query to a single 
row

Select distinct(m.id)
>From member m
Where exists (
      Select 1
      From answerselectinstance a
      Where a.member_id = m.id
      And a.answerid between 127443 and 127448
      Limit 1
)


If member.id is a primary key, you can eliminate the "distinct" i.e. the sort.


Second would be to build a partial index on answersselectindex to index only 
the memberid's you are interested in:

"Create index <new_index_name> on answersselectindex(memberid) where answerid 
between 127443 and 127448"



Mr



From: 
pgsql-performance-ow...@postgresql.org<mailto:pgsql-performance-ow...@postgresql.org>
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Aaron Burnett
Sent: Monday, August 16, 2010 6:07 PM
To: pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: [PERFORM] Very poor performance



Hi,

I'm hoping someone can offer some help here. The query and explain analyze and 
table layout are below and attached in a text file if the formatting is bad.

The query is part of a bigger query that our front end runs. This is the part 
that takes forever (84 minutes in this case) to finish and more often than not 
the front end times out. The table (answerselectinstance) has 168664317 rows 
while the member table has 626435 rows.

Postgres Version 8.25
CentOs 5.2
16 Gig RAM
192MB work_mem  (increasing to 400MB didn't change the outcome)
very light use on this server, it ais a slave to a slony replicated 
master/slave setup.

Again, apologies if the formatting got munged, the attached text file has the 
same info.

Thanking you in advance for any help and suggestions.

Aaron

explain analyze select distinct(id) from member  where id in (select memberid 
from answerselectinstance where   nswerid = 127443  OR  answerid = 127444  OR  
answerid = 127445  OR  answerid = 127446  OR  answerid = 127447  OR  answerid = 
127448   ) ;

LOG:  duration: 5076038.709 ms  statement: explain analyze select distinct(id) 
from member  where id in (select memberid from answerselectinstance where   
answerid = 127443  OR  answerid = 127444  OR  answerid = 127445  OR  answerid = 
127446  OR  answerid = 127447  OR  answerid = 127448   ) ;
                                                                              
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=101357.24..101357.28 rows=9 width=4) (actual 
time=5075511.974..5075911.077 rows=143520 loops=1)
   ->  Sort  (cost=101357.24..101357.26 rows=9 width=4) (actual 
time=5075511.971..5075644.323 rows=143520 loops=1)
         Sort Key: member.id
         ->  Nested Loop IN Join  (cost=0.00..101357.10 rows=9 width=4) (actual 
time=19.867..5075122.724 rows=143520 loops=1)
               ->  Seq Scan on member  (cost=0.00..78157.65 rows=626265 
width=4) (actual time=3.338..2003.582 rows=626410 loops=1)
               ->  Index Scan using asi_memberid_idx on answerselectinstance  
(cost=0.00..444.46 rows=9 width=4) (actual time=8.096..8.096 rows=0 
loops=626410)
                     Index Cond: (member.id = answerselectinstance.memberid)
                     Filter: ((answerid = 127443) OR (answerid = 127444) OR 
(answerid = 127445) OR (answerid = 127446) OR (answerid = 127447) OR (answerid 
= 127448))
 Total runtime: 5076034.203 ms
(9 rows)

     Column     |            Type             |                         
Modifiers
----------------+-----------------------------+------------------------------------------------------------
 memberid       | integer                     | not null
 answerid       | integer                     | not null
 taskinstanceid | integer                     | not null default 0
 created        | timestamp without time zone | default "timestamp"('now'::text)
 id             | integer                     | not null default 
nextval(('"asi_id_seq"'::text)::regclass)
Indexes:
    "asi_pkey" PRIMARY KEY, btree (id)
    "asi_answerid_idx" btree (answerid)
    "asi_memberid_idx" btree (memberid)
    "asi_taskinstanceid_idx" btree (taskinstanceid)
Triggers:
    _bzzprod_cluster_denyaccess_301 BEFORE INSERT OR DELETE OR UPDATE ON 
answerselectinstance FOR EACH ROW EXECUTE PROCEDURE 
_bzzprod_cluster.denyaccess('_bzzprod_cluster')

Reply via email to