Re: [PERFORM] Best way to delete unreferenced rows?

Craig James Mon, 11 Jun 2007 08:43:31 -0700

Tyrrill, Ed wrote:

QUERY PLAN


------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------
 Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8)
(actual time=6503583.342..8220629.311 rows=93524 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects
(cost=0.00..521525.10 rows=13799645 width=8) (actual
time=15.955..357813.621 rows=13799645 loops=1)
   ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8)
(actual time=6503265.293..7713657.750 rows=214938308 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3311212.04
rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308
loops=1)
 Total runtime: 8229178.269 ms
(8 rows)

I ran vacuum analyze after the last time any inserts, deletes, or
updates were done, and before I ran the query above.  I've attached my
postgresql.conf.  The machine has 4 GB of RAM.


I thought maybe someone with more expertise than me might answer this, but 
since they haven't I'll just make a comment.  It looks to me like the sort of 
214 million rows is what's killing you.  I suppose you could try to increase 
the sort memory, but that's a lot of memory.  It seems to me an index merge of 
a relation this large would be faster, but that's a topic for the experts.

On a theoretical level, the problem is that it's sorting the largest table.  
Perhaps you could re-cast the query so that it only has to sort the smaller 
table, something like

  select a.id from a where a.id not in (select distinct b.id from b)

where "b" is the smaller table.  There's still no guarantee that it won't do a sort on 
"a", though.  In fact one of the clever things about Postgres is that it can convert a query like 
the one above into a regular join, unless you do something like "select ... offset 0" which blocks 
the optimizer from doing the rearrangement.

But I think the first approach is to try to tune for a better plan using your 
original query.

Craig

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Best way to delete unreferenced rows?

Reply via email to