Hey guys, I've got a query that is inherently expensive because it has to join several large tables. But it's currently *very* expensive (at least for a web app), and I've been struggling in vain all day to knock the cost down. Annoyingly, the least costly version I've come up with remains my first attempt, which is also the most straightforward:

explain select distinct public.album.id
from
        public.album, public.albumjoin, public.track, umdb.node
where
        node.dir = 2811
        and albumjoin.album = public.album.id
        and public.albumjoin.track = public.track.id
        and levenshtein(substring(public.track.name for 75),
                substring(node.file for 75)) <= 10
        and public.album.id in
                (select album from albumjoin group by album
                 having count(*) between 15 and 25)
group by public.album.id
having count(*) >= 5;
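
In case the comma-join form is hard to read, here's the same thing with explicit JOIN syntax. As far as I know the planner flattens both forms into the same join search space, so this is purely cosmetic; the plan below is from the original version:

explain select distinct public.album.id
from
        umdb.node
        join public.track
                on levenshtein(substring(public.track.name for 75),
                        substring(node.file for 75)) <= 10
        join public.albumjoin on public.albumjoin.track = public.track.id
        join public.album on albumjoin.album = public.album.id
where
        node.dir = 2811
        and public.album.id in
                (select album from albumjoin group by album
                 having count(*) between 15 and 25)
group by public.album.id
having count(*) >= 5;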


 Unique  (cost=991430.53..1013711.74 rows=425772 width=4)
   ->  GroupAggregate  (cost=991430.53..1012647.31 rows=425772 width=4)
         Filter: (count(*) >= 5)
         ->  Sort  (cost=991430.53..996373.93 rows=1977360 width=4)
               Sort Key: album.id
               ->  Nested Loop  (cost=513549.06..737866.68 rows=1977360 width=4)
                     Join Filter: (levenshtein("substring"(("inner".name)::text, 1, 75), "substring"("outer".file, 1, 75)) <= 10)
                     ->  Index Scan using node_dir on node  (cost=0.00..3.22 rows=16 width=40)
                           Index Cond: (dir = 2811)
                     ->  Materialize  (cost=513549.06..520153.61 rows=370755 width=25)
                           ->  Hash Join  (cost=271464.72..510281.31 rows=370755 width=25)
                                 Hash Cond: ("outer".id = "inner".track)
                                 ->  Seq Scan on track  (cost=0.00..127872.69 rows=5111469 width=25)
                                 ->  Hash  (cost=268726.83..268726.83 rows=370755 width=8)
                                       ->  Hash Join  (cost=150840.51..268726.83 rows=370755 width=8)
                                             Hash Cond: ("outer".album = "inner".id)
                                             ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=8)
                                             ->  Hash  (cost=150763.24..150763.24 rows=30908 width=8)
                                                   ->  Hash Join  (cost=127951.57..150763.24 rows=30908 width=8)
                                                         Hash Cond: ("outer".id = "inner".album)
                                                         ->  Seq Scan on album  (cost=0.00..12922.72 rows=425772 width=4)
                                                         ->  Hash  (cost=127874.30..127874.30 rows=30908 width=4)
                                                               ->  HashAggregate  (cost=126947.06..127565.22 rows=30908 width=4)
                                                                     Filter: ((count(*) >= 15) AND (count(*) <= 25))
                                                                     ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)


I've tried adding an index on length(public.track.name) and filtering public.track to just those rows where length(name) is within a few characters of length(node.file), but that actually makes the plan more expensive.
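
For reference, that attempt looked roughly like this (reconstructed from memory, so the index name and exact cutoff are illustrative), based on the fact that levenshtein(a, b) can never be less than abs(length(a) - length(b)):

create index track_name_length on public.track (length(name));

-- extra predicate added to the where clause, alongside the
-- levenshtein() test: a name whose length differs from node.file's
-- by more than 10 can't be within edit distance 10.  (Since the
-- query only compares the first 75 characters, this cutoff is only
-- strictly safe for values shorter than 75 characters.)
and length(public.track.name)
        between length(node.file) - 10 and length(node.file) + 10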

Is there any hope of making this much cheaper? Unfortunately, I can't filter anything out of the album or albumjoin tables ahead of time.
