On 2015-05-29 10:55:44 +0200, Peter J. Holzer wrote:
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists (select 1 from facttable_stat_fta4 f where f.einheit=t.term );
>                                                                    QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..391.46 rows=636 width=81) (actual time=0.028..0.030 rows=3 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'einheit'::text))
>    ->  Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f  (cost=0.00..384457.80 rows=21788970 width=3) (actual time=0.027..0.027 rows=1 loops=3)
>          Index Cond: ((einheit)::text = (t.term)::text)
>  Total runtime: 0.173 ms
> (6 rows)
> 
> 0.17 ms. Much faster than a plain select distinct over a table with 43
> million rows could ever hope to be. 
> 
> warenstrom is very similar, and the columns with more distinct values
> aren't that bad either.
> 
> But for the berechnungsart column the result is bad:
> 
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists (select 1 from facttable_stat_fta4 f where f.berechnungsart=t.term );
>                                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1)
>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text))
>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1)
>  Total runtime: 30948.648 ms
> (6 rows)
> 
> Over 30 seconds! That's almost 200'000 times slower. 
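
For reference, the plain "select distinct" baseline I'm comparing against
above would be something like this (only a sketch, using the einheit column
from the first plan):

        -- Hypothetical baseline for comparison: it has to read all ~43
        -- million rows (or the full index), which is exactly what the
        -- EXISTS probe in the fast plan avoids.
        explain analyze select distinct einheit from facttable_stat_fta4;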

First I'd like to apologize for dropping out of the thread without
providing a test data set. I actually had one prepared (without the
confidential data), but I wanted to make sure that I could reproduce the
problem with it, and I didn't get around to that for a week or two, and
then I went on vacation ...

Anyway, in the meantime you released 9.5alpha (thanks for that; I probably
would have compiled a snapshot sooner or later, but installing Debian
packages is just a lot more convenient, and I hope you get a lot of useful
feedback), and I installed it this weekend.

I am happy to report that the problem appears to be solved: all the
queries of this type that I threw at the database now finish in a few
milliseconds.
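
For anyone who runs into the same merge semi join plan on an older version,
one thing worth checking (untested sketch; enable_mergejoin is a standard
planner setting, nothing specific to my setup) is whether the fast
nested-loop plan comes back when merge joins are disabled for the session:

        -- Rule out merge joins for this session only, then re-check the
        -- plan for the slow query.
        set enable_mergejoin = off;
        explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
                from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart'
                and exists (select 1 from facttable_stat_fta4 f where f.berechnungsart=t.term);
        reset enable_mergejoin;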

        hp


-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants 
| |   | h...@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
