Re: [PERFORM] Querying distinct values from a large table

Igor Lobanov Tue, 30 Jan 2007 01:33:38 -0800


Richard Huxton wrote:

I have rather large table with about 5 millions of rows and a dozen ofcolumns. Let's suppose that columns are named 'a', 'b', 'c' etc. Ineed to query distinct pairs of ('a';'b') from this table.

What version of PostgreSQL is it?


8.1.4

How many distinct values are you getting back from your 5 million rows?If there are too many, an index isn't going to help.


No more than 10,000.

Can you share the EXPLAIN ANALYSE output? You might want to tryincreasing work_mem for this one query to speed any sorting.


Real table and colum names are obfuscated because of NDA, sorry.

explain analyze select distinct a, b from tbl

EXPLAIN ANALYZE output is:

Unique (cost=500327.32..525646.88 rows=1848 width=6) (actualtime=52719.868..56126.356 rows=5390 loops=1)-> Sort (cost=500327.32..508767.17 rows=3375941 width=6) (actualtime=52719.865..54919.989 rows=3378864 loops=1)

         Sort Key: a, b

-> Seq Scan on tbl (cost=0.00..101216.41 rows=3375941width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)

 Total runtime: 57307.394 ms

How often is the table updated? Clustering might buy you someimprovements (but not a huge amount I suspect).


It is updated once per 3-5 seconds.

And one more thing. I don't know if it helps, but column 'a' can havevalue from a limited set: 0, 1 or 2. Column 'b' is also an integer(foreign key, actually).


--
Igor Lobanov
Internal Development Engineer
SWsoft, Inc.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Re: [PERFORM] Querying distinct values from a large table

Reply via email to