Actually, as I recently discovered, GROUP BY is faster than DISTINCT.  It's 
just due to how they are implemented, so don't go looking for any deep reason 
:)  The thread "GROUP BY vs DISTINCT" from 2006-12-20 discusses it.  DISTINCT 
sorts the results to find the unique rows, but GROUP BY uses a hash.
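
For example, comparing the two plans against the Messages table from the thread below makes the difference visible.  This is only a rough sketch -- exact plan shapes depend on your data and settings:

EXPLAIN ANALYZE SELECT DISTINCT DATE(inserted) FROM Messages;
-- on 8.1 this typically plans as Unique -> Sort -> Seq Scan

EXPLAIN ANALYZE SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);
-- whereas this typically plans as HashAggregate -> Seq Scan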

Brian

----- Original Message ----
From: Adam Rich <[EMAIL PROTECTED]>
To: tsuraan <[EMAIL PROTECTED]>; pgsql-performance 
<pgsql-performance@postgresql.org>
Sent: Wednesday, 10 January, 2007 7:32:50 AM
Subject: Re: [PERFORM] group by will not use an index?

That query looks strange to me (a group by without an aggregate).  See if this is any faster:

SELECT DISTINCT DATE(inserted) FROM Messages

I won't hold my breath, though; I don't think there's any way around the full table scan in Postgres.  The index does not contain enough information about transactional state (row visibility), so the table always has to be accessed as well, unlike virtually every other type of database.
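
You can see this for yourself with a quick, session-local experiment (a rough sketch only -- enable_seqscan shouldn't be turned off outside of testing):

BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);
ROLLBACK;

Even when the planner is pushed onto the index this way, every index entry still has to be checked against the table for visibility, so it usually ends up no faster than the sequential scan.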

 


  

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of tsuraan
Sent: Tuesday, January 09, 2007 5:06 PM
To: pgsql-performance
Subject: [PERFORM] group by will not use an index?


I have a table of messages with paths and inserted dates (among other things), like so:

CREATE TABLE Messages (
    msgkey BIGSERIAL PRIMARY KEY,
    path TEXT NOT NULL,
    inserted TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW()
);

I run a query to determine which days actually saw emails come in, like so:

SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

That's obviously not very efficient, so I made an index:

CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted));

However, GROUP BY does not use this index:

=# explain analyze select date(inserted) from messages group by date(inserted);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=104773.10..104789.51 rows=1313 width=8) (actual time=31269.476..31269.557 rows=44 loops=1)
   ->  Seq Scan on messages  (cost=0.00..101107.25 rows=1466340 width=8) (actual time=23.923..25248.400 rows=1467036 loops=1)
 Total runtime: 31269.735 ms
(3 rows)


Is it possible to get pg to use an index in a group by?  I don't see why it wouldn't be possible, but maybe I'm missing something.

Using pg 8.1.4...


