Re: [PERFORM] group by will not use an index?

2007-01-10 Thread tsuraan

For the reasons indicated (that is, MVCC), PG can not do a DISTINCT or the
equivalent
GROUP BY from index values alone.



Ok, that makes sense.  Thanks for the help everybody!

If this table is large, perhaps you could denormalize and maintain a

summary table with date (using truncation) and count, updated with
triggers on the original table. This table will presumably have a
small number of rows at the cost of doubling the times for updates,
inserts, and deletes.



Well, the inserted time, at least, is never updated, and deletions are very
rare (never, so far), so I'll have a look at doing things that way.  Thanks!


[PERFORM] group by will not use an index?

2007-01-09 Thread tsuraan

I have a table of messages with paths and inserted dates (among other
things), like so:

CREATE TABLE Messages (
   msgkey BIGSERIAL PRIMARY KEY,
   path TEXT NOT NULL,
   inserted TIMESTAMP WITHOUT TIMEZONE DEFAULT NOW()
);

I run a query to determine which days actually saw emails come in, like so:

SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

That's obviously not very efficient, so I made an index:

CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted));

However, GROUP BY does not use this index:

=# explain analyze select date(inserted) from messages group by
date(inserted);
 QUERY
PLAN
--
HashAggregate  (cost=104773.10..104789.51 rows=1313 width=8) (actual time=
31269.476..31269.557 rows=44 loops=1)
  -  Seq Scan on messages  (cost=0.00..101107.25 rows=1466340 width=8)
(actual time=23.923..25248.400 rows=1467036 loops=1)
Total runtime: 31269.735 ms
(3 rows)


Is it possible to get pg to use an index in a group by?  I don't see why it
wouldn't be possible, but maybe I'm missing something.

Using pg 8.1.4...


Re: [PERFORM] group by will not use an index?

2007-01-09 Thread Adam Rich
That query looks strange to me (a group by without an aggregate).  See
if this is
any faster:
 
SELECT DISTINCT DATE(inserted) FROM Messages
 
I won't hold my breath though, I don't think there's any way around the
full table scan
in Postgres, because the index does not contain enough information about
transactional
state, so table access is always required (unlike virtually every other
type of db)
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of tsuraan
Sent: Tuesday, January 09, 2007 5:06 PM
To: pgsql-performance
Subject: [PERFORM] group by will not use an index?


I have a table of messages with paths and inserted dates (among other
things), like so:

CREATE TABLE Messages (
msgkey BIGSERIAL PRIMARY KEY,
path TEXT NOT NULL,
inserted TIMESTAMP WITHOUT TIMEZONE DEFAULT NOW() 
);

I run a query to determine which days actually saw emails come in, like
so:

SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

That's obviously not very efficient, so I made an index: 

CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted));

However, GROUP BY does not use this index:

=# explain analyze select date(inserted) from messages group by
date(inserted);
  QUERY PLAN


--
 HashAggregate  (cost=104773.10..104789.51 rows=1313 width=8) (actual
time=31269.476..31269.557 rows=44 loops=1)
   -  Seq Scan on messages  (cost=0.00..101107.25 rows=1466340 width=8)
(actual time=23.923..25248.400 rows=1467036 loops=1)
 Total runtime: 31269.735 ms
(3 rows)


Is it possible to get pg to use an index in a group by?  I don't see why
it wouldn't be possible, but maybe I'm missing something. 

Using pg 8.1.4...




Re: [PERFORM] group by will not use an index?

2007-01-09 Thread Scott Marlowe
On Tue, 2007-01-09 at 17:05, tsuraan wrote:
 I have a table of messages with paths and inserted dates (among other
 things), like so:
 
 CREATE TABLE Messages (
 msgkey BIGSERIAL PRIMARY KEY,
 path TEXT NOT NULL,
 inserted TIMESTAMP WITHOUT TIMEZONE DEFAULT NOW() 
 );
 
 I run a query to determine which days actually saw emails come in,
 like so:
 
 SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

You're probably under the mistaken impression that PostgreSQL and can
retrieve all the data it needs from the index alone.  It can't.  Anytime
postgresql gets an index reference, it has to then visit the actual
table file to grab the individual entry.  That's because indexes don't
store mvcc visibility information, and due to the problems locking both
indexes and tables together would present, probably won't any time soon.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] group by will not use an index?

2007-01-09 Thread Andrew Lazarus
For the reasons indicated (that is, MVCC), PG can not do a DISTINCT or the 
equivalent
GROUP BY from index values alone.

If this table is large, perhaps you could denormalize and maintain a
summary table with date (using truncation) and count, updated with
triggers on the original table. This table will presumably have a
small number of rows at the cost of doubling the times for updates,
inserts, and deletes.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly