On Mon, Apr 7, 2008 at 1:42 PM, Alban Hertroys
<[EMAIL PROTECTED]> wrote:
>
> Have you tried this query with enable_seqscan=off? If my guess is right
> (and the planners, in that case) it'd be even slower.
set enable_seqscan=off;
explain select distinct datestamp from vals;
QUERY PLAN
----------------------------------------------------------------------------------------------
Unique (cost=115003047.47..115380004.83 rows=4263 width=4)
-> Sort (cost=115003047.47..115191526.15 rows=75391472 width=4)
Sort Key: datestamp
-> Seq Scan on vals (cost=100000000.00..101531261.72
rows=75391472 width=4)
It appears to be doing a sequential scan regardless of the set, as if
it doesn't believe it can use the index for some reason
>
> Something that might help you, but I'm not sure whether it might hurt the
> performance of other queries, is to cluster that table on val_datestamp_idx.
> That way the records are already (mostly) sorted on disk in the order of the
> datestamps, which seems to be the brunt of above query plan.
That's a good thought. I'll give that a try this evening when the DB
has some downtime and see what happens.
>
> There seems to be quite a bit of overlap in your index definitions. From my
> experience this can confuse the planner.
>
> I suggest you combine them, but not knowing your data... Maybe rewriting
> your UNIQUE constraint to (val_dur, datestamp, eid, sid) would be enough to
> replace all those other indexes.
> If not, it's probably better to have one index per column, so that the
> planner is free to combine them as it sees fit. That'd result in a bitmap
> index scan, btw.
I can take a look at the other indices again, but those are all in
place for specific other queries that generally involve some set of
a=1, b=2, c=3, datestamp>5 type of where-clause and were created
specifically in response to sequential scans showing up in other
queries (and had the proper effect of fixing them!)
>
> I'm not a postgresql tuning expert (I know my way around though), other
> people can explain you way better than I can. Bruce Momjian for example:
> http://www.linuxjournal.com/article/4791
I'll take a look at that, thanks.
> That calculation doesn't look familiar to me, I'm more used to:
> select pg_size_pretty(pg_relation_size('...'));
>
> You can put the name of any relation in there, be it tables, indexes, etc.
>
> 11GB is pretty large for an index on an integer column, especially with
> only 75M rows: that's 146 bytes/row in your index. Maybe your index got
> bloated somehow? I think it should be about a tenth of that.
pg_total_relation_size('..') gives the number of bytes for the table +
all associated indices; pg_relation_size('..') gives for just the
table. The difference between the two should be total bytes take up by
the 5 total indices (11 total index cols), giving a
back-of-the-envelope estimation of 1gb for the size of the datestamp
index. I am fairly certain that I didn't give pg 1gb to fit the index
in memory, so I'll try upping its total available memory tonight and
see if that doesn't improve things.
I appreciate the responses so far! I'm used to several minutes for
some of the complex queries on this DB, but 12.5 minutes for a select
distinct just seems wrong. :)
--
- David T. Wilson
Princeton Satellite Systems
[EMAIL PROTECTED]
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general