Re: [PERFORM] inconsistent/weird index usage

John Meinel Fri, 01 Oct 2004 06:54:01 -0700

Dustin Sallings wrote:


[...]

OK, now that that's over with, I have this one particular query that I attempt to run for a report from my phone that no longer works because it tries to do a table scan on *some* of the tables. Why it chooses this table scan, I can't imagine. The query is as follows:
select
        s.serial as serial_num,
        s.name as name,
        date(ts) as day,
        min(sample) as min_temp,
        avg(sample) as avg_temp,
        stddev(sample) as stddev_temp,
        max(sample) as max_temp
    from
        samples inner join sensors s using (sensor_id)
    where
        ts > current_date - 7
    group by
        serial_num, name, day
    order by
        serial_num, day desc


[ next section heavily clipped for clarity ]

->  Seq Scan on samples_1999  (cost rows=103677) (actual rows=0 loops=1)

-> Index Scan using samples_2000_bytime on samples_2000 (cost rows=714082 (actual rows=0 loops=1)


->  Seq Scan on samples_2001  (cost rows=902191) (actual rows=0 loops=1)

->  Seq Scan on samples_2002  (cost rows=1037201) (actual rows=0 loops=1)

-> Index Scan using samples_2003_bytime on samples_2003 (cost rows=1049772) (actual rows=0 loops=1)

-> Index Scan using samples_2004_bytime on samples_2004 (cost rows=791991) (actual rows=66389 loops=1)

[...]

Essentially, what you can see here is that it's doing an index scan on samples_2000, samples_2003, and samples_2004, but a sequential scan on samples_1999, samples_2001, and samples_2002. It's very strange to me that it would make these choices. If I disable sequential scans altogether for this session, the query runs in under 4 seconds.

This is a very cool solution for long-term storage, and isn't terribly hard to manage. I actually have other report queries that seem to be making pretty good index selection currently...but I always want more! :) Does anyone have any suggestions as to how to get this to do what I want?
    Of course, ideally, it would ignore five of the tables altogether.  :)
--
SPY                      My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <[EMAIL PROTECTED]>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________

Just as a heads up. You have run vacuum analyze before running this query, correct?

Because you'll notice that the query planner is thinking that it will have 103677 rows from 1999, 700,000 rows from 2000, 900,000 rows from 2001, etc, etc. Obviously the query planner is not planning well considering it there are only 60,000 rows from 2004, and no rows from anything else.

It just seems like it hasn't updated it's statistics to be aware of when the time is on most of the tables.

(By the way, an indexed scan returning 0 entries is *really* fast, so I wouldn't worry about ignoring the extra tables. :)

I suppose the other question is whether this is a prepared or stored query. Because sometimes the query planner cannot do enough optimization in a stored query. (I ran into this problem where I had 1 column with 500,000+ entries referencing 1 number. If I ran manually, the time was much better because I wasn't using *that* number. With a stored query, it had to take into account that I *might* use that number, and didn't want to do 500,000+ indexed lookups)

The only other thing I can think of is that there might be some collision between datetime and date. Like it is thinking it is looking at the time of day when it plans the queries (hence why so many rows), but really it is looking at the date. Perhaps a cast is in order to make it work right. I don't really know.

Interesting problem, though.
John
=:->

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] inconsistent/weird index usage

Reply via email to