> >> I have an experimental patch lying around somewhere that tries to > >> work around these problems by offering different estimation methods > >> for index scans. If you are interested, I'll dig it out. > > > >Sure, I'll take a gander... had my head in enough Knuth recently to > >even hopefully have some kind of a useful response to the patch. > > Sean, the patch is at http://www.pivot.at/pg/16-correlation-732.diff. > A short description of its usage can be found at > http://archives.postgresql.org/pgsql-performance/2002-11/msg00256.php. > If you are interested how the different interpolation methods work, > read the source - it shouldn't be too hard to find. > > You might also want to read the thread starting at > http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php.
Hrm... let me bop back in my archives and reply there... very interesting work though. I hope a reasonable algorythm can be found in time for 7.5, or even 7.4 as this seems to be biting many people and the current algo is clearly not right. > >> does this mean that sensorid, evtime, and action are not > >> independent? > > > >Hrm... sensorid is sequence and grows proportional with evtime, > >obviously. > > So a *low* sensorid (7) is quite uncommon for a *late* evtime? This > would help understand the problem. Unfortunately I have no clue what > to do about it. :-( Correct. > >Having spent a fair amount of time looking at the two following plans, > >it seems as though an additional statistic is needed to change the > >cost of doing an index lookup when the index is linearly ordered. > > I'm not sure I understand what you mean by "index is linearly > ordered", but I guess correlation is that statistic you are talking > about. However, it is calculated per column, not per index. If two rows are id's 123456 and 123457, what are the odds that the tuples are going to be on the same page? ie, if 123456 is read, is 123457 already in the OS or PostgreSQL's disk cache? > >Whether CLUSTER does this or not, I don't know, > > If you CLUSTER on an index and then ANALYSE, you get a correlation of > 1.0 (== optimum) for the first column of the index. Correlating of what to what? Of data to nearby data? Of data to related data (ie, multi-column index?)? Of related data to pages on disk? Not 100% sure in what context you're using the word correlation... But that value will degrade after time and at what rate? Does ANALYZE maintain that value so that it's kept acurrate? The ANALYZE page was lacking in terms of implementation details in terms of how many rows ANALYZE actually scans on big tables, which could dramatically affect the correlation of a table after time if ANALYZE is maintaining the correlation for a column. > > I never heard back from him after getting the runtime down to a > > few ms. :-/ > > Pity! I'd have liked to see EXPLAIN ANALYSE for > > SELECT * > FROM mss_fwevent > WHERE sensorid = 7 > AND evtime > (now() - '6 hours'::INTERVAL) > AND NOT action; > > SELECT * > FROM mss_fwevent > WHERE sensorid = 7 > AND evtime > (now() - '6 hours'::INTERVAL); > > SELECT * > FROM mss_fwevent > WHERE evtime > (now() - '6 hours'::INTERVAL); > > SELECT * > FROM mss_fwevent > WHERE sensorid = 7; ditto > > Are indexes > >on linearly ordered data rebalanced somehow? I thought CLUSTER only > >reordered data on disk. -sc > > AFAIK CLUSTER re-creates all indices belonging to the table. As of 7.3 or 7.4, yes. -sc -- Sean Chittenden ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match