On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote: > Jeff Davis <[EMAIL PROTECTED]> writes: > > We can already keep stats for a functional index. Is there a reason we > > can't keep stats for a multi-column index? > > The questions that need to be answered are (1) what stats are you gonna > collect, and (2) exactly what are you going to do with them when you > have 'em? > > All the previous discussions have stalled on the question of how to > avoid trying to collect stats about an exponentially large number of > column combinations; we've never even reached the question of what > stats we'd actually want given that a particular combination has been > determined to be interesting. Perhaps that's a trivial question, > but it's been a mighty long time since I took statistics ... >
I know we can't keep stats on every combination of columns. My initial idea would be to only keep stats about a multi-column index (and probably optional for those, too). My thinking was that we could keep a histogram (and MCVs, etc.) of the non-scalar key in the multi-column index. That would provide the data the planner needs to answer a query like "WHERE a = 1 and b < 1000" if a and b are dependent and you have an index on (a,b). It seemed within reach to me initially because I could use a functional index (in which the function turns multiple values into a comparable scalar) and postgresql would index that and keep stats. And when it has those stats, it makes the correct plan. Of course, I have to litter the SQL with unnecessary function calls (so that it can use the functional index), which makes this undesirable. AndrewSN pointed out on IRC that keeping a histogram of non-scalar values is not as easy as I thought, because PostgreSQL doesn't allow arrays of composite types, among other problems. Is this a worthwhile area of exploration? Regards, Jeff Davis ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq