...vastly overestimate the number of pages, because PostgreSQL guesses the correlation to be practically 0 even though the rows for each distinct value of a given column are closely packed on a few pages.
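A small simulation makes the failure mode above concrete. It is only a sketch under assumed numbers: 100 distinct values, 100 rows each, 50 rows per page. Each value's rows are stored contiguously, but the runs are laid out in shuffled order, as if the table had been loaded in batches. It computes a plain Pearson correlation between physical position and value, in the spirit of `pg_stats.correlation` but not PostgreSQL's actual ANALYZE code. It then compares the pages one value really occupies with a random-scatter estimate. That estimate uses Cardenas' formula as a stand-in; it is not the planner's cost model. All function names here are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def build_clustered_table(n_values, rows_per_value, seed=42):
    """Hypothetical table: each distinct value occupies one contiguous run of
    rows, but the runs are stored in shuffled (not sorted) order."""
    order = list(range(n_values))
    random.Random(seed).shuffle(order)
    return [v for v in order for _ in range(rows_per_value)]


def pages_for_value(column, value, rows_per_page):
    """Set of heap page numbers that actually hold `value`."""
    return {i // rows_per_page for i, v in enumerate(column) if v == value}


def random_placement_pages(total_pages, rows):
    """Expected distinct pages touched if `rows` rows were scattered uniformly
    at random over `total_pages` pages (Cardenas' approximation)."""
    return total_pages * (1 - (1 - 1 / total_pages) ** rows)


N_VALUES, ROWS_PER_VALUE, ROWS_PER_PAGE = 100, 100, 50
column = build_clustered_table(N_VALUES, ROWS_PER_VALUE)
total_pages = len(column) // ROWS_PER_PAGE

# Correlation between physical row position and column value.
corr = pearson(list(range(len(column))), column)
# Pages really touched when fetching every row with value 7.
actual = len(pages_for_value(column, 7, ROWS_PER_PAGE))
# What a "rows are scattered randomly" assumption would predict.
scattered = random_placement_pages(total_pages, ROWS_PER_VALUE)

print(f"correlation           ~ {corr:.3f}")
print(f"pages holding value 7 : {actual}")
print(f"random-scatter guess  : {scattered:.1f}")
```

The correlation comes out near 0 because the runs are in no particular order. Yet every value sits on exactly 2 of the 200 pages, while the random-scatter guess is dozens of pages. A single ordering-based statistic cannot tell "clustered but unsorted" apart from "truly scattered".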
I think we need a serious statistics jock to pipe up with some standard
metrics that do what we need. Otherwise we'll never have a solid footing for
the predictions we make and will never know how much we can trust them.

Do we know if any such people participate/lurk on this list, or
if the conversation should go elsewhere?
I lurk... I don't know if I'm a 'statistics jock', but I might be of some use if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background information beyond what I have read in the main PostgreSQL manual. Does such information exist, and if so, where?

