On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pry...@telsasoft.com> wrote:

> On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
> > I think the non-extended stats code also has trouble with booleans.
> > pg_stats gives me a correlation  of 0.8 or higher for the flag column.
> It's not due to the boolean though; you see the same thing if you do:
> CREATE INDEX aaa_f ON aaa((flag::text));
> ANALYZE aaa;
> correlation | 0.81193
> or:
> ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
> correlation | 0.81193
> I think it's caused by having so few (2) values to correlate.
> most_common_vals       | {f,t}
> most_common_freqs      | {0.9014,0.0986}
> correlation            | 0.822792
> It thinks there's somewhat-high correlation since it gets a list of x and y
> values (integer positions by logical and physical sort order) and 90% of
> the x
> list (logical value) are the same value ('t'), and the CTIDs are in order
> on
> the new index, so 90% of the values are 100% correlated.

But there is no index involved (except in the case of the functional
index).  The correlation of table columns to physical order of the table
doesn't depend on the existence of an index, or the physical order within
an index.

But I do see that ties within the logical order of the column values are
broken to agree with the physical order.  That is wrong, right?  Is there
any argument that this is desirable?

It looks like it could be fixed with a few extra double calcs per distinct
value.  Considering we already sorted the sample values using SQL-callable
collation dependent comparators, I doubt a few C-level double calcs is
going to be meaningful.



Reply via email to