On Thu, Oct 21, 2021 at 2:13 PM Greg Stark wrote:
> The problem I'm finding is that the distribution of these small
> subsets can swing quickly. And understanding intercolumn correlations
> even if we could do it perfectly would be no help at all.
>
> Consider a table with millions of rows that ar
On Fri, Oct 22, 2021 at 10:13 AM Greg Stark wrote:
> Obviously this could get complex quickly. Perhaps it should be
> something users could declare. Some kind of "partitioned statistics"
> where you declare a where clause and we generate statistics for the
> table where that where clause is true.
One problem I've seen in multiple databases and is when a table has a
mixture of data sets within it. E.g. A queue table where 99% of the
entries are "done" but most queries are working with the 1% that are
"new" or in other states. Often the statistics are skewed by the
"done" entries and give bad