On 03/22/2016 09:13 AM, Tatsuo Ishii wrote:
Do you have any other missing parts in this work? I am asking
because I wonder if you want to push this into 9.6 or rather 9.7.

I think the first few parts of the patch series, namely:

  * shared infrastructure (0002)
  * functional dependencies (0003)
  * MCV lists (0004)
  * histograms (0005)

might make it into 9.6. I believe the code for building and storing
the different kinds of stats is reasonably solid. What probably needs
more thorough review are the changes in clauselist_selectivity(), but
the code in these parts is reasonably simple as it only supports using
a single multi-variate statistics per relation.

The part (0006) that allows using multiple statistics (i.e. selects
which of the available stats to use and in what order) is probably the
most complex part of the whole patch, and I myself do have some
questions about some aspects of it. I don't think this part might get
into 9.6 at this point (although it'd be nice if we managed to do

Hum. So without 0006 or beyond, there's not much benefit for
PostgreSQL users, and you are not too confident about 0006 or
beyond. Then I would think it is a little bit hard to justify
putting 000[2-5] into 9.6. I really like this feature and would like
to see it in PostgreSQL someday, but I'm not sure if we should put the
patches (0002-0005) into PostgreSQL now. Please let me know if there's
some reason we should put the patches into PostgreSQL now.

I don't think so. While being able to combine multiple statistics is certainly useful, I'm convinced that the initial patches add enough value on their own, even if the 0006 patch gets committed later.

A lot of queries will be just fine with the "single multivariate statistics" limitation, either because they use fewer than 8 columns, or because no more than 8 columns are actually correlated. (FWIW the 8 column limit is mostly arbitrary, it may get increased if needed.)

I haven't really mentioned the aspects of 0006 that I think need more discussion, but it's mostly about the question of whether combining the statistics by using the overlapping clauses as "conditions" is the right thing to do (or whether a more expensive approach is needed). None of that, however, invalidates the preceding patches.
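
To illustrate the "overlapping clauses as conditions" idea, here is a minimal Python sketch on a toy table; it is not the patch's code. It assumes two hypothetical statistics, one on (a, b) and one on (b, c), and combines them as P(a, b, c) ~= P(a, b) * P(c | b), with the shared clause on b acting as the condition. The table, the sel() helper and all variable names are made up for this example.

```python
# Toy table with columns (a, b, c) that are perfectly correlated.
# Everything here is an illustrative assumption, not PostgreSQL code.
rows = [(i % 2, i % 2, i % 2) for i in range(1000)]


def sel(pred):
    """Fraction of rows matching the predicate (a selectivity)."""
    return sum(1 for r in rows if pred(r)) / len(rows)


# Per-column selectivities for a = 1, b = 1, c = 1.
s_a = sel(lambda r: r[0] == 1)
s_b = sel(lambda r: r[1] == 1)
s_c = sel(lambda r: r[2] == 1)

# Estimate without multivariate statistics: assume independence.
independent = s_a * s_b * s_c

# Selectivities that statistics on (a, b) and (b, c) could provide.
s_ab = sel(lambda r: r[0] == 1 and r[1] == 1)
s_bc = sel(lambda r: r[1] == 1 and r[2] == 1)

# Combine the two statistics, using the overlapping clause b = 1
# as the condition: P(a, b, c) ~= P(a, b) * P(c | b).
conditional = s_ab * (s_bc / s_b)

# True selectivity of a = 1 AND b = 1 AND c = 1, for comparison.
actual = sel(lambda r: r == (1, 1, 1))

print(independent, conditional, actual)
```

On this table the independence estimate is off by a factor of four, while the conditional combination matches the true selectivity. That is the best case, though: the combination is only exact when a and c are independent given b, which is the open question about 0006.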


