Most importantly, it looks like this forgets to update catalog documentation for stxexprs and stxkind='e'
It seems like you're preferring to use pluralized "statistics" in a lot of
places that sound wrong to me.  For example:

> Currently the first statistics wins, which seems silly.

I can write more separately, but I think this is resolved and clarified if
you write "statistics object" and not just "statistics".

> +       Name of schema containing table

I don't know about the nearby descriptions, but this one sounds too much like
a "schema-containing" table.  Say "Name of the schema which contains the
table" ?

> +       Name of table

Say "name of table on which the extended statistics are defined"

> +       Name of extended statistics

"Name of the extended statistic object"

> +       Owner of the extended statistics

..object

> +       Expression the extended statistics is defined on

I think it should say "the extended statistic", or "the extended statistics
object".  Maybe "..on which the extended statistic is defined"

> +   of random access to the disk.  (This expression is null if the expression
> +   data type does not have a <literal><</literal> operator.)

expression's data type

> +   much-too-small row count estimate in the first two queries. Moreover, the

maybe say "dramatically underestimates the rowcount"

> +   planner has no information about relationship between the expressions, so it

the relationship

> +   assumes the two <literal>WHERE</literal> and <literal>GROUP BY</literal>
> +   conditions are independent, and multiplies their selectivities together to
> +   arrive at a much-too-high group count estimate in the aggregate query.

severe overestimate ?

> +   This is further exacerbated by the lack of accurate statistics for the
> +   expressions, forcing the planner to use default ndistinct estimate for the

use *a default

> +   expression derived from ndistinct for the column. With such statistics, the
> +   planner recognizes that the conditions are correlated and arrives at much
> +   more accurate estimates.

are correlated comma

> +             if (type->lt_opr == InvalidOid)

These could be !OidIsValid
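i.e. something like this (hand-written, just to show the idiom; the
surrounding lookup_type_cache() line is my guess at the context, not copied
from the patch):

    TypeCacheEntry *type = lookup_type_cache(exprType(expr), TYPECACHE_LT_OPR);

    /* expression's data type has no default btree "<" operator */
    if (!OidIsValid(type->lt_opr))
        continue;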
> +  * expressions. It's either expensive or very easy to defeat for
> +  * determined used, and there's no risk if we allow such statistics (the
> +  * statistics is useless, but harmless).

I think it's meant to say "for a determined user" ?

> +  * If there are no simply-referenced columns, give the statistics an auto
> +  * dependency on the whole table.  In most cases, this will be redundant,
> +  * but it might not be if the statistics expressions contain no Vars
> +  * (which might seem strange but possible).
> +  */
> + if (!nattnums)
> + {
> +     ObjectAddressSet(parentobject, RelationRelationId, relid);
> +     recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
> + }

Can this be unconditional ?

> +  * Translate the array of indexs to regular attnums for the dependency (we

sp: indexes

> +  * Not found a matching expression, so we can simply skip

Found no matching expr

> + /* if found a matching, */

matching ..

> +examine_attribute(Node *expr)

Maybe you should rename this to something distinct ?  So it's easy to add a
breakpoint there, for example.

> +     stats->anl_context = CurrentMemoryContext;     /* XXX should be using
> +                                                      * something else? */

> +     bool            nulls[Natts_pg_statistic];
...
> +      * Construct a new pg_statistic tuple
> +      */
> +     for (i = 0; i < Natts_pg_statistic; ++i)
> +     {
> +             nulls[i] = false;
> +     }

Shouldn't you just write nulls[Natts_pg_statistic] = {false};
or at least: memset(nulls, 0, sizeof(nulls));

> +              * We don't store collations used to build the statistics, but
> +              * we can use the collation for the attribute itself, as
> +              * stored in varcollid. We do reset the statistics after a
> +              * type change (including collation change), so this is OK. We
> +              * may need to relax this after allowing extended statistics
> +              * on expressions.

This text should be updated or removed ?

> @@ -2705,7 +2705,108 @@ describeOneTableDetails(const char *schemaname,
>       }
>
>       /* print any extended statistics */
> -     if (pset.sversion >= 100000)
> +     if (pset.sversion >= 140000)
> +     {
> +             printfPQExpBuffer(&buf,
> +                               "SELECT oid, "
> +                               "stxrelid::pg_catalog.regclass, "
> +                               "stxnamespace::pg_catalog.regnamespace AS nsp, "
> +                               "stxname,\n"
> +                               "pg_get_statisticsobjdef_columns(oid) AS columns,\n"
> +                               "  'd' = any(stxkind) AS ndist_enabled,\n"
> +                               "  'f' = any(stxkind) AS deps_enabled,\n"
> +                               "  'm' = any(stxkind) AS mcv_enabled,\n");
> +
> +             if (pset.sversion >= 130000)
> +                     appendPQExpBufferStr(&buf, "  stxstattarget\n");
> +             else
> +                     appendPQExpBufferStr(&buf, "  -1 AS stxstattarget\n");

>= 130000 is fully determined by >= 140000 :)

> +  * type of the opclass, which is not interesting for our purposes.  (Note:
> +  * if we did anything with non-expression index columns, we'd need to

index is wrong ?

I mentioned a bunch of other references to "index" and "predicate" which are
still around:

On Thu, Jan 07, 2021 at 08:35:37PM -0600, Justin Pryzby wrote:
> There's some remaining copy/paste stuff from index expressions:
>
> errmsg("statistics expressions and predicates can refer only to the table being indexed")));
> left behind by evaluating the predicate or index expressions.
> Set up for predicate or expression evaluation
> Need an EState for evaluation of index expressions and
> partial-index predicates.  Create it in the per-index context to be
> Fetch function for analyzing index expressions.
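Also, to be explicit about the describe.c hunk above: since that whole branch
already requires pset.sversion >= 140000, the inner version test can simply go
away, something like (untested):

		/* stxstattarget exists on every server new enough to reach here */
		appendPQExpBufferStr(&buf, "  stxstattarget\n");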