On 03/21/2016 04:34 AM, Alvaro Herrera wrote:
Another skim on 0002:
reference.sgml is missing a call to &alterStatistic.
ObjectProperty contains a comment that the ACL is "same as relation",
but is that still correct, given that now stats may be related to more
than one relation? Do we even know what the rules for ACLs on
cross-relation stats are? One very simple way to get around this is to
dictate that all the rels must have the same owner. Perhaps we're not
considering the multi-relation case yet?
As I wrote in response to Robert's message, I don't think we need ACLs
for statistics - the user should be able to use them when they can
access all the underlying relations (in a query). For ALTER STATISTICS
the (owner || superuser) check should be enough, right?
We have this FIXME comment in do_analyze_rel:
+ * FIXME This sample sizing is mostly OK when computing stats for
+ * individual columns, but when computing multi-variate stats
+ * for multivariate stats (histograms, mcv, ...) it's rather
+ * insufficient. For stats on multiple columns / complex stats
+ * we need larger sample sizes, because we need to build more
+ * detailed stats (more MCV items / histogram buckets) to get
+ * good accuracy. Maybe it'd be appropriate to use samples
+ * proportional to the table (say, 0.5% - 1%) instead of a
+ * fixed size might be more appropriate. Also, this should be
+ * bound to the requested statistics size - e.g. number of MCV
+ * items or histogram buckets should require several sample
+ * rows per item/bucket (so the sample should be k*size).
Maybe this merits more discussion. Right now we have an upper bound on
how much to scan for analyze; if we introduce the idea of scanning a
percentage of the relation, the time to analyze very large relations
could increase significantly. Do we have an idea of what to do for
this? For instance, a rule that would make me comfortable would say to
scan a sample 3x the current size when you have a mvstats on 3 columns;
then the size of fraction to scan is still bounded. But does that
actually work? From the wording of this comment, I assume you don't
Yeah. I think more discussion is needed, because I myself am not sure
the FIXME is actually correct. For now I think we're OK with using the
same logic as statistics on a single column (300 * target).
In this block (CreateStatistics)
+ /* look for duplicities */
+ for (i = 0; i < numcols; i++)
+ for (j = 0; j < numcols; j++)
+ if ((i != j) && (attnums[i] == attnums[j]))
+ errmsg("duplicate column name in
isn't it easier to have the inner loop go from i+1 to numcols?
It probably is.
I wonder if this is sensible with multi-relation statistics:
+ * Store a dependency too, so that statistics are dropped on DROP TABLE
+ parentobject.classId = RelationRelationId;
+ parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+ parentobject.objectSubId = 0;
+ childobject.classId = MvStatisticRelationId;
+ childobject.objectId = statoid;
+ childobject.objectSubId = 0;
I suppose the idea is to drop the stats if any of the rels they are for
What do you mean by sensible? I mean, we don't support multiple tables
at this point (except for choosing a syntax that should allow that), but
the code assumes a single relation on a few places (like this one).
Right after that you create a dependency on the schema. Is that
necessary? Since you have the dependency on the relation, the stats
would be dropped by recursion.
Hmmmm, that's probably right. Also, now that I think about it, it
probably gets broken after ALTER STATISTICS ... SET SCHEMA, because the
code does not remove the old dependency (and does not create a new one).
Why are you #include'ing builtins.h everywhere?
RelationGetMVStatList() needs a comment.
Please get rid of common.h. It's totally unlike the way we structure
our header files. We don't keep headers in src/backend; they're all in
src/include. One reason is that the latter gets installed as a whole in
include/server, which this file will not be. This file may be necessary
to build some extensions in the future, for example.
OK, I'll rework that and move it to src/include/.
In mvstats.h, please mark function prototypes as "extern".
Many files need a pgindent pass.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: