On Tue, Mar 13, 2018 at 6:31 AM, David Rowley <david.row...@2ndquadrant.com>
> On 13 March 2018 at 11:44, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution. On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions. I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
I think this isn't either-or. I think a general hook can be useful for
that want to optimize particular data distributions/workloads using
domain-knowledge about functions common for those workloads.
That way users working with that data can use extensions to optimize
workloads without writing C themselves. I also think a
protransform like feature would add a lot of power to the native planner
but this could take a while
to get into core properly and may not handle all kinds of data
An example, of a case a protransform type system would not be able to
optimize is mathematical operator expressions like bucketing integers by
decile --- (integer / 10) * 10.
This is somewhat analogous to date_trunc in the integer space and would
also change the number of resulting distinct rows.
> I always imagined that extended statistics could be used for this.
> Right now the estimates are much better when you create an index on
> the function, but there's no real reason to limit the stats that are
> gathered to just plain columns + expression indexes.
> I believe I'm not the only person to have considered this. Originally
> extended statistics were named multivariate statistics. I think it was
> Dean and I (maybe others too) that suggested to Tomas to give the
> feature a more generic name so that it can be used for a more general
> purpose later.
I also think that the point with extended statistics is a good one and
points to the need for more experimentation/experience which I think
a C hook is better suited for. Putting in a hook will allow extension
writers like us to experiment and figure out the kinds of transform on
statistics that are useful while having
a small footprint on the core. I think designing a protransform-like system
would benefit from more experience with the kinds of transformations that
For example, can anything be done if the interval passed to date_trunc is
not constant, or is it not even worth bothering with that case? Maybe
statistics is a better approach, etc.
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services