Re: [HACKERS] [v9.5] Custom Plan API

Stephen Frost Thu, 08 May 2014 12:11:39 -0700

* Simon Riggs (si...@2ndquadrant.com) wrote:
> It would seem normal and natural to have
> 
> * pg_joinam catalog table for "join methods" with a join method API
> Which would include some way of defining which operators/datatypes we
> consider this for, so if PostGIS people come up with some fancy GIS
> join thing, we don't invoke it every time even when it's inapplicable.
> I would prefer it if PostgreSQL also had some way to control when the
> joinam was called, possibly with some kind of table_size_threshold on
> the AM tuple, which could be set to >=0 to control when this was even
> considered.


It seems useful to think about how we would redefine our existing join
methods using such a structure.  While thinking about that, it seems
like we would care more about what the operators provide than about
the specific operators themselves (à la hashing / HashJoin), and I'm not
sure we really care about the data types directly, just about the
operations which we can perform on them.

I can see a case for sticking data types into this if we feel that we
have to constrain the path possibilities for some reason, but I'd rather
try to deal with any issues of the form "it doesn't make sense to do X
because we know it'll be really expensive" through the cost model
instead of with a table that defines what's allowed or not allowed.
There may be cases where we get the costing wrong, and for those it's
valuable to be able to tweak cost values on a per-connection basis or
for individual queries.

I don't mean to imply that a 'pg_joinam' table is a bad idea, just that
I'd think of it as being defined in terms of what capabilities it
requires of operators and a way for costing to be calculated for it,
plus the actual functions which it provides to implement the join
itself (including some way to produce output suitable for EXPLAIN, etc.).
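To make the capability-plus-costing idea a bit more concrete, here is a rough C sketch. Everything in it is hypothetical and not an existing PostgreSQL API: the `OpCapability` flags, the `JoinMethodDesc` struct standing in for a pg_joinam row, the toy cost formulas, and the per-session `cost_multiplier`. Eligibility depends only on which capabilities the join operators provide, and among eligible methods the cheapest estimated cost wins, so "too expensive" is expressed through costing rather than through an allow/deny table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capabilities a join clause's operators may provide. */
typedef enum
{
    OPCAP_EQUALITY = 1 << 0,    /* operator is an equality operator */
    OPCAP_HASHABLE = 1 << 1,    /* operator has hash support */
    OPCAP_SORTABLE = 1 << 2     /* operator has ordering (btree-like) support */
} OpCapability;

/* Hypothetical descriptor: one row of an imagined pg_joinam catalog. */
typedef struct JoinMethodDesc
{
    const char *name;           /* label used in EXPLAIN-style output */
    unsigned    required_caps;  /* capabilities the operators must provide */
    double      (*cost) (double outer_rows, double inner_rows);
    double      cost_multiplier;    /* per-session tweak; 1.0 = unchanged */
} JoinMethodDesc;

/* Toy cost formulas for illustration; not PostgreSQL's real cost model. */
static double nestloop_cost(double outer, double inner) { return outer * inner; }
static double hashjoin_cost(double outer, double inner) { return outer + 2.0 * inner; }
static double mergejoin_cost(double outer, double inner) { return 1.5 * (outer + inner); }

static JoinMethodDesc join_methods[] = {
    {"Nested Loop", 0, nestloop_cost, 1.0},
    {"Hash Join", OPCAP_EQUALITY | OPCAP_HASHABLE, hashjoin_cost, 1.0},
    {"Merge Join", OPCAP_EQUALITY | OPCAP_SORTABLE, mergejoin_cost, 1.0},
};

#define N_JOIN_METHODS (sizeof(join_methods) / sizeof(join_methods[0]))

/* Return the cheapest method whose required capabilities are all provided. */
static const JoinMethodDesc *
choose_join_method(unsigned provided_caps, double outer_rows, double inner_rows)
{
    const JoinMethodDesc *best = NULL;
    double      best_cost = 0.0;

    for (size_t i = 0; i < N_JOIN_METHODS; i++)
    {
        const JoinMethodDesc *m = &join_methods[i];
        double      c;

        if ((m->required_caps & provided_caps) != m->required_caps)
            continue;           /* operators lack a needed capability */
        c = m->cost(outer_rows, inner_rows) * m->cost_multiplier;
        if (best == NULL || c < best_cost)
        {
            best = m;
            best_cost = c;
        }
    }
    return best;
}

/* Per-session tweak: scale one method's estimated cost; 0 if name unknown. */
static int
set_join_cost_multiplier(const char *name, double multiplier)
{
    assert(multiplier > 0.0);
    for (size_t i = 0; i < N_JOIN_METHODS; i++)
    {
        if (strcmp(join_methods[i].name, name) == 0)
        {
            join_methods[i].cost_multiplier = multiplier;
            return 1;
        }
    }
    return 0;
}
```

For example, with 1000 outer and 100 inner rows and all three capabilities available, the toy costs pick Hash Join (1200 vs. 1650 for Merge Join); calling `set_join_cost_multiplier("Hash Join", 10.0)` for the session flips the choice to Merge Join without changing which methods are allowed.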

> * pg_scanam catalog table for "scan methods" with a scan method API
> Again, a list of operators that can be used with it, like indexes and
> operator classes

Ditto for this, but there are lots of other things this makes me wonder
about, because it's essentially trying to define a pluggable storage
layer, which is great, but that also requires some way to deal with all
of the things we use our storage system for: caching / shared buffers,
locking, visibility, WAL, unique identifiers / ctid (for use in indexes,
etc.), and so on.

> By analogy to existing mechanisms, we would want
> 
> * A USERSET mechanism to allow users to turn it off for testing or
> otherwise, at user, database level

If we re-implement our existing components through this ("eat our own
dogfood", as it were), I'm not sure that we'd be able to have a way to
turn it on/off.  I realize we wouldn't have to, but then it seems like
we'd have two very different code paths, and likely a different level
of support / capability afforded to "external" storage systems, and
then I wonder if we're not just back to FDWs again.

> We would also want
> 
> * A startup call that allows us to confirm it is available and working
> correctly, possibly with some self-test for hardware, performance
> confirmation/derivation of planning parameters

Yeah, we'd need this for anything that supports a GPU, regardless of how
we implement it, I'd think.
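A minimal sketch of what such a startup call could look like from core's side, assuming a hypothetical provider struct with a `startup` callback that runs a self-test and derives a planning parameter. The names (`ScanProvider`, `ProviderParams`, `provider_init`, `per_tuple_cost`) and the probe stubs are illustrative only; a real provider would actually probe the hardware or run a micro-benchmark. The point is that core decides whether the provider is enabled and sanity-checks what it reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical default used when a provider is unavailable. */
#define DEFAULT_PER_TUPLE_COST 0.01

/* Hypothetical planning parameters a provider may derive at startup. */
typedef struct ProviderParams
{
    double      per_tuple_cost; /* derived cost per processed tuple */
} ProviderParams;

/* Hypothetical provider with a startup/self-test callback. */
typedef struct ScanProvider
{
    const char *name;
    bool        (*startup) (ProviderParams *out);  /* false = unusable */
    bool        enabled;        /* set by core, never by the provider */
    ProviderParams params;
} ScanProvider;

/* Core-side: run the self-test once and decide whether to consider it. */
static void
provider_init(ScanProvider *p)
{
    ProviderParams derived = {DEFAULT_PER_TUPLE_COST};

    assert(p != NULL);
    /* No callback, or a failed self-test, means "never consider it". */
    p->enabled = (p->startup != NULL) && p->startup(&derived);
    /* Sanity-check what the provider derived rather than trusting it blindly. */
    if (p->enabled && !(derived.per_tuple_cost > 0.0))
        p->enabled = false;
    p->params.per_tuple_cost = p->enabled ? derived.per_tuple_cost
                                          : DEFAULT_PER_TUPLE_COST;
}

/* Stand-ins for a real hardware probe, for illustration only. */
static bool probe_gpu_present(ProviderParams *out) { out->per_tuple_cost = 0.002; return true; }
static bool probe_gpu_absent(ProviderParams *out) { (void) out; return false; }
static bool probe_gpu_bogus(ProviderParams *out) { out->per_tuple_cost = -1.0; return true; }
```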

> * Some kind of trace mode that would allow people to confirm the
> outcome of calls

Seems like this would be useful independently of the rest.

> * Some interface to the stats system so we could track the frequency
> of usage of each join/scan type. This would be done within Postgres,
> tracking the calls by name, rather than trusting the plugin to do it
> for us

This is definitely something I want for core already...
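A minimal sketch of core-side usage counting, assuming a hypothetical hook `count_method_use()` that the executor (not the plugin) calls each time it runs a join or scan method, keyed by the method's registered name. This is a fixed-size, process-local toy table; a real implementation would presumably live in shared memory and be exposed through the statistics views, which is not shown here.

```c
#include <assert.h>
#include <string.h>

#define MAX_METHODS 16
#define MAX_NAME    64

/* One usage counter per registered method name (hypothetical). */
typedef struct MethodUsage
{
    char        name[MAX_NAME];
    long        calls;
} MethodUsage;

static MethodUsage usage_table[MAX_METHODS];
static int  n_usage = 0;

/* Called by core on every execution of a method; names over 63 chars are truncated. */
static void
count_method_use(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < n_usage; i++)
    {
        if (strncmp(usage_table[i].name, name, MAX_NAME - 1) == 0)
        {
            usage_table[i].calls++;
            return;
        }
    }
    if (n_usage < MAX_METHODS)  /* silently ignore overflow in this toy */
    {
        strncpy(usage_table[n_usage].name, name, MAX_NAME - 1);
        usage_table[n_usage].name[MAX_NAME - 1] = '\0';
        usage_table[n_usage].calls = 1;
        n_usage++;
    }
}

/* Look up how often core has run the named method; 0 if never seen. */
static long
method_use_count(const char *name)
{
    for (int i = 0; i < n_usage; i++)
        if (strncmp(usage_table[i].name, name, MAX_NAME - 1) == 0)
            return usage_table[i].calls;
    return 0;
}
```

Because the counter is bumped by the caller rather than inside the method's own callbacks, a plugin cannot under- or over-report how often it was chosen.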

        Thanks,

                Stephen
