Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Kohei KaiGai Tue, 04 Mar 2014 07:01:50 -0800

2014-03-04 23:10 GMT+09:00 Stephen Frost <[email protected]>:
>> The "cache_scan" module that I and Haribabu are discussing in another
>> thread also might be a good demonstration for custom-scan interface,
>> however, its code scale is a bit larger than ctidscan.
>
> That does sound interesting though I'm curious about the specifics...
>
This module caches only a subset of a table's columns rather than all of
them, so it can hold many more records in a given amount of RAM than the
standard buffer cache.
It is built on top of the custom-scan node, and it also uses a new hook
that fires a callback on page vacuuming so the module can invalidate the
corresponding cache entries.
(By the way, I originally designed this module to demonstrate the on-vacuum
hook, because I had already written ctidscan and the postgres_fdw
enhancement for the custom-scan node.)
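
To make the idea concrete, here is a minimal standalone sketch in plain C
of a cache that keeps only a few columns per row and drops every entry of
a page when that page is vacuumed. It is not the cache_scan code: all
names (ColumnCache, cache_put, cache_invalidate_page and so on) are
hypothetical, the direct-mapped layout is an assumption chosen for
brevity, and no backend API is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Every name below is hypothetical; nothing here is taken from cache_scan. */
#define CACHE_SLOTS 1024
#define CACHED_COLS 2               /* only a subset of the table's columns */

typedef struct CachedRow
{
    bool     valid;
    uint32_t block;                 /* heap page the source tuple lives on */
    uint16_t offset;                /* item offset within that page */
    int64_t  cols[CACHED_COLS];     /* values of the cached columns only */
} CachedRow;

typedef struct ColumnCache
{
    CachedRow slots[CACHE_SLOTS];   /* zero-initialized means "empty" */
} ColumnCache;

/* Direct-mapped slot choice; a colliding entry simply gets overwritten. */
static size_t
slot_for(uint32_t block, uint16_t offset)
{
    return (size_t) (block * 31u + offset) % CACHE_SLOTS;
}

/* Remember the cached columns of the tuple at (block, offset). */
void
cache_put(ColumnCache *cache, uint32_t block, uint16_t offset,
          const int64_t *cols)
{
    CachedRow *row = &cache->slots[slot_for(block, offset)];

    row->valid = true;
    row->block = block;
    row->offset = offset;
    for (int i = 0; i < CACHED_COLS; i++)
        row->cols[i] = cols[i];
}

/* Return the cached row for (block, offset), or NULL on a miss. */
const CachedRow *
cache_get(const ColumnCache *cache, uint32_t block, uint16_t offset)
{
    const CachedRow *row = &cache->slots[slot_for(block, offset)];

    if (row->valid && row->block == block && row->offset == offset)
        return row;
    return NULL;
}

/*
 * What a hypothetical on-vacuum callback would do: drop every entry that
 * came from the vacuumed page. Returns the number of entries dropped.
 */
size_t
cache_invalidate_page(ColumnCache *cache, uint32_t block)
{
    size_t dropped = 0;

    for (size_t i = 0; i < CACHE_SLOTS; i++)
    {
        if (cache->slots[i].valid && cache->slots[i].block == block)
        {
            cache->slots[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

The linear sweep in cache_invalidate_page is only there to keep the sketch
short; a real module would presumably index its entries by page.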


>> > For one thing, an example where you could have this CustomScan node calling
>> > other nodes underneath would be interesting.  I realize the CTID scan can't
>> > do that directly but I would think your GPU-based system could; after all,
>> > if you're running a join or an aggregate with the GPU, the rows could come
>> > from nearly anything.  Have you considered that, or is the expectation that
>> > users will just go off and access the heap and/or whatever indexes directly,
>> > like ctidscan does?  How would such a requirement be handled?
>> >
>> In case when custom-scan node has underlying nodes, it shall be invoked using
>> ExecProcNode as built-in node doing, then it will be able to fetch tuples
>> come from underlying nodes. Of course, custom-scan provider can perform the
>> tuples come from somewhere as if it came from underlying relation. It is
>> responsibility of extension module. In some cases, it shall be required to
>> return junk system attribute, like ctid, for row-level locks or table updating.
>> It is also responsibility of the extension module (or, should not add custom-
>> path if this custom-scan provider cannot perform as required).
>
> Right, tons of work to do to make it all fit together and play nice-
> what I was trying to get at is: has this actually been done?  Is the GPU
> extension that you're talking about as the use-case for this been
> written?
>
It's a chicken-and-egg problem, because the implementation of the extension
module depends entirely on the interface the backend provides. Unlike the
commit-fest, there is no deadline for my extension module, so I put a higher
priority on submitting the custom-scan node than on the extension.
However, the GPU extension is not purely theoretical. I implemented
a prototype using the FDW APIs, and it was able to accelerate sequential scans
when the query has sufficiently complicated qualifiers.

See the movie (from 2:45). The table t1 is a regular table, and t2 is a foreign
table. Both of them have the same contents; however, the response time of the
query is much shorter when GPU acceleration is working.
    http://www.youtube.com/watch?v=xrUBffs9aJ0
So, I'm confident that GPU acceleration will deliver a performance gain once it
can run on regular tables, not only foreign tables.

> How does it handle all of the above?  Or are we going through
> all these gyrations in vain hope that it'll actually all work when
> someone tries to use it for something real?
>
I'm not talking about anything difficult. If a junk attribute requires the
"ctid" of the tuple to be returned, the custom-scan provider reads a tuple from
the underlying relation and then includes the correct item pointer. If the
custom scan is designed to run on the cache, all it needs to do is reconstruct
a tuple with the correct item pointer (thus the cache needs to hold the ctid
as well). That's all I did in the cache_scan module.
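
As a rough illustration of that last point, the sketch below rebuilds a
result row from a cache entry and attaches the saved item pointer when the
caller asks for ctid. It is plain C, not the cache_scan code: ItemPtr is a
simplified stand-in for the backend's ItemPointerData, and CacheEntry,
ResultRow, build_row and format_ctid are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the backend's ItemPointerData (block + offset). */
typedef struct ItemPtr
{
    uint32_t block;
    uint16_t offset;
} ItemPtr;

/* Hypothetical cache entry: the cached columns plus the tuple's ctid. */
typedef struct CacheEntry
{
    ItemPtr ctid;
    int32_t a;
    int32_t b;
} CacheEntry;

/* Hypothetical scan output: column values plus an optional junk ctid. */
typedef struct ResultRow
{
    int32_t a;
    int32_t b;
    bool    has_ctid;
    ItemPtr ctid;
} ResultRow;

/*
 * Rebuild a row from the cache alone. The ctid is copied from the entry,
 * which is why the cache has to store it next to the column values.
 */
ResultRow
build_row(const CacheEntry *entry, bool want_ctid)
{
    ResultRow row = {0};

    row.a = entry->a;
    row.b = entry->b;
    if (want_ctid)
    {
        row.has_ctid = true;
        row.ctid = entry->ctid;
    }
    return row;
}

/* Render an item pointer the way ctid is displayed, e.g. "(42,3)". */
int
format_ctid(ItemPtr ctid, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "(%u,%u)",
                    (unsigned) ctid.block, (unsigned) ctid.offset);
}
```

A caller that needs row-level locking or an UPDATE target would pass
want_ctid = true; a plain SELECT could skip it.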

Thanks,
-- 
KaiGai Kohei <[email protected]>


