* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> IIUC, his approach was to integrate join push-down within the FDW APIs;
> however, it does not mean the idea of remote join is rejected.

For my part, trying to consider doing remote joins *without* going
through FDWs is just nonsensical.  What are you joining remotely if not
two foreign tables?  With regard to the GPU approach, if that model
works whereby the normal PG tuples are read off disk, fed over to the
GPU, processed, and then returned to the user through PG, then I
wouldn't consider it really a 'remote' join but rather simply a new
execution node inside of PG which is planned and costed just like the
others.  We've already been over the discussion about trying to make
that a pluggable system, but the (very reasonable) push-back has been
whether it's really possible, and really makes sense, for it to be
pluggable.  It certainly doesn't *have* to be: PostgreSQL is written in
C, as we all know, and plenty of C code talks to GPUs and shuffles
memory around, and that's almost exactly what Robert is already working
on supporting with regular CPUs and PG backends.

In many ways, trying to conflate this idea of using-GPUs-to-do-work with
the idea of remote-FDW-joins has really disillusioned me with regard to
the CustomScan approach.

> > Then perhaps they should be exposed more directly?  I can understand
> > generally useful functionality being exposed in a way that anyone can use
> > it, but we need to avoid interfaces which can't be stable due to normal
> > / ongoing changes to the backend code.
> > 
> The functions my patches want to expose are:
>  - get_restriction_qual_cost()
>  - fix_expr_common()

I'll try to find time to look at these in more detail later this week.
I have reservations about exposing the current cost estimates, as we may
want to adjust them in the future.  Such adjustments may need to be made
in balance with other changes throughout the system, and an external
module which depends on one particular result from the qual costing
could end up having problems when the costing changes, because the
extension author wasn't aware of the other changes happening in other
areas of the costing.

I'm talking about this from a "beyond-just-the-GUCs" point of view: I
realize the extension author could go look at the GUC settings, but it's
entirely reasonable to expect that we'll change the default GUC settings,
along with how they're used, in the future.
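
Just to illustrate the concern, here's roughly what I'd expect an
extension to end up writing with get_restriction_qual_cost() once it's
exposed.  This is only a sketch: it mirrors how costsize.c uses the
function internally today, and the per-tuple arithmetic is my own
illustration, not anything from the patch:

    /* inside an extension's costing routine for some path over baserel */
    QualCost    qpqual_cost;
    Cost        run_cost;

    get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);

    /* charge qual evaluation plus the basic per-tuple CPU cost,
     * the same way cost_seqscan() does today */
    run_cost = (cpu_tuple_cost + qpqual_cost.per_tuple) * baserel->tuples;

The moment we rebalance how qual costs are computed relative to, say,
cpu_operator_cost or the per-tuple charges used elsewhere, a path costed
like this could silently end up mis-costed relative to the core paths.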

> And the new functions my patches want to add are:
>  - bms_to_string()
>  - bms_from_string()

Offhand, these look fine, if there's really an external use for them.
Will try to look at them in more detail later.
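
For what it's worth, I'm assuming the intended usage is something along
these lines (the two functions are the ones proposed in the patch, so
the exact semantics are KaiGai's to confirm; I'm guessing at a
nodeToString()-style round trip):

    Bitmapset  *relids = NULL;
    char       *str;
    Bitmapset  *copy;

    relids = bms_add_member(relids, 3);
    relids = bms_add_member(relids, 5);

    str  = bms_to_string(relids);    /* e.g. to stash in a node's private data */
    copy = bms_from_string(str);     /* reconstruct it later, at plan/exec time */
    Assert(bms_equal(relids, copy));

If that's the use-case, exposing them seems harmless enough.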

> > That's fine, if we can get data to and from those co-processors efficiently
> > enough that it's worth doing so.  If moving the data to the GPU's memory
> > will take longer than running the actual aggregation, then it doesn't make
> > any sense for regular tables because then we'd have to cache the data in
> > the GPU's memory in some way across multiple queries, which isn't something
> > we're set up to do.
> > 
> When I made a prototype implementation on top of FDW using CUDA, it was
> able to run a sequential scan about 10 times faster than SeqScan on
> regular tables, if the qualifiers are complex enough.
> The libraries used to communicate with the GPU (OpenCL/CUDA) have an
> asynchronous data-transfer mode using hardware DMA.  It allows the cost
> of data transfer to be hidden by pipelining, if there are enough records
> to be transferred.

That sounds very interesting, and figuring out the costing to support
that model will certainly be tricky.  Shuffling the data around in that
way will be interesting as well.  It strikes me that it'll be made more
difficult if we're trying to do it within the limitations of a
pre-defined API between the core code and an extension.
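
Back-of-the-envelope, the pipelining argument presumably amounts to
something like this (my own illustration, not from the patch; all of the
variable names below are made up):

    /* chunk N's DMA transfer overlaps evaluation of chunk N-1, so the
     * per-chunk charge is the slower of the two steps, not their sum */
    Cost    per_chunk = Max(dma_xfer_cost_per_chunk, gpu_eval_cost_per_chunk);
    Cost    run_cost  = first_xfer_cost + nchunks * per_chunk;

Getting numbers like dma_xfer_cost_per_chunk right on arbitrary hardware,
and keeping them in balance with the regular CPU costing, is exactly the
part that strikes me as tricky to do from behind a pre-defined API.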

> Also, the recent trend in semiconductor devices is GPU integration with
> the CPU, sharing a common memory space.  See Intel's Haswell, AMD's
> Kaveri, or NVIDIA's Tegra K1.  All of them share the same memory, so
> there is no need to transfer the data to be processed.  This trend is
> driven by physical law, because of the energy consumption of the
> semiconductors, so I'm optimistic about my idea.

And this just makes me wonder why the focus isn't on the background
worker approach instead of trying to do this all in an extension.

> The need came from a contrib module that wants to call static
> functions, and from wanting to translate an existing data structure
> to/from a cstring.  But, anyway, does a separate patch make sense?

I haven't had a chance to go back and look into the functions in detail,
but offhand I'd say the bms ones are probably fine while the others
would need more research as to whether they make sense to expose to an
extension.

> Hmm... It seems to me we should follow the existing manner of
> constructing join paths, rather than adding special handling.  Even if
> a query contains three or more foreign tables managed by the same
> server, they will be consolidated into one remote join as long as its
> cost is less than the local alternatives.

I'm not convinced that it's going to be that simple, but I'm certainly
interested in the general idea.

> So, I'd like to bet on using the new add_join_path_hook to compute
> possible join paths.  If a remote join implemented by custom-scan is
> cheaper than the local join, it will be chosen; the optimizer will then
> try joining other foreign tables with this custom-scan node, and if the
> remote join is still cheaper, it will be consolidated again.

And I'm still unconvinced that trying to make this a hook and
implemented by an extension makes sense.
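
Just so we're talking about the same thing, here is roughly what I
understand is being proposed (the hook name is from the patch, but the
signature and the helper functions below are my own simplification, so
treat them as placeholders):

    static void
    my_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                     RelOptInfo *outerrel, RelOptInfo *innerrel,
                     JoinType jointype, List *restrictlist)
    {
        /* only consider it when both sides live on the same foreign server */
        if (!rels_on_same_server(outerrel, innerrel))        /* hypothetical */
            return;

        /* offer a remote-join path; add_path() keeps it only if it wins */
        add_path(joinrel,
                 (Path *) create_remote_join_path(root, joinrel,    /* hypothetical */
                                                  outerrel, innerrel,
                                                  jointype, restrictlist));
    }

Since the path added at one join level becomes an input to the next
level, a three-way join would get consolidated the same way, which is
the behavior KaiGai describes above.  My doubt isn't that this can't be
made to work mechanically; it's whether an extension behind a
pre-defined hook is the right place for it.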

> > Admittedly, getting the costing right isn't easy either, but it's not
> > clear to me how it'd make sense for the local server to be doing
> > costing for remote servers.
> > 
> Right now, I ignored the cost of running on the remote server and
> focused on the cost of transferring data over the network.  It might be
> an idea to discount the CPU cost of remote execution.

Pretty sure we're going to need to consider the remote processing cost
of the join as well..
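
Again, just to illustrate the point (none of these knobs exist today;
except for cpu_tuple_cost, the names are made up for the example), I'd
expect a remote-join path to need something closer to:

    Cost    total_cost = remote_startup_cost                      /* connection / setup */
                       + remote_join_run_cost                     /* CPU burned on the far side */
                       + retrieved_rows * transfer_cost_per_row   /* network transfer back */
                       + retrieved_rows * cpu_tuple_cost;         /* local handling of the result */

rather than the network transfer term alone, even if the remote CPU term
ends up being discounted as KaiGai suggests.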

        Thanks,

                Stephen
