Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Kouhei Kaigai Thu, 09 Apr 2015 05:24:07 -0700

> On 2015/04/09 10:48, Kouhei Kaigai <[email protected]> wrote:
> * merge_fpinfo()
> >>> It seems to me fpinfo->rows should be joinrel->rows, and
> >>> fpinfo->width also should be joinrel->width.
> >>> There is no need for special intelligence here, is there?
> >>
> >>
> >> Oops. They are vestiges of my struggle, which disabled the SELECT clause
> >> optimization (omitting unused columns).  Now width and rows are inherited
> >> from joinrel.  Besides that, fdw_startup_cost and fdw_tuple_cost seemed
> >> wrong, so I fixed them to use a simple sum, not an average.
> >>
> > fpinfo->fdw_startup_cost represents the cost of opening a connection to the
> > remote PostgreSQL server, doesn't it?
> >
> > postgres_fdw.c:1757 says as follows:
> >
> >    /*
> >     * Add some additional cost factors to account for connection overhead
> >     * (fdw_startup_cost), transferring data across the network
> >     * (fdw_tuple_cost per retrieved row), and local manipulation of the data
> >     * (cpu_tuple_cost per retrieved row).
> >     */
> >
> > If so, does a ForeignScan that involves 100 underlying relations incur 100
> > times the network overhead on startup? Probably not.
> > I think an average is better than a sum, and the maximum of them would
> > reflect the cost more accurately.
> 
> In my current opinion, no. Though I remember that I've written such comments
> before :P.
> 
> Connection establishment occurs only once, on the very first access to the
> server, so in use cases with long-lived sessions (via psql, connection
> pooling, etc.), taking connection overhead into account *every time* seems
> too pessimistic.
> 
> Instead, for practical cases, fdw_startup_cost should cover the overhead of
> constructing the query and getting its first response (ideally excluding the
> retrieval of actual data).  These overheads are on the order of milliseconds.
> I'm not sure what value is appropriate for the default, but 100 seems not so
> bad.
> 
> Anyway, fdw_startup_cost is a per-server setting, the same as fdw_tuple_cost,
> and it should not be modified according to the width of the result, so using
> fpinfo_o->fdw_startup_cost would be ok.
>
Indeed, I forgot the connection cache mechanism. As long as we define
fdw_startup_cost as you mentioned, it seems to me your logic is heuristically
reasonable.
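
For reference, the logic agreed on above can be summarized as a minimal sketch.
It is not the patch itself: FpInfoSketch, JoinRelSketch, CostSketch,
merge_fpinfo_sketch() and add_fdw_overhead() are hypothetical, simplified
stand-ins for the real postgres_fdw structures, and the cost arithmetic only
follows the comment quoted from postgres_fdw.c above.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for postgres_fdw's per-relation info. */
typedef struct FpInfoSketch
{
    double rows;              /* estimated row count */
    int    width;             /* estimated row width in bytes */
    double fdw_startup_cost;  /* per-server setting */
    double fdw_tuple_cost;    /* per-server setting */
} FpInfoSketch;

/* Hypothetical stand-in for the planner's estimates on the join relation. */
typedef struct JoinRelSketch
{
    double rows;
    int    width;
} JoinRelSketch;

/* Hypothetical pair of startup/total costs. */
typedef struct CostSketch
{
    double startup;
    double total;
} CostSketch;

/*
 * Build the info for a pushed-down join: rows and width come straight from
 * the join relation, and the FDW cost factors come from the outer side,
 * because both sides are assumed to belong to the same foreign server.
 */
static FpInfoSketch
merge_fpinfo_sketch(const JoinRelSketch *joinrel,
                    const FpInfoSketch *fpinfo_o,
                    const FpInfoSketch *fpinfo_i)
{
    FpInfoSketch fpinfo;

    /* Assumption of this sketch: same server, hence identical settings. */
    assert(fpinfo_o->fdw_startup_cost == fpinfo_i->fdw_startup_cost);
    assert(fpinfo_o->fdw_tuple_cost == fpinfo_i->fdw_tuple_cost);

    fpinfo.rows = joinrel->rows;
    fpinfo.width = joinrel->width;
    fpinfo.fdw_startup_cost = fpinfo_o->fdw_startup_cost;
    fpinfo.fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
    return fpinfo;
}

/*
 * Add the FDW overheads named in the quoted comment: a one-time startup
 * cost, plus a transfer cost and a local CPU cost per retrieved row.
 */
static CostSketch
add_fdw_overhead(CostSketch base, const FpInfoSketch *fpinfo,
                 double retrieved_rows, double cpu_tuple_cost)
{
    base.startup += fpinfo->fdw_startup_cost;
    base.total += fpinfo->fdw_startup_cost;
    base.total += fpinfo->fdw_tuple_cost * retrieved_rows;
    base.total += cpu_tuple_cost * retrieved_rows;
    return base;
}
```

With example values of 100 for fdw_startup_cost and 0.01 for both per-row
costs, a join returning 1000 rows pays the startup overhead once, regardless
of how many underlying relations were merged into the ForeignScan.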


> > Also, fdw_tuple_cost represents the cost of data transfer over the network.
> > I think a weighted average is the best strategy, like:
> >  fpinfo->fdw_tuple_cost =
> >    (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) *
> >        fpinfo_o->fdw_tuple_cost +
> >    (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) *
> >        fpinfo_i->fdw_tuple_cost;
> >
> > That's just my suggestion. Please use whichever way you think is best.
> 
> I can't agree with that strategy, because 1) a width of 0 results in a
> per-tuple cost of 0, and 2) fdw_tuple_cost never varies within a foreign
> server.  Using fpinfo_o->fdw_tuple_cost (it must be identical to
> fpinfo_i->fdw_tuple_cost) seems reasonable.  Thoughts?
>
OK, you are right.
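
A small sketch of why the weighting adds nothing here, under the assumption
stated above that both sides of the join share one server and therefore one
fdw_tuple_cost. The helper names weighted_tuple_cost() and join_tuple_cost()
are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/*
 * Width-weighted average of two per-tuple costs, as in the suggestion quoted
 * above. The widths are converted to double so the ratios are not truncated
 * by integer division. The weights are undefined when both widths are 0, so
 * this sketch requires a positive total width.
 */
static double
weighted_tuple_cost(int width_o, double cost_o, int width_i, double cost_i)
{
    double total = (double) width_o + (double) width_i;

    assert(total > 0.0);
    return (width_o / total) * cost_o + (width_i / total) * cost_i;
}

/*
 * The simpler rule: both relations live on the same foreign server, so their
 * per-tuple costs are identical and the outer side's value can be used as is.
 */
static double
join_tuple_cost(double cost_o, double cost_i)
{
    assert(cost_o == cost_i);
    return cost_o;
}
```

When the two costs are equal, the weights sum to 1 and the weighted average
reduces to that same cost (up to floating-point rounding), so the simpler rule
gives the same answer without depending on the widths at all.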

I think it is time to hand the patch review over to the committers.
So, let me mark it "ready for committers".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <[email protected]>

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
