> On 2015/04/09 10:48, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> * merge_fpinfo()
> >>> It seems to me fpinfo->rows should be joinrel->rows, and
> >>> fpinfo->width should also be joinrel->width.
> >>> There is no need for special intelligence here, is there?
> >>
> >> Oops. They are a vestige of my struggle, which disabled the SELECT clause
> >> optimization (omitting unused columns). Now width and rows are inherited
> >> from joinrel. Besides that, fdw_startup_cost and fdw_tuple_cost seemed
> >> wrong, so I fixed them to use a simple sum, not an average.
> >>
> > fpinfo->fdw_startup_cost represents the cost of opening a connection to
> > the remote PostgreSQL server, doesn't it?
> >
> > postgres_fdw.c:1757 says as follows:
> >
> >     /*
> >      * Add some additional cost factors to account for connection overhead
> >      * (fdw_startup_cost), transferring data across the network
> >      * (fdw_tuple_cost per retrieved row), and local manipulation of the data
> >      * (cpu_tuple_cost per retrieved row).
> >      */
> >
> > If so, does a ForeignScan that involves 100 underlying relations incur 100
> > times the heavy network operations on startup? Probably not.
> > I think an average is better than a sum, and the max of them would reflect
> > the cost more correctly.
>
> In my current opinion, no. Though I remember that I've written such comments
> before :P.
>
> Connection establishment occurs only once, on the very first access to the
> server, so in use cases with long-lived sessions (via psql, connection
> pooling, etc.), taking connection overhead into account *every time* seems
> too pessimistic.
>
> Instead, for practical cases, fdw_startup_cost should cover the overhead of
> constructing the query and getting its first response (ideally excluding
> the retrieval of the actual data). These overheads are visible on the order
> of milliseconds. I'm not sure how much is appropriate for the default, but
> 100 seems not so bad.
>
> Anyway, fdw_startup_cost is a per-server setting, the same as fdw_tuple_cost,
> and it should not be modified according to the width of the result, so using
> fpinfo_o->fdw_startup_cost would be OK.
>

Indeed, I forgot the connection cache mechanism. As long as we define
fdw_startup_cost as you mentioned, it seems to me your logic is heuristically
reasonable.
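
To make the alternatives discussed above concrete, here is a minimal sketch
comparing the merge rules for fdw_startup_cost (sum, average, max, and simply
reusing the outer relation's value). It is not the code in the patch:
DemoFpInfo, MergeRule and demo_merge_startup_cost() are hypothetical names
used only for illustration, and the struct holds only the two cost fields
rather than the real per-relation planning information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two per-server cost settings. */
typedef struct DemoFpInfo
{
    double fdw_startup_cost;
    double fdw_tuple_cost;
} DemoFpInfo;

/* The candidate merge rules that came up in the discussion. */
typedef enum MergeRule
{
    MERGE_SUM,          /* add up the startup cost of every relation */
    MERGE_AVG,          /* average over the relations */
    MERGE_MAX,          /* largest startup cost among the relations */
    MERGE_REUSE_OUTER   /* take the first (outer) relation's value as-is */
} MergeRule;

/*
 * Combine the startup costs of n relations under the given rule.
 * rels[0] plays the role of the outer relation (fpinfo_o in the thread).
 */
static double
demo_merge_startup_cost(const DemoFpInfo *rels, size_t n, MergeRule rule)
{
    double sum = 0.0;
    double max;
    size_t i;

    assert(rels != NULL && n > 0);

    max = rels[0].fdw_startup_cost;
    for (i = 0; i < n; i++)
    {
        sum += rels[i].fdw_startup_cost;
        if (rels[i].fdw_startup_cost > max)
            max = rels[i].fdw_startup_cost;
    }

    switch (rule)
    {
        case MERGE_SUM:
            return sum;
        case MERGE_AVG:
            return sum / (double) n;
        case MERGE_MAX:
            return max;
        case MERGE_REUSE_OUTER:
            break;
    }
    return rels[0].fdw_startup_cost;
}
```

When every relation belongs to the same foreign server, and so carries the
same setting, the average, max, and reuse rules all return that one value.
Only the sum grows with the number of joined relations, which is the
behaviour questioned above.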

> > Also, fdw_tuple_cost represents the cost of data transfer over the network.
> > I think a weighted average is the best strategy, like:
> >   fpinfo->fdw_tuple_cost =
> >     (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) *
> >       fpinfo_o->fdw_tuple_cost +
> >     (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) *
> >       fpinfo_i->fdw_tuple_cost;
> >
> > That's just my suggestion. Please apply whichever way you think is best.
>
> I can't agree with that strategy, because 1) width 0 causes a per-tuple cost
> of 0, and 2) fdw_tuple_cost never varies within a foreign server. Using
> fpinfo_o->fdw_tuple_cost (it must be identical to fpinfo_i->fdw_tuple_cost)
> seems reasonable. Thoughts?
>

OK, you are right.

I think it is time to hand the patch review over to the committers.
So, let me mark it "ready for committers".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers