On 2015/10/20 9:36, Kouhei Kaigai wrote:
Even if we fetch whole-row images of both sides, join pushdown still
pays off, because we receive fewer rows than a local join over two
foreign scans would. (If the planner works well, we can expect join
paths that increase the number of rows to be dropped.)
One downside of my proposition is that individual rows get wider.
It is a trade-off. The above approach requires no changes to the
existing EPQ infrastructure, so its implementation design is clear.
On the other hand, your approach will reduce traffic over the network,
but it is still unclear how we would integrate scanrelid==0 with the
EPQ infrastructure.
I agree with KaiGai-san that his proposition (or my proposition based on
secondary plans) is still a performance improvement over the current
implementation of local joining plus early row locking, since it
wouldn't have to transfer useless data that didn't satisfy the join
conditions at all!
On the other hand, a custom scan takes underlying local scan nodes, so
any kind of ROW_MARK_* except ROW_MARK_COPY can occur. I think the
width of the joined tuples is a relatively minor issue compared with
the FDW case. However, we cannot expect the fetched rows to be protected
by the early row-locking mechanism, so refetching rows and
reconstructing the joined tuple becomes relatively more likely, and
handling it has relatively higher priority.
I see.
There is also some possible loss of efficiency with this approach.
Suppose that we have two tables ft1 and ft2 which are being joined,
and we push down the join. They are being joined on an integer
column, and the join needs to select several other columns as well.
However, ft1 and ft2 are very wide tables that also contain some text
columns. The query is like this:
SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND ft2.huge2 ~ 'nonsense') ON
localtab.q = ft1.q;
If we refetch each row individually, we will need a wholerow image of
ft1 and ft2 that includes all columns, or at least ft1.huge and
ft2.huge2. If we just fetch a wholerow image of the join output, we
can exclude those. The only thing we need to recheck is that it's
still the case that localtab.q = ft1.q (because the value of
localtab.q might have changed).
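
To make the width argument concrete, here is a minimal sketch in Python
of the bytes shipped per joined row under the two strategies. It is a
toy model only: the column widths (4 bytes for the integer columns,
2000 bytes for the text columns) and the helper names are assumptions
made up for illustration, not measurements or anything in the FDW API.

```python
# Toy model: bytes shipped per joined row under the two refetch strategies.
# All column widths below are hypothetical illustration values.
FT1_COLS = {"x": 4, "q": 4, "p": 4, "huge": 2000}
FT2_COLS = {"x": 4, "p": 4, "huge2": 2000}


def per_table_wholerow_bytes(ft1_cols, ft2_cols):
    """Whole-row image of each base relation: every column travels."""
    return sum(ft1_cols.values()) + sum(ft2_cols.values())


def join_output_bytes(ft1_cols, ft2_cols, needed):
    """Image of the join output: only the columns still needed locally."""
    widths = {("ft1", col): w for col, w in ft1_cols.items()}
    widths.update({("ft2", col): w for col, w in ft2_cols.items()})
    return sum(widths[key] for key in needed)


# Columns needed above the pushed-down join in the example query:
# the targets ft1.p and ft2.p, plus ft1.q for the localtab.q = ft1.q recheck.
NEEDED = [("ft1", "p"), ("ft2", "p"), ("ft1", "q")]

per_table = per_table_wholerow_bytes(FT1_COLS, FT2_COLS)
join_only = join_output_bytes(FT1_COLS, FT2_COLS, NEEDED)
print(per_table, join_only)
```

With these assumed widths the join-output image is a small fraction of
the two whole-row images, because the remote-only quals on the wide
text columns never have to be rechecked locally.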
As KaiGai-san mentioned above, what we need to discuss further about
Robert's proposition is how to integrate it into the existing EPQ
machinery. For example, when, where, and how should we refetch the
whole-row image of the join output in the case of late row locking?
IMV, that would require adding a new FDW API different from
RefetchForeignRow, say RefetchForeignJoinRow.
IMO, another benefit of KaiGai-san's proposition (or mine) is that it
could provide the whole functionality for row locking in remote joins
without an additional development burden on the FDW author; the author
only has to write GetForeignRowMarkType and RefetchForeignRow, which I
think is relatively easy. With that proposition, using rowmark types
such as ROW_MARK_SHARE or ROW_MARK_EXCLUSIVE for foreign tables in
remote joins would be quite inefficient, but using ROW_MARK_REFERENCE
instead of ROW_MARK_COPY would be an option for workloads where EPQ
rechecks are rarely invoked, because we would only need to transfer
ctids, not whole-row images.
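
The ROW_MARK_REFERENCE versus ROW_MARK_COPY trade-off can be sketched
with a rough cost model in Python. This is an illustration under stated
assumptions: the function names and the row counts are hypothetical,
only bytes transferred are counted (the extra round trips for refetches
are ignored), and the 6-byte figure is the size of a PostgreSQL ctid.

```python
# Rough transfer-cost model for the two rowmark types on a foreign table.
# Function names and workload numbers are hypothetical illustration values.
CTID_BYTES = 6  # a ctid is a block number (4 bytes) plus an offset (2 bytes)


def copy_cost(n_rows, row_bytes):
    """ROW_MARK_COPY: a whole-row image travels with every fetched row."""
    return n_rows * row_bytes


def reference_cost(n_rows, row_bytes, rechecked_rows):
    """ROW_MARK_REFERENCE: a ctid travels with every row, and a whole row
    is refetched only for the rows that actually hit an EPQ recheck."""
    return n_rows * CTID_BYTES + rechecked_rows * row_bytes


# Example: 1000 rows of 2000 bytes each, 10 of which need an EPQ recheck.
print(copy_cost(1000, 2000))
print(reference_cost(1000, 2000, 10))
```

Under this model ROW_MARK_REFERENCE wins whenever rechecks are rare, and
loses its advantage as the number of rechecked rows approaches the total
row count; a real comparison would also have to account for the latency
of each refetch.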
Best regards,
Etsuro Fujita