On 2015/11/04 17:10, Kouhei Kaigai wrote:
On 2015/10/28 6:04, Robert Haas wrote:
On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita
Sorry, my explanation was not correct. (Needed to take in caffeine.) What
I'm concerned about is the following:
SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON
localtab.id = ft1.id FOR UPDATE OF ft1
-> Nested Loop
Join Filter: (localtab.id = ft1.id)
-> Seq Scan on localtab
-> Foreign Scan on <ft1, ft2>
Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x
FOR UPDATE OF ft1
Assume that ft1 performs late row locking.
If the SQL includes "FOR UPDATE of ft1", then it clearly performs
early row locking. I assume you meant to omit that.
If an EPQ recheck was invoked
due to a concurrent transaction on the remote server that changed only the
value x of the ft1 tuple previously retrieved, then we would have to
generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2
tuple previously retrieved was not a null tuple.) However, I'm not sure how
we can do that in ForeignRecheck; we can't know for example, which one is
outer and which one is inner, without an alternative local join execution
plan. Maybe I'm missing something, though.
I would expect it to issue a new query like: SELECT * FROM ft1 LEFT
JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1.
We assume here that ft1 uses late row locking, so I thought the above
SQL should include "FOR UPDATE of ft1". But I still don't think that
that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the
fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for
that is that the updated version of the ft1 tuple wouldn't satisfy the
ft1.tid = $0 condition in an EPQ recheck, because the ctid for the
updated version of the ft1 tuple has changed. (IIUC, I think that if we
use a TID scan for ft1, the SQL would generate the expected result,
because I think that the TID condition would be ignored in the EPQ
recheck, but I don't think it's guaranteed to use a TID scan for ft1.)
Maybe I'm missing something, though.
It looks to me, we should not use ctid system column to identify remote
row when postgres_fdw tries to support late row locking.
The documentation says:
UPDATE and DELETE operations are performed against rows previously
fetched by the table-scanning functions. The FDW may need extra information,
such as a row ID or the values of primary-key columns, to ensure that it can
identify the exact row to update or delete
The "rowid" should not be changed once it is fetched from the remote side
until it is actually updated, deleted or locked, for correct identification.
If ctid is used for this purpose, it is safe only when remote row is locked
when it is fetched - it is exactly early row locking behavior, isn't it?
Yeah, we should use early row locking for a target foreign table in
In case of SELECT FOR UPDATE, I think we are allowed to use ctid to
identify target rows for late row locking, but I think the above SQL
should be changed to something like this:
SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT
JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x
This should be significantly more efficient than fetching the base
rows from each of two tables with two separate queries.
Maybe I think we could fix the SQL, so I have to admit that, but I'm
just wondering (1) what would happen for the case when ft1 uses late row
rocking and ft2 uses early row rocking and (2) that would be still more
efficient than re-fetching only the base row from ft1.
It should be decision by FDW driver. It is not easy to estimate a certain
FDW driver mixes up early and late locking policy within a same remote join
query. Do you really want to support such a mysterious implementation?
Yeah, the reason for that is because GetForeignRowMarkType allows that.
Or, do you expect all the FDW driver is enforced to return a joined tuple
if remote join case?
No. That wouldn't make sense if at least one component table involved
in a foreign join uses the rowmark type other than ROW_MARK_COPY.
It is different from my idea; it shall be an extra
optimization option if FDW can fetch a joined tuple at once, but not always.
So, if FDW driver does not support this optimal behavior, your driver can
fetch two base tables then run local alternative join (or something other).
OK, so if we all agree that the joined-tuple optimization is just an
option for the case where all the component tables use ROW_MARK_COPY,
I'd propose to leave that for 9.6.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: