On 23/02/2014 10:27, Arnt Gulbrandsen wrote:
The point of the locking was to block a race:
delete from x where y in (select y from x)
The select runs first, then the delete, and if there's an insert
between the two, the delete may fail.
I'll look at the suggested patch when I'm back home tonight.
Arnt
Perhaps I wasn't clear. My expectation is that you:
1) identify ids that look like they are candidates
2) delete, checking that they are still candidates
For the particular case of vacuum, we're essentially removing things
that are garbage, and they are not generally resuscitated. Updates to an
item that matches these criteria are unlikely. In any case, I suggest
that the criteria are re-evaluated during the delete, for safety.
My expectation is that, given that the starting point for the checks in
2) is already a set of ids, the query plan should be OK, especially if
the batch size is small.
You don't care if other ids become candidates while all this is
processing - they get caught next time around.
Ideally this process would run frequently as a background task, but you
can't do that at the moment because it can be rather disruptive.
On 23/02/2014 11:08, Stephen R. van den Berg wrote:
Collecting all ids to be deleted in one temp table is not a good idea
to begin with. Collecting the ids could become a scaling problem.
I disagree. Temp tables are lightweight and ids are small; work_mem
could easily be more of a problem. And if it's really a concern that so
much has become garbage (in which case a big transaction is also a big
problem), then just enforce an upper limit in the select with a max
count. I'd wager that you'd set that count pretty high. I'd also look at
using async commit for this, if the Postgres version supports it.
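Concretely, the bounded select and the async commit might look like this
(a sketch, with the same assumed table names as before;
`synchronous_commit` is available in PostgreSQL 8.3 and later):

```sql
BEGIN;
-- Keep the candidate set bounded and spool it server-side only.
CREATE TEMP TABLE doomed ON COMMIT DROP AS
SELECT bp.id
  FROM bodyparts bp
 WHERE NOT EXISTS (SELECT 1 FROM part_numbers pn
                    WHERE pn.bodypart = bp.id)
 LIMIT 100000;

-- Async commit for this transaction only: acceptable for vacuum-style
-- work, since anything lost in a crash is simply re-deleted next pass.
SET LOCAL synchronous_commit = off;

-- ... checked deletes against the doomed table go here ...
COMMIT;
```

With ON COMMIT DROP the temp table cleans itself up, so the ids never
need to leave the server.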
In principle I agree that this activity can be run as small batches of
deletes; it's just a question of how to do the initial check, and
whether it is best run repeatedly or run once to select a candidate set.
And then whether that candidate set should be spooled to a temp table on
the Postgres server, or returned to the C++ app and processed there.
I'm not sure there is a reason to move the ids back to the C++ app; once
they are in a temp table a stored proc could iterate over a cursor on it
and do the checked deletes.
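Such a server-side loop might be sketched in PL/pgSQL as follows (again
with assumed names; `doomed` is the temp table of candidate ids, and
each delete re-checks the criterion as suggested above):

```sql
CREATE OR REPLACE FUNCTION checked_vacuum() RETURNS bigint AS $$
DECLARE
    candidate RECORD;
    n         bigint;
    deleted   bigint := 0;
BEGIN
    -- Iterate over the previously collected candidates.
    FOR candidate IN SELECT id FROM doomed LOOP
        -- Checked delete: remove the row only if it is still garbage.
        DELETE FROM bodyparts bp
         WHERE bp.id = candidate.id
           AND NOT EXISTS (SELECT 1 FROM part_numbers pn
                            WHERE pn.bodypart = bp.id);
        GET DIAGNOSTICS n = ROW_COUNT;
        deleted := deleted + n;
    END LOOP;
    RETURN deleted;
END;
$$ LANGUAGE plpgsql;
```

The C++ app then only has to call the function, and no ids cross the
wire in either direction.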