On 23/02/2014 10:27, Arnt Gulbrandsen wrote:
The point of the locking was to block a race:

   delete from x where y in (select y from x)

The select runs first, then the delete, and if there's an insert between the two, the delete may fail.

I'll look at the suggested patch when I'm back home tonight.

Arnt


Perhaps I wasn't clear.  My expectation is that you:

1) identify ids that look like they are candidates
2) delete, checking that they are still candidates

For the particular case of vacuum, we're essentially removing things that are garbage, and they are not generally resuscitated. Updates to an item that matches these criteria are unlikely. In any case, I suggest that the criteria are re-evaluated during the delete, for safety.

My expectation is that, since the starting points for the checks in 2) are already ids, the query plan should be OK, especially if the batch size is small.

You don't care if other ids become candidates while all this is running - they get caught the next time around.
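
A rough sketch of that two-step shape, reusing the x and y names from the query above; the referencing table r and the not-exists test are my stand-ins for whatever the real garbage criteria are:

   -- 1) candidates: a bounded batch of ids that currently look like garbage
   select y
     into temporary table candidates
     from x
    where not exists (select 1 from r where r.y = x.y)
    limit 1000;

   -- 2) delete, re-evaluating the same criteria at delete time
   delete from x
    where y in (select y from candidates)
      and not exists (select 1 from r where r.y = x.y);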

Ideally this process would run frequently as a background task, but you can't do that at the moment because it can be rather disruptive.


On 23/02/2014 11:08, Stephen R. van den Berg wrote:

Collecting all ids to be deleted in one temp table is not a good idea
to begin with.  Collecting the ids could become a scaling problem.

I disagree. Temp tables are lightweight and ids are small; work_mem could easily be more of a problem. And if it's really a concern that so much has become garbage (in which case a big transaction is also a big problem), then just enforce an upper limit on the select with a max count. I'd wager you'd set that count pretty high. I'd also look at an async commit for this if the Postgres version supports it.
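
The async commit part only needs a per-transaction setting around each batch, assuming the server is new enough to have synchronous_commit (8.3 or later):

   begin;
   set local synchronous_commit = off;  -- this transaction only; a crash can
                                        -- at worst lose a batch of deletes,
                                        -- and that garbage is picked up again
                                        -- on the next run
   -- ... the batched checked deletes go here ...
   commit;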

In principle I agree that this activity can be run as small batches of updates; it's just a question of how to do the initial check, whether it is best run repeatedly or run once to select a candidate set, and then whether that candidate set should be spooled to a temp table on the Postgres server or returned to the C++ app and processed there. I'm not sure there is a reason to move the ids back to the C++ app; once they are in a temp table, a stored proc could iterate over a cursor on it and do the checked deletes.
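
A minimal PL/pgSQL sketch of that server-side variant, reusing the hypothetical candidates/x/r names from the earlier sketch (not a drop-in for the real schema):

   create or replace function checked_vacuum() returns integer
   language plpgsql as $$
   declare
       cand record;
       deleted integer;
       total integer := 0;
   begin
       -- iterate over the spooled ids; the FOR loop runs over an implicit cursor
       for cand in select y from candidates loop
           -- checked delete: remove the row only if it is still garbage
           delete from x
            where y = cand.y
              and not exists (select 1 from r where r.y = x.y);
           get diagnostics deleted = row_count;
           total := total + deleted;
       end loop;
       return total;
   end;
   $$;

The app would then only need to fill the temp table and run "select checked_vacuum();" per batch.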



