On 23/02/2014 10:27, Arnt Gulbrandsen wrote:
The point of the locking was to block a race:

   delete from x where y in (select y from x)

The select runs first, then the delete, and if there's an insert between the two, the delete may fail.

I'll look at the suggested patch when I'm back home tonight.

Arnt


Perhaps I wasn't clear.  My expectation is that you:

1) identify ids that look like they are candidates
2) delete, checking that they are still candidates

For the particular case of vacuum, we're essentially removing things that are garbage, and they are not generally resuscitated. Updates to an item that matches these criteria are unlikely. In any case, I suggest that the criteria are re-evaluated during the delete, for safety.

My expectation is that, since the starting points for the checks in 2) are already ids, the query plan should be OK, especially if the batch size is small.

You don't care if other ids become candidates while all this is running - they get caught the next time around.
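
A rough sketch of that two-step shape, reusing the x and y names from the query above; the referencing table r and the not-exists test are my stand-ins for whatever the real garbage criteria are:

   -- 1) candidates: a bounded batch of ids that currently look like garbage
   select y
     into temporary table candidates
     from x
    where not exists (select 1 from r where r.y = x.y)
    limit 1000;

   -- 2) delete, re-evaluating the same criteria at delete time
   delete from x
    where y in (select y from candidates)
      and not exists (select 1 from r where r.y = x.y);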

Ideally this process would run frequently as a background task, but you can't do that at the moment because it can be rather disruptive.


On 23/02/2014 11:08, Stephen R. van den Berg wrote:

Collecting all ids to be deleted in one temp table is not a good idea
to begin with.  Collecting the ids could become a scaling problem.

I disagree. Temp tables are lightweight and ids are small; work_mem could easily be more of a problem. And if it's really a concern that so much has become garbage (in which case a big transaction is also a big problem), then just enforce an upper limit on the select with a max count. I'd wager you'd set that count pretty high. I'd also look at an async commit for this if the Postgres version supports it.
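
The async commit part only needs a per-transaction setting around each batch, assuming the server is new enough to have synchronous_commit (8.3 or later):

   begin;
   set local synchronous_commit = off;  -- this transaction only; a crash can
                                        -- at worst lose a batch of deletes,
                                        -- and that garbage is picked up again
                                        -- on the next run
   -- ... the batched checked deletes go here ...
   commit;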

In principle I agree that this activity can be run as small batches of updates; it's just a question of how to do the initial check, whether it is best run repeatedly or run once to select a candidate set, and then whether that candidate set should be spooled to a temp table on the Postgres server or returned to the C++ app and processed there. I'm not sure there is a reason to move the ids back to the C++ app; once they are in a temp table, a stored proc could iterate over a cursor on it and do the checked deletes.
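
A minimal PL/pgSQL sketch of that server-side variant, reusing the hypothetical candidates/x/r names from the earlier sketch (not a drop-in for the real schema):

   create or replace function checked_vacuum() returns integer
   language plpgsql as $$
   declare
       cand record;
       deleted integer;
       total integer := 0;
   begin
       -- iterate over the spooled ids; the FOR loop runs over an implicit cursor
       for cand in select y from candidates loop
           -- checked delete: remove the row only if it is still garbage
           delete from x
            where y = cand.y
              and not exists (select 1 from r where r.y = x.y);
           get diagnostics deleted = row_count;
           total := total + deleted;
       end loop;
       return total;
   end;
   $$;

The app would then only need to fill the temp table and run "select checked_vacuum();" per batch.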



