Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Simon Riggs
On Fri, 2006-05-05 at 18:04 -0400, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: On Fri, 5 May 2006, Tom Lane wrote: BTW, I just realized another bug in the patch: btbulkdelete fails to guarantee that it visits every page in the index. The first solution that occurs to me

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I read your earlier post about needing to lock everything and spent some time thinking about this. The issue of needing to lock everything means that we would never be able to do a partial vacuum of an index i.e. remove one page without a scan. I'm more

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Simon Riggs
On Mon, 2006-05-08 at 10:18 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I read your earlier post about needing to lock everything and spent some time thinking about this. The issue of needing to lock everything means that we would never be able to do a partial vacuum of an

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: That wasn't the proposal. Every split would be marked and stay marked until those blocks were VACUUMed. The data used to mark is readily available and doesn't rely on whether or not VACUUM is running. IMHO this does work. OK, I misunderstood what you had

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Simon Riggs
On Mon, 2006-05-08 at 11:26 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: That wasn't the proposal. Every split would be marked and stay marked until those blocks were VACUUMed. The data used to mark is readily available and doesn't rely on whether or not VACUUM is running.

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: So we just optimised for slowly-but-continually churning tables (i.e. DELETEs match INSERTs, or just UPDATEs). i.e. we just improved VACUUM performance for those that don't need it that often. That might be the common case, but it isn't the one thats

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Luke Lonergan
Tom, On 5/8/06 11:46 AM, Tom Lane [EMAIL PROTECTED] wrote: I made a table of 16M rows with an index over a random-data integer column. With a thoroughly disordered index (built on-the-fly as the random data was inserted), the time to VACUUM after deleting a small number of rows was 615

Re: [PATCHES] Page at a time index scan

2006-05-08 Thread Simon Riggs
On Mon, 2006-05-08 at 14:46 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: So we just optimised for slowly-but-continually churning tables (i.e. DELETEs match INSERTs, or just UPDATEs). i.e. we just improved VACUUM performance for those that don't need it that often. That

Re: [PATCHES] Page at a time index scan

2006-05-07 Thread Tom Lane
I've committed a rewritten version of this patch. Heikki Linnakangas [EMAIL PROTECTED] writes: On Fri, 5 May 2006, Tom Lane wrote: btbulkdelete arrives at a page, it need take no special action unless the page is newly split *and* its right-link points to a lower physical address. If that's

Re: [PATCHES] Page at a time index scan

2006-05-06 Thread Heikki Linnakangas
On Fri, 5 May 2006, Tom Lane wrote: I have a sketch of a solution that doesn't require any change in page allocation behavior. Can anyone see any holes in this: Looks good to me. Assume that we have some way to recognize whether a page has been split since the current btbulkdelete scan

Re: [PATCHES] Page at a time index scan

2006-05-06 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: That's not too bad. Where exactly were you thinking of putting the counter and the lock? My original thought was to keep it in btree metapages, but that's kind of annoying since we just went to some effort to cache metapage contents; which means the

Re: [PATCHES] Page at a time index scan

2006-05-05 Thread Tom Lane
I've just realized that the locking considerations in this patch are considerably more complicated than we thought. The discussion so far considered only half of what the deletion interlock needs to accomplish. We have to ensure that indexscans don't lose their place, which is solved in the patch

Re: [PATCHES] Page at a time index scan

2006-05-05 Thread Tom Lane
I wrote: BTW, I just realized another bug in the patch: btbulkdelete fails to guarantee that it visits every page in the index. It was OK for btvacuumcleanup to ignore pages added to the index after it starts, but btbulkdelete has to deal with such pages. Actually, as written this patch does

Re: [PATCHES] Page at a time index scan

2006-05-05 Thread Heikki Linnakangas
On Fri, 5 May 2006, Tom Lane wrote: I wrote: BTW, I just realized another bug in the patch: btbulkdelete fails to guarantee that it visits every page in the index. It was OK for btvacuumcleanup to ignore pages added to the index after it starts, but btbulkdelete has to deal with such pages.

Re: [PATCHES] Page at a time index scan

2006-05-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: The first solution that occurs to me is to force page splits to choose the target page so that it's blkno the original page's blkno during vacuum. I thought about that too, but don't like it for three reasons: * it encourages index bloat, the

Re: [PATCHES] Page at a time index scan

2006-05-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: On Fri, 5 May 2006, Tom Lane wrote: BTW, I just realized another bug in the patch: btbulkdelete fails to guarantee that it visits every page in the index. The first solution that occurs to me is to force page splits to choose the target page so

Re: [PATCHES] Page at a time index scan

2006-05-04 Thread Heikki Linnakangas
On Wed, 3 May 2006, Tom Lane wrote: Heikki, were you planning to make a round of revisions in the patch, or is this as far as you wanted to take it? Here's an updated patch. It's the same as the original, but merged with the changes to the vacuum_cleanup API you committed, and some comment

Re: [PATCHES] Page at a time index scan

2006-05-04 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Here's an updated patch. Uh ... no patch actually attached? To-do: Not big things, but I don't have the time to work on them at the moment. I can take it from here if you'll send me what you have. regards, tom lane

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Heikki Linnakangas
On Tue, 2 May 2006, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: On Tue, 2 May 2006, Tom Lane wrote: Backwards scan may break this whole concept; are you sure you've thought it through? I think so. The patch doesn't change the walk-left code. Do you have something specific

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Simon Riggs
On Tue, 2006-05-02 at 15:35 -0400, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: On Tue, 2 May 2006, Tom Lane wrote: Backwards scan may break this whole concept; are you sure you've thought it through? I think so. The patch doesn't change the walk-left code. Do you have

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: AFAICS we will need to return to the page for a backward scan, so we could just keep the pin the whole way. It's not possible to cache the left page pointer because block splits to our immediate left can update them even after we read the page contents.

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Simon Riggs
On Tue, 2006-05-02 at 15:35 -0400, Tom Lane wrote: I'm worried about synchronization, particularly what happens if the page gets deleted from under you while you don't have it pinned. On Wed, 2006-05-03 at 10:17 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: We need never

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Simon Riggs
On Wed, 2006-05-03 at 10:17 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: You are optimizing the wrong thing here. If we choose not to mark an entry dead then we will pay for that omission on every future scan of the same entry. I don't think that outweighs the (doubtless

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2006-05-02 at 15:35 -0400, Tom Lane wrote: This is unnecessary and probably wrong. You'll need to be more specific about what you mean. There is no point in marking a page half-dead, as that doesn't save anyone else from visiting it, and it's

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Simon Riggs
On Wed, 2006-05-03 at 10:56 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2006-05-02 at 15:35 -0400, Tom Lane wrote: This is unnecessary and probably wrong. You'll need to be more specific about what you mean. There is no point in marking a page half-dead, as

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: So do you see a problem scenario like this? A, B and C separate backends: A1 Reads page, some row versions are *not* marked LP_DELETE but will be later when A2 happens B1 VACUUM removes dead rows, just happens to be all of them B2 Recycles

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Simon Riggs
On Wed, 2006-05-03 at 13:39 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: So do you see a problem scenario like this? A, B and C separate backends: A1 Reads page, some row versions are *not* marked LP_DELETE but will be later when A2 happens B1 VACUUM removes dead

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Heikki Linnakangas
On Wed, 3 May 2006, Heikki Linnakangas wrote: On Tue, 2 May 2006, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: On Tue, 2 May 2006, Tom Lane wrote: Backwards scan may break this whole concept; are you sure you've thought it through? I think so. The patch doesn't change the

Re: [PATCHES] Page at a time index scan

2006-05-03 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: The point remains, however. A page won't get deleted while a scan might still be interested in it, because deleted pages are not immediately recycled (except on vacuum full), and the left and right sibling pointers stay intact until no

[PATCHES] Page at a time index scan

2006-05-02 Thread Heikki Linnakangas
Here's a patch that implements page at a time index scans discussed at pgsql-hackers earlier. See proposal 1 at: http://archives.postgresql.org/pgsql-hackers/2006-03/msg01237.php It passes regression tests, and there's no known bugs. There's some minor issues I'd like to point out, though:

Re: [PATCHES] Page at a time index scan

2006-05-02 Thread Heikki Linnakangas
On Tue, 2 May 2006, Tom Lane wrote: Agreed. The pin has two functions: - keep the page from being moved out of the bufmgr - no need anymore - stop a vacuum from removing the page - no need anymore. We'll not stop on a removable row anymore, so no need. At the moment, backward scan returns to

Re: [PATCHES] Page at a time index scan

2006-05-02 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: On Tue, 2 May 2006, Tom Lane wrote: Also, as noted in other contexts, it'd be a good idea if vacuumcleanup was told the total number of heap tuples (GIN needs this), and both steps really ought to be able to find out if it's a full or lazy vacuum.

Re: [PATCHES] Page at a time index scan

2006-05-02 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: On Tue, 2 May 2006, Tom Lane wrote: Backwards scan may break this whole concept; are you sure you've thought it through? I think so. The patch doesn't change the walk-left code. Do you have something specific in mind? I'm worried about