The attached patch requires the new row to fit, and 10% to be free on the page. Would someone test that?
--------------------------------------------------------------------------- Tom Lane wrote: > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > > This is a revised patch originated by Junji TERAMOTO for HEAD. > > [BTree vacuum before page splitting] > > http://archives.postgresql.org/pgsql-patches/2006-01/msg00301.php > > I think we can resurrect his idea because we will scan btree pages > > at-atime now; the missing-restarting-point problem went away. > > I've applied this but I'm now having some second thoughts about it, > because I'm seeing an actual *decrease* in pgbench numbers from the > immediately prior CVS HEAD code. Using > pgbench -i -s 10 bench > pgbench -c 10 -t 1000 bench (repeat this half a dozen times) > with fsync off but all other settings factory-stock, what I'm seeing > is that the first run looks really good but subsequent runs tail off in > spectacular fashion :-( Pre-patch there was only minor degradation in > successive runs. > > What I think is happening is that because pgbench depends so heavily on > updating existing records, we get into a state where an index page is > about full and there's one dead tuple on it, and then for each insertion > we have > > * check for uniqueness marks one more tuple dead (the > next-to-last version of the tuple) > * newly added code removes one tuple and does a write > * now there's enough room to insert one tuple > * lather, rinse, repeat, never splitting the page. > > The problem is that we've traded splitting a page every few hundred > inserts for doing a PageIndexMultiDelete, and emitting an extra WAL > record, on *every* insert. This is not good. > > Had you done any performance testing on this patch, and if so what > tests did you use? I'm a bit hesitant to try to fix it on the basis > of pgbench results alone. > > One possible fix that comes to mind is to only perform the cleanup > if we are able to remove more than one dead tuple (perhaps about 10 > would be good). Or do the deletion anyway, but then go ahead and > split the page unless X amount of space has been freed (where X is > more than just barely enough for the incoming tuple). > > After all the thought we've put into this, it seems a shame to > just abandon it :-(. But it definitely needs more tweaking. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/access/nbtree/nbtinsert.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/access/nbtree/nbtinsert.c,v retrieving revision 1.142 diff -c -c -r1.142 nbtinsert.c *** src/backend/access/nbtree/nbtinsert.c 25 Jul 2006 19:13:00 -0000 1.142 --- src/backend/access/nbtree/nbtinsert.c 26 Jul 2006 01:35:52 -0000 *************** *** 438,445 **** if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop)) { _bt_vacuum_one_page(rel, buf); ! if (PageGetFreeSpace(page) >= itemsz) ! break; /* OK, now we have enough space */ } /* --- 438,451 ---- if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop)) { _bt_vacuum_one_page(rel, buf); ! /* ! * Free space should be large enough for the new tuple and ! * should be >= 10% because scanning the page over and ! * over again to get just a little free space is inefficient. ! */ ! if (PageGetFreeSpace(page) >= itemsz && ! PageGetFreeSpace(page) >= BLCKSZ / 10) ! break; } /*
---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster