Re: [HACKERS] New gist vacuum.

2017-11-12 Thread Andrey Borodin
Hello!

> On 31 Jan 2016, at 17:18, Alvaro Herrera wrote:
> 
> Костя Кузнецов wrote:
>> Thank you, Jeff. I am reworking the patch now. All // comments and
>> warnings will be removed. About memory consumption: the new version will
>> control the size of the stack and will operate with a small map, because
>> I want to remove the old-style vacuum (right now, if maintenance_work_mem
>> is less than what is needed to build the info map, we use the old-style
>> vacuum with logical order).
> 
> You never got around to submitting the updated version of this patch,
> and it's been a long time now, so I'm marking it as returned with
> feedback for now.  Please do submit a new version once you have it,
> since this seems to be very useful.

I've rebased the patch (see attachment); it seems to work. It still requires a
bit of styling, docs and tests, at least.
Also, I think that a hash table is not a very good option if we have all pages
there: we should either use an array or not fill the table for every page.
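
One way to read the array suggestion above: if every page of the index gets an
entry anyway, a map keyed by block number is dense, so a flat array indexed by
block number gives the same lookup without hashing. The sketch below is only an
illustration under that assumption and is not code from the patch. ParentMap
and the parent_map_* functions are hypothetical names, and BlockNumber and
InvalidBlockNumber are redefined locally so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the PostgreSQL definitions (assumption of this sketch). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical dense child -> parent map: one slot per index page. */
typedef struct ParentMap
{
	BlockNumber nblocks;		/* number of pages covered by the map */
	BlockNumber *parent;		/* parent[child], or InvalidBlockNumber */
} ParentMap;

/* Allocate a map for nblocks pages with every parent unknown. */
static ParentMap *
parent_map_create(BlockNumber nblocks)
{
	ParentMap  *map = malloc(sizeof(ParentMap));
	BlockNumber i;

	if (map == NULL)
		return NULL;
	map->nblocks = nblocks;
	/* Allocate at least one slot so a zero-page map is still valid. */
	map->parent = malloc((size_t) (nblocks > 0 ? nblocks : 1) * sizeof(BlockNumber));
	if (map->parent == NULL)
	{
		free(map);
		return NULL;
	}
	for (i = 0; i < nblocks; i++)
		map->parent[i] = InvalidBlockNumber;
	return map;
}

/* Record the parent of a child page: a plain array store, no hashing. */
static void
parent_map_set(ParentMap *map, BlockNumber child, BlockNumber parent)
{
	assert(child < map->nblocks);
	map->parent[child] = parent;
}

/* Look up the parent of a child page in constant time. */
static BlockNumber
parent_map_get(const ParentMap *map, BlockNumber child)
{
	assert(child < map->nblocks);
	return map->parent[child];
}

static void
parent_map_free(ParentMap *map)
{
	free(map->parent);
	free(map);
}
```

At 4 bytes per entry, an array like this for the 2336441-page index from the
benchmark in this thread would take roughly 9 MB.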

If the author and the community do not object, I want to continue work on
Konstantin's patch.

Best regards, Andrey Borodin.


0001-GiST-VACUUM-rebase.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New gist vacuum.

2015-12-01 Thread Костя Кузнецов
Thank you, Jeff. I am reworking the patch now. All // comments and warnings will be removed.

About memory consumption: the new version will control the size of the stack and will operate with a small map, because I want to remove the old-style vacuum (right now, if maintenance_work_mem is less than what is needed to build the info map, we use the old-style vacuum with logical order).



Re: [HACKERS] New gist vacuum.

2015-10-31 Thread Jeff Janes
On Thu, Sep 10, 2015 at 3:52 PM, Костя Кузнецов  wrote:
> Hello. I am a student from the GSoC program.
> My project is sequential access in GiST vacuum.
>
> The new vacuum has two big steps:
> a scan of the pages in physical order, and a cleanup pass after the first step.
>
>
> Step 1 scans all pages, builds an information map (a hash map), and adds
> entries to the rescan stack (a stack of pages that need rescanning)
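
The two-step scheme quoted above can be illustrated with a toy in-memory model.
This is a sketch under simplifying assumptions, not the patch's logic: ToyPage,
RescanStack, scan_physical_order and process_rescan_stack are hypothetical
names, a "page" is just a pair of tuple counters, and the information map is
left out. Step 1 visits every page once in block-number order and pushes the
pages it changed; step 2 touches only those pages.

```c
#include <assert.h>

#define NPAGES 6				/* size of the toy index (assumption) */

/* Toy page: only the counters the sketch needs, not a real GiST page. */
typedef struct ToyPage
{
	int			live;			/* live tuples on the page */
	int			dead;			/* dead tuples waiting to be removed */
} ToyPage;

/* Stack of block numbers that step 2 has to revisit. */
typedef struct RescanStack
{
	int			blkno[NPAGES];
	int			depth;
} RescanStack;

/*
 * Step 1: visit every page in physical (block number) order, remove its
 * dead tuples, and remember each changed page. Returns tuples removed.
 */
static int
scan_physical_order(ToyPage *pages, int npages, RescanStack *stack)
{
	int			removed = 0;
	int			blkno;

	assert(npages <= NPAGES);	/* each page is pushed at most once */
	for (blkno = 0; blkno < npages; blkno++)
	{
		if (pages[blkno].dead == 0)
			continue;			/* unchanged page: no rescan needed */
		removed += pages[blkno].dead;
		pages[blkno].dead = 0;
		stack->blkno[stack->depth++] = blkno;
	}
	return removed;
}

/*
 * Step 2: revisit only the changed pages. Here we merely count the pages
 * that ended up empty, which would be the candidates for deletion.
 */
static int
process_rescan_stack(const ToyPage *pages, RescanStack *stack)
{
	int			empty = 0;

	while (stack->depth > 0)
	{
		int			blkno = stack->blkno[--stack->depth];

		if (pages[blkno].live == 0)
			empty++;
	}
	return empty;
}
```

The point of the split is that the expensive pass is sequential, while the
second pass is proportional to the number of changed pages rather than to the
size of the index.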

This is interesting work.  I think the patch needs a rebase to the git
HEAD.  There is a minor conflict in gistRedoPageUpdateRecord.  It is a
little confusing because your patch introduces new code and then
immediately comments it out (using //, which is not a comment style
allowed in our style guide) and that phantom code confuses the
conflict resolution process.

There are several other places throughout the patch that use //
comment style to comment out code which the patch itself added.  Those
should be removed, and the real comments should be converted to /*
this */ style.

I also got a compiler warning, it looks like a missing #include:

gistutil.c: In function 'gistNewBuffer':
gistutil.c:788:4: warning: implicit declaration of function
'TransactionIdPrecedes' [-Wimplicit-function-declaration]
if (GistPageIsDeleted(page) &&
TransactionIdPrecedes(p->pd_prune_xid, RecentGlobalDataXmin)) {
^


Also, I didn't see a check on the size of the stack.  Is there an
argument that this stack will not be able to grow to be large enough
to cause trouble?
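
For illustration, one way to bound such a stack is to derive a hard capacity
from a memory budget and make push report failure, so that the caller has to
drain the stack or fall back to another strategy. This is a minimal sketch
under that assumption, not what the patch does: BoundedStack and the
bounded_stack_* functions are hypothetical names, budget_bytes is a plain byte
count rather than a real setting, and BlockNumber is redefined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;	/* local stand-in for the PostgreSQL type */

/* Hypothetical stack of block numbers with a fixed upper bound. */
typedef struct BoundedStack
{
	BlockNumber *items;
	size_t		depth;			/* entries currently on the stack */
	size_t		capacity;		/* hard cap derived from the budget */
} BoundedStack;

/* Size the stack from a byte budget; fails if not even one entry fits. */
static bool
bounded_stack_init(BoundedStack *stack, size_t budget_bytes)
{
	stack->capacity = budget_bytes / sizeof(BlockNumber);
	stack->depth = 0;
	stack->items = NULL;
	if (stack->capacity == 0)
		return false;
	stack->items = malloc(stack->capacity * sizeof(BlockNumber));
	return stack->items != NULL;
}

/* Push never grows the stack: at the cap it returns false instead. */
static bool
bounded_stack_push(BoundedStack *stack, BlockNumber blkno)
{
	if (stack->depth >= stack->capacity)
		return false;			/* caller must drain or fall back */
	stack->items[stack->depth++] = blkno;
	return true;
}

/* Pop the most recently pushed block number; false when empty. */
static bool
bounded_stack_pop(BoundedStack *stack, BlockNumber *blkno)
{
	if (stack->depth == 0)
		return false;
	*blkno = stack->items[--stack->depth];
	return true;
}
```

With an explicit cap the memory question becomes "what happens when push
fails" instead of "how large can the stack get".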

Thanks,

Jeff




Re: [HACKERS] New gist vacuum.

2015-09-13 Thread Michael Paquier
On Fri, Sep 11, 2015 at 7:52 AM, Костя Кузнецов  wrote:
> old version:
>
> INFO: vacuuming "public.point_tbl"
> INFO: scanned index "gpointind" to remove 11184520 row versions
> DETAIL: CPU 84.70s/72.26u sec elapsed 27007.14 sec.
> [...]
>
> new vacuum is about
> INFO: vacuuming "public.point_tbl"
> INFO: scanned index "gpointind" to remove 11184520 row versions
> DETAIL: CPU 13.00s/27.57u sec elapsed 1864.22 sec.
> [...]
> There is a big speed up + we can reuse some pages.

Indeed. Interesting. You should definitely add your patch to the next
commit fest:
https://commitfest.postgresql.org/7/
-- 
Michael




[HACKERS] New gist vacuum.

2015-09-11 Thread Костя Кузнецов
Hello. I am a student from the GSoC program.
My project is sequential access in GiST vacuum.

The new vacuum has two big steps: a scan of the pages in physical order, and a
cleanup pass after the first step.

Step 1 scans all pages, builds an information map (a hash map), and adds
entries to the rescan stack (a stack of pages that need rescanning).
Step 2 works only with the pages from the rescan stack, that is, pages where
there are changes.

Besides the increased speed, the new version of vacuum also deletes pages.
Only leaf pages can be deleted. The process of deleting a page is:

1. delete the link to the page;
2. change rightlinks (if needed);
3. set the deleted flag.

I added two actions to the WAL (one for when I set the deleted flag and one
for when I change rightlinks). When I delete links to leaf pages from an inner
page, I always keep one link to a leaf (to avoid situations with empty inner
pages).

I attach some speed benchmarks. I compared the old and new versions on my
laptop (without an SSD).

The test: table "point_tbl" from the regression database. I inserted about 200
million rows, then deleted 33 million and ran vacuum. The size of the index is
about 18 GB.

old version:

INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 84.70s/72.26u sec elapsed 27007.14 sec.
INFO: "point_tbl": removed 11184520 row versions in 400715 pages
DETAIL: CPU 3.96s/3.10u sec elapsed 233.12 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 87.10s/69.05u sec elapsed 26410.44 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 4.23s/3.36u sec elapsed 331.43 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 87.65s/65.73u sec elapsed 26230.35 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 4.47s/3.41u sec elapsed 342.93 sec.
INFO: scanned index "gpointind" to remove 866 row versions
DETAIL: CPU 79.97s/39.64u sec elapsed 23341.88 sec.
INFO: "point_tbl": removed 866 row versions in 31 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "gpointind" now contains 201326592 row versions in 2336441 pages
DETAIL: 33554432 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.

new vacuum:

INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 13.00s/27.57u sec elapsed 1864.22 sec.
INFO: "point_tbl": removed 11184520 row versions in 400715 pages
DETAIL: CPU 3.46s/2.86u sec elapsed 214.04 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 14.17s/27.02u sec elapsed 2163.67 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 3.33s/2.99u sec elapsed 222.60 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 11.84s/25.23u sec elapsed 1828.71 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 3.44s/2.81u sec elapsed 215.06 sec.
INFO: scanned index "gpointind" to remove 866 row versions
DETAIL: CPU 5.62s/6.68u sec elapsed 176.67 sec.
INFO: "point_tbl": removed 866 row versions in 31 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: index "gpointind" now contains 201326592 row versions in 2336360 pages
DETAIL: 33554432 index row versions were removed.
150833 index pages have been deleted, 150833 are currently reusable.
CPU 5.54s/2.08u sec elapsed 165.61 sec.
INFO: "point_tbl": found 33554432 removable, 201326592 nonremovable row versions in 1202176 out of 1202176 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 73.50s/116.82u sec elapsed 8300.73 sec.
INFO: analyzing "public.point_tbl"
INFO: "point_tbl": scanned 100 of 1202176 pages, containing 16756 live rows and 0 dead rows; 100 rows in sample, 201326601 estimated total rows
VACUUM

There is a big speed up + we can reuse some pages.
Thanks.

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0e49959..229d3f4 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -619,6 +619,12 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
+			if(GistPageIsDeleted(stack->page)) {
+				UnlockReleaseBuffer(stack->buffer);
+				xlocked = false;
+				state.stack = stack = stack->parent;
+				continue;
+			}
 			downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
 			iid = PageGetItemId(stack->page, downlinkoffnum);
 			idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index ff888e2..c99ff7e 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -1128,11 +1128,6 @@ gistGetMaxLevel(Relation index)
  * but will be added there the first time we visit them.
  */
 
-typedef struct
-{
-	BlockNumber childblkno;		/* hash key */
-	BlockNumber parentblkno;
-}