Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you, pushed Andrew Borodin wrote: 2017-03-22 22:48 GMT+05:00 Teodor Sigaev : hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't') Yes, I think this naming is good. It's clear what's in common in these flags and what's different. And if the whole posting tree is empty,then we could mark root page as leaf and remove all other pages in tree without any locking. Although, it could be a task for separate patch. From the performance point of view, this is a very good idea. Both, performance of VACUUM and performance of Scans. But doing so we risk to leave some garbage pages in case of a crash. And I do not see how to avoid these without unlinking pages one by one. I agree, that leaving this trick for a separate patch is quite reasonable. Best regards, Andrey Borodin. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
2017-03-22 22:48 GMT+05:00 Teodor Sigaev : > hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't') Yes, I think this naming is good. It's clear what's in common in these flags and what's different. > And if the whole posting tree is empty,then we could mark root page as leaf > and remove all other pages in tree without any locking. Although, it could > be a task for separate patch. >From the performance point of view, this is a very good idea. Both, performance of VACUUM and performance of Scans. But doing so we risk to leave some garbage pages in case of a crash. And I do not see how to avoid these without unlinking pages one by one. I agree, that leaving this trick for a separate patch is quite reasonable. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
No, second conditional code will be called for any subtree, which contains totally empty subtree. That check !isRoot covers case when the entire posting tree should be erased: we cannot just quit out of recursive cleanup, we have to make a scan here, starting from root. Oh, I see Probably, variable isChildHasVoid has a bit confusing name. This flag indicates that some subtree: 1. Had empty pages 2. Did not bother deleting them, because there is a chance that it is a part of a bigger empty subtree. May be it'd be better to call the variable "someChildIsVoidSubtree". hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't') And if the whole posting tree is empty,then we could mark root page as leaf and remove all other pages in tree without any locking. Although, it could be a task for separate patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hi, Teodor! 2017-03-21 20:32 GMT+05:00 Teodor Sigaev : > I had a look on patch That's great, thanks! > > /* > * All subtree is empty - just return TRUE to indicate that parent > must > * do a cleanup. Unless we are ROOT an there is way to go upper. > */ > > if(isChildHasVoid && !isAnyNonempy && !isRoot) > return TRUE; > > if(isChildHasVoid) > { > ... > ginScanToDelete(gvs, blkno, TRUE, &root, > InvalidOffsetNumber); > } > > In first 'if' clause I see !isRoot, so second if and corresponding > ginScanToDelete() could be called only for root in posting tree. No, second conditional code will be called for any subtree, which contains totally empty subtree. That check !isRoot covers case when the entire posting tree should be erased: we cannot just quit out of recursive cleanup, we have to make a scan here, starting from root. Probably, variable isChildHasVoid has a bit confusing name. This flag indicates that some subtree: 1. Had empty pages 2. Did not bother deleting them, because there is a chance that it is a part of a bigger empty subtree. May be it'd be better to call the variable "someChildIsVoidSubtree". >just remove lock for cleanup over ginVacuumPostingTreeLeaves() and if it >returns that tree contains empty page then lock the whole posting tree to do >ginScanToDelete() work. It is, indeed, viable approach. But currently proposed patch is capable of dealing with small page deletes without much of locking fuss, even in 4-5 level trees. How do you think, which way should we take? Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you for your suggestions, do not hesitate to ask any questions, concurrency and GIN both are very interesting topics. I had a look on patch and found some issue. Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves(): /* * All subtree is empty - just return TRUE to indicate that parent must * do a cleanup. Unless we are ROOT an there is way to go upper. */ if(isChildHasVoid && !isAnyNonempy && !isRoot) return TRUE; if(isChildHasVoid) { ... ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber); } In first 'if' clause I see !isRoot, so second if and corresponding ginScanToDelete() could be called only for root in posting tree. If so, it seems to me, it wasn't a good idea to move ginScanToDelete() from ginVacuumPostingTree() and patch should just remove lock for cleanup over ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page then lock the whole posting tree to do ginScanToDelete() work. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
2017-03-16 23:55 GMT+05:00 Peter Geoghegan : > On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin wrote: >> 2. Thus, L&S fully concurrent vacuum is possible, indeed, and >> furthermore Theodor suggested that I should implement not only page >> eviction, but also page merge and tree condence algorithm. > > I think that it's very hard to make merging of pages that are not > completely empty work, while also using the L&Y algorithm. That's true. This is a distant plan... Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin wrote: > 2. Thus, L&S fully concurrent vacuum is possible, indeed, and > furthermore Theodor suggested that I should implement not only page > eviction, but also page merge and tree condence algorithm. I think that it's very hard to make merging of pages that are not completely empty work, while also using the L&Y algorithm. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
2017-03-16 21:27 GMT+05:00 David Steele : > This patch applies cleanly and compiles at cccbdde. > > Jeff, any thoughts on Andrew's responses? Hi, David! I've got some updates on the matter of this patch, since the understanding of the B-tree bothered me much. Currently, I'm at PgConf.Russia, where I've contacted Theodor Sigaev, and he answered my questions about the GIN. 0. I think that proposed patch is safe (deadlock free, does not introduce new livelocks, all the resources guarded properly) 1. There _are_ high keys at the posting trees, they are just called rightmost keys, but in fact they are high keys in terms of L&Y algorithm. 2. Thus, L&S fully concurrent vacuum is possible, indeed, and furthermore Theodor suggested that I should implement not only page eviction, but also page merge and tree condence algorithm. 3. Eventually, I'll do that, certainly, but, currently, I can't predict the time it'll take. I think I'll start somewhere in the summer, may be right after GiST intrapage indexing. As for now, I think that having this patch in PostgreSQL 10 is viable. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On 2/5/17 11:04 AM, Andrew Borodin wrote: > Hi, Jeff! > > 2017-02-05 3:45 GMT+05:00 Jeff Davis : >> On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis wrote: >>> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin >>> wrote: >>> One idea I had that might be simpler is to use a two-stage page >>> delete. The first stage would remove the link from the parent and mark >>> the page deleted, but leave the right link intact and prevent >>> recycling. The second stage would follow the chain of right links >>> along each level, removing the right links to deleted pages and >>> freeing the page to be recycled. >> >> Do you think this approach is viable as a simplification? > > To consider this approach I need to ask several questions. > > 1. Are we discussing simplification of existing GIN vacuum? Or > proposed GIN vacuum? Please, note that they do not differ in the way > page is unlinked, function ginDeletePage() is almost untoched. > > 2. What do you mean by "stage"? In terms of the paper "A symmetric > concurrent B-tree algorithm" by Lanin&Shasha: stage is an > uninterrupted period of holding lock on nonempty page set. > Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S > Both paper (L&Y and L&S) tend to avoid lock coupling, which is > inevitable if you want to do parent unlink first. To prevent insertion > of records on a page, you have to mark it. If you are doing this in > the stage when you unlink from parent - you have to own both locks. > > 3. What do you want to simplify? Currently we unlink a page from > parent and left sibling in one stage, at cost of aquiring CleanUp lock > (the way we aquire it - is the diference between current and patched > version). > 2-stage algorithms will not be simplier, yet it will be more concurrent. > Please note, that during absence of high fence keys in GIN B-tree we > actually should implement algorithm from figure 3A > https://www.screencast.com/t/2cfGZtrzaz0z (It would be incorrect, but > only with presence of high keys) This patch applies cleanly and compiles at cccbdde. Jeff, any thoughts on Andrew's responses? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hi, Jeff! 2017-02-05 3:45 GMT+05:00 Jeff Davis : > On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis wrote: >> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin wrote: >> One idea I had that might be simpler is to use a two-stage page >> delete. The first stage would remove the link from the parent and mark >> the page deleted, but leave the right link intact and prevent >> recycling. The second stage would follow the chain of right links >> along each level, removing the right links to deleted pages and >> freeing the page to be recycled. > > Do you think this approach is viable as a simplification? To consider this approach I need to ask several questions. 1. Are we discussing simplification of existing GIN vacuum? Or proposed GIN vacuum? Please, note that they do not differ in the way page is unlinked, function ginDeletePage() is almost untoched. 2. What do you mean by "stage"? In terms of the paper "A symmetric concurrent B-tree algorithm" by Lanin&Shasha: stage is an uninterrupted period of holding lock on nonempty page set. Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S Both paper (L&Y and L&S) tend to avoid lock coupling, which is inevitable if you want to do parent unlink first. To prevent insertion of records on a page, you have to mark it. If you are doing this in the stage when you unlink from parent - you have to own both locks. 3. What do you want to simplify? Currently we unlink a page from parent and left sibling in one stage, at cost of aquiring CleanUp lock (the way we aquire it - is the diference between current and patched version). 2-stage algorithms will not be simplier, yet it will be more concurrent. Please note, that during absence of high fence keys in GIN B-tree we actually should implement algorithm from figure 3A https://www.screencast.com/t/2cfGZtrzaz0z (It would be incorrect, but only with presence of high keys) Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis wrote: > On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin wrote: > One idea I had that might be simpler is to use a two-stage page > delete. The first stage would remove the link from the parent and mark > the page deleted, but leave the right link intact and prevent > recycling. The second stage would follow the chain of right links > along each level, removing the right links to deleted pages and > freeing the page to be recycled. Do you think this approach is viable as a simplification? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Tue, Jan 31, 2017 at 4:31 PM, Vladimir Borodin wrote: > 31 янв. 2017 г., в 9:50, Michael Paquier > написал(а): > >> I am marking this patch as returned with feedback. > > Michael, sorry, but why? Because I have been through many patches today. > If I understood everything right, the main question > from Jeff was why is it implemented in such way? And Jeff wanted to see > other ways of solving the problem. Andrew wrote about them above and it > seems that implementing them would be quite expensive and without any > obvious win. I would rather expect some reaction from Jeff or may be someone > else, so shouldn’t it be marked as «Ready for committer» or at least «Moved > to next CF»? I have moved to to the next CF. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
> 31 янв. 2017 г., в 9:50, Michael Paquier > написал(а): > > On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin wrote: >> I'll summarize here my state of studying concurrent methods of page >> unlinking. >> >> GIN B-tree does not have "high key". That means, that rightmost key on >> a page is maximal for the non-leaf page. >> But I do not see anything theoretical in a way of implementation of >> Lanin and Shasha`s methods of page merging, with slight modifications. >> Their paper does not even mention high key(high fence key in papers by >> Goetz Graefe). >> >> But it's not a simple task due to large portions of shared code >> between entry tree and posting tree. >> >> Also, I do not see a reason why this method can be practically >> superior to proposed patch. >> >> Currently, I do not have resources to implement a proof of concept for >> fully concurrent page unlinking to make benchmarking. > > I am marking this patch as returned with feedback. Michael, sorry, but why? If I understood everything right, the main question from Jeff was why is it implemented in such way? And Jeff wanted to see other ways of solving the problem. Andrew wrote about them above and it seems that implementing them would be quite expensive and without any obvious win. I would rather expect some reaction from Jeff or may be someone else, so shouldn’t it be marked as «Ready for committer» or at least «Moved to next CF»? > -- > Michael -- May the force be with you… https://simply.name
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin wrote: > I'll summarize here my state of studying concurrent methods of page unlinking. > > GIN B-tree does not have "high key". That means, that rightmost key on > a page is maximal for the non-leaf page. > But I do not see anything theoretical in a way of implementation of > Lanin and Shasha`s methods of page merging, with slight modifications. > Their paper does not even mention high key(high fence key in papers by > Goetz Graefe). > > But it's not a simple task due to large portions of shared code > between entry tree and posting tree. > > Also, I do not see a reason why this method can be practically > superior to proposed patch. > > Currently, I do not have resources to implement a proof of concept for > fully concurrent page unlinking to make benchmarking. I am marking this patch as returned with feedback. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
2017-01-25 12:09 GMT+05:00 Andrew Borodin : > 2017-01-24 22:29 GMT+05:00 Jeff Davis : >> By the way, can you show me where the Lehman and Yao paper addresses >> page recycling? > > Here J. Hellerstein comments L&Y paper [1] saying that effectively > there is no deletion algorithm provided. > Here [2] is paper referenced from nbtree docs. I'll check if this is > applicable to GIN B-tree. > > [1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html > [2] http://dl.acm.org/citation.cfm?id=324589 Hi! I'll summarize here my state of studying concurrent methods of page unlinking. GIN B-tree does not have "high key". That means, that rightmost key on a page is maximal for the non-leaf page. But I do not see anything theoretical in a way of implementation of Lanin and Shasha`s methods of page merging, with slight modifications. Their paper does not even mention high key(high fence key in papers by Goetz Graefe). But it's not a simple task due to large portions of shared code between entry tree and posting tree. Also, I do not see a reason why this method can be practically superior to proposed patch. Currently, I do not have resources to implement a proof of concept for fully concurrent page unlinking to make benchmarking. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
2017-01-24 22:29 GMT+05:00 Jeff Davis : > On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin wrote: >> Technically, approach of locking a subtree is not novel. Lehman and >> Yao focused on "that any process for manipulating the tree uses only a >> small (constant) number of locks at any time." We are locking unknown >> and possibly large amount of pages. > > By the way, can you show me where the Lehman and Yao paper addresses > page recycling? > > It says that one approach is to allow fewer than K entries on a leaf > node; presumably as few as zero. But it doesn't seem to show how to > remove all references to the page and recycle it in a new place in the > tree. > > Regards, > Jeff Davis Here J. Hellerstein comments L&Y paper [1] saying that effectively there is no deletion algorithm provided. Here [2] is paper referenced from nbtree docs. I'll check if this is applicable to GIN B-tree. [1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html [2] http://dl.acm.org/citation.cfm?id=324589 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin wrote: > Technically, approach of locking a subtree is not novel. Lehman and > Yao focused on "that any process for manipulating the tree uses only a > small (constant) number of locks at any time." We are locking unknown > and possibly large amount of pages. By the way, can you show me where the Lehman and Yao paper addresses page recycling? It says that one approach is to allow fewer than K entries on a leaf node; presumably as few as zero. But it doesn't seem to show how to remove all references to the page and recycle it in a new place in the tree. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hello! I see your point on alternatives. I will do my best to generate more ideas on how vacuum can be done withing existing index structure, this will take a day or two. In this message I'll answer some questions. 2017-01-23 22:45 GMT+05:00 Jeff Davis : > 1. I haven't seen the approach before of finding the parent of any > empty subtree. While that sounds like a reasonable thing to do, it > worries me to invent new approaches in an area as well-studied as > btrees. The problem is less that it's too complex, and more that it's > different. Future developers looking at this code will expect it to be > either very simple or very standard, because it's just a posting list. Technically, approach of locking a subtree is not novel. Lehman and Yao focused on "that any process for manipulating the tree uses only a small (constant) number of locks at any time." We are locking unknown and possibly large amount of pages. > 2. It relies on searches to only start from the very leftmost page > (correct?). If someone wants to optimize the intersection of two > posting trees later, they might break it if they don't understand the > assumptions in this patch. Searches start from root, to find a correct position. But cuncurrency of the patch does not depend on that. Patch relies on the idea, that at any given subtree, if: 1. There is no inserts within subtree (guaranteed by cleanup lock) 2. Every page was locked with Exclusive lock at least once, locks were taken from left to right, without lock coupling. than: there is no read searches within the subtree. > 3. This adds a heap allocation, and copies the contents of the page, > and then releases the pin and lock. GinStepRight is careful to avoid > doing that (it locks the target before releasing the page containing > the pointer). I don't see a problem in your patch, but again, we are > breaking an assumption that future developers might make. Yes, heap allocation worries me a lot too, but let me explain details. GinStepRight - is the place where lock coupling is done. That is, you lock rightlink before releasing already locked page. Generally, B-tree tends to lock just one page at a time, but this place is an important exception (not the only exception, but others behave similarly). Proposed patch is doing tree traversal at worst two times. First: scan trough the tree, lock internal pages for read, lock leafs for write, delete dead tuples. If empty subtrees are found, invoke for them second stage. Here children Block Numbers are stored in palloc`d array, to do this scan with minimum of locks. This is unacceptable if there may be concurrent VACUUM, which can delete a page when we are asleep for some reason. (My point of worring number 1) Second: this stage recives subtree containing a lot of pages which must be just deleted. Here we take cleanup lock on subtree root, ensuring no inserts are within (they hold pins on stack from root to leaf). Than we take exclusive locks on all pages from left to right, without lock coupling. When we find empty page, we lock left page too, thus doing lock coupling form right to left. If search might be wihtin a subtree, it could deadlock with us. (My point of worring number 2) But both points seems to have reasonable explanation why it cannot happen. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Mon, Jan 23, 2017 at 1:55 AM, Andrew Borodin wrote: > Proposed patch, on a contrary, simplifies code. There is more code > deleted than added. It does not affect WAL, changes nothing in page > layout. I appreciate that it is fewer lines of code. My hesitation comes from a couple areas: 1. I haven't seen the approach before of finding the parent of any empty subtree. While that sounds like a reasonable thing to do, it worries me to invent new approaches in an area as well-studied as btrees. The problem is less that it's too complex, and more that it's different. Future developers looking at this code will expect it to be either very simple or very standard, because it's just a posting list. 2. It relies on searches to only start from the very leftmost page (correct?). If someone wants to optimize the intersection of two posting trees later, they might break it if they don't understand the assumptions in this patch. 3. This adds a heap allocation, and copies the contents of the page, and then releases the pin and lock. GinStepRight is careful to avoid doing that (it locks the target before releasing the page containing the pointer). I don't see a problem in your patch, but again, we are breaking an assumption that future developers might make. Your patch solves a real problem (a 90-second stall is clearly not good) and I don't want to dismiss that. But I'd like to consider some alternatives that may not have these downsides. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hi! 2017-01-23 11:32 GMT+05:00 Jeff Davis : > I have some reservations about the complexity and novelty of the > approach. I don't think I've seen an approach quite like this one > before. It seems like the main reason you are using this approach is > because the btree structure was too simple (no left links); is that > right? My gut is telling me we should either leave it simple, or use a > real concurrent btree implementation. GIN B-tree is departed from backend\access\nbtree almost like nbtree is departed from L&Y algorithm. As far as I understand there is no "high key" concept as in nbtree. I'm not sure that it's possible to implement same level of cuncurrency as in nbtree without that. Also GIN B-tree is actually two different B-trees, large portions of code is shared between Entries tree and Postings tree. I'm not sure going fully concurrent vacuum worth it, there will be a lot of changes. And then there is compression...installing high keys into GIN may be a mess, with pg_upgrade. Proposed patch, on a contrary, simplifies code. There is more code deleted than added. It does not affect WAL, changes nothing in page layout. > One idea I had that might be simpler is to use a two-stage page > delete. The first stage would remove the link from the parent and mark > the page deleted, but leave the right link intact and prevent > recycling. The second stage would follow the chain of right links > along each level, removing the right links to deleted pages and > freeing the page to be recycled. As far as I recall, this is resemplant to Lenin and Shasha algorithm. I'll look into it, but I think it relies on concept of "high key" on a page somehow (high key is the key from a sibling page, not local rightmost key as GIN B-tree uses). > Another idea is to partition the posting tree so that the root page > actually points to K posting trees. Scans would need to merge them, > but it would allow inserts to progress in one tree while the other is > being vacuumed. I think this would give good concurrency even for K=2. > I just had this idea now, so I didn't think it through very well. This is efficient from a point of view of frequent vacuums. concurrency of same key inserts can benefit from this approach. But all aother operations are more complicated and less efficient. GIN already has Fast Update list to tackle problems in this manner. But I do not feel I have resources to try to implement it... Thank you for your considerations. I'll think about concurrency and "high keys" more, but I'm realy afraid of amount of changes which may be prerequisites. Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin wrote: > Hello Jeff! > >>Review of the code itself: > How do you think, is there anything else to improve in that patch or > we can mark it as 'Ready for committer'? > > I've checked the concurrency algorithm with original work of Lehman > and Yao on B-link-tree. For me everything looks fine (safe, deadlock > free and not increased probability of livelock due to > LockBufferForCleanup call). Hi, I looked at the patch some more. I have some reservations about the complexity and novelty of the approach. I don't think I've seen an approach quite like this one before. It seems like the main reason you are using this approach is because the btree structure was too simple (no left links); is that right? My gut is telling me we should either leave it simple, or use a real concurrent btree implementation. One idea I had that might be simpler is to use a two-stage page delete. The first stage would remove the link from the parent and mark the page deleted, but leave the right link intact and prevent recycling. The second stage would follow the chain of right links along each level, removing the right links to deleted pages and freeing the page to be recycled. Another idea is to partition the posting tree so that the root page actually points to K posting trees. Scans would need to merge them, but it would allow inserts to progress in one tree while the other is being vacuumed. I think this would give good concurrency even for K=2. I just had this idea now, so I didn't think it through very well. What do you think? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hello Jeff! >Review of the code itself: How do you think, is there anything else to improve in that patch or we can mark it as 'Ready for committer'? I've checked the concurrency algorithm with original work of Lehman and Yao on B-link-tree. For me everything looks fine (safe, deadlock free and not increased probability of livelock due to LockBufferForCleanup call). Best regards, Andrey Borodin. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hi, Jeff! Thanks for review! Here's a patch corrected according to your suggestions. 2017-01-19 11:48 GMT+05:00 Jeff Davis : > https://www.postgresql.org/message-id/CAJEAwVE4UAmm8fr%2BNW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw%40mail.gmail.com > > Let me re-summarize what's been done here to make sure I understand: > > Each key in GIN has its own posting tree, which is itself a btree, > holding all of the tuples that contain that key. This posting tree is > really just a set of tuples, and searches always scan an entire > posting tree (because all the tuples contain the key, so all are a > potential match). > > Currently, the cleanup process requires blocking all reads and writes > in the posting list. I assume this is only a problem when a key is > common enough that its posting list is quite large (how large?). The posting tree shall be at least 3 levels high to make this patch useful. I guess it's at least 1 postings of a single term (assuming average hundred of tuples per page). > This patch makes a couple changes to improve concurrency: > > 1. Vacuum leaves first, without removing any pages. Instead, just keep > track of pages which are completely empty (more generally, subtrees > that contain empty pages). > > 2. Free the empty pages in those subtrees. That requires blocking > reads and writes to that subtree, but not the entire posting list. > Blocking writes is easy: acquiring a cleanup lock on the root of any > subtree always blocks any writes to that subtree, because the write > keeps a pin on all pages in that path. Blocking reads is accomplished > by locking the page to be deleted, then locking/unlocking the page one > left of that (which may be outside the subtree?). No, leftmost page within tree. It may be referenced by rightlink from outside the subtree, that's the reason we ceep it alive even if it's empy. > Maybe a basic question, but why is the posting list a btree at all, > and not, say a doubly-linked list? When GIN executes searches usually it have to fetch documents which contains intersection of posting lists. It is faster to intersect B-trees, if common elements are rare. > > Review of the code itself: > > * Nice and simple, with a net line count of only +45. > * Remove commented-out code. I've removed the code. Though it's purpose was to accent changed tricky concurrency places > * In the README, "leaf-to-right" should be left-to-right; and "takinf" > should be "taking". Fixed > * When you say the leftmost page is never deleted; do you mean the > leftmost of the entire posting tree, or leftmost of the subtree that > you are removing pages from? Leftmost of a subtree, containing totally empty subtree. It's not neccesarily leftmost page of whole posting tree. > * In order to keep out concurrent reads, you need to lock/unlock the > left page while holding exclusive lock on the page being deleted, but > I didn't see how that happens exactly in the code. Where does that > happen? in function ginDeletePage() LockBuffer(lBuffer, GIN_EXCLUSIVE); We have to mark this page dirty, since it's rightlink is changed. We cannot mark it dirty without locking it, even if we surely know that no any reader can reach it out (due to cleanup lock at the root of cleaned subtree). Thank you for your suggestions, do not hesitate to ask any questions, concurrency and GIN both are very interesting topics. Best regards, Andrey Borodin. diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README index fade0cb..990b5ff 100644 --- a/src/backend/access/gin/README +++ b/src/backend/access/gin/README @@ -314,10 +314,17 @@ deleted. The previous paragraph's reasoning only applies to searches, and only to posting trees. To protect from inserters following a downlink to a deleted page, vacuum simply locks out all concurrent insertions to the posting tree, -by holding a super-exclusive lock on the posting tree root. Inserters hold a -pin on the root page, but searches do not, so while new searches cannot begin -while root page is locked, any already-in-progress scans can continue -concurrently with vacuum. In the entry tree, we never delete pages. +by holding a super-exclusive lock on the parent page of subtree with deletable +pages. Inserters hold a pin on the root page, but searches do not, so while +new searches cannot begin while root page is locked, any already-in-progress +scans can continue concurrently with vacuum in corresponding subtree of +posting tree. To exclude interference with readers vacuum takes exclusive +locks in a depth-first scan in left-to-right order of page tuples. Leftmost +page is never deleted. Thus before deleting any page we obtain exclusive +lock on any left page, effectively excluding deadlock with any reader, despite +taking parent lock before current and left lock after current. We take left +lock not for a concurrency reasons, but rather in need to mark page dirty. +In the entry tree, we never delete pages. (This is quite different from the mechanism the btree i
[HACKERS] Review: GIN non-intrusive vacuum of posting tree
https://www.postgresql.org/message-id/CAJEAwVE4UAmm8fr%2BNW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw%40mail.gmail.com Let me re-summarize what's been done here to make sure I understand: Each key in GIN has its own posting tree, which is itself a btree, holding all of the tuples that contain that key. This posting tree is really just a set of tuples, and searches always scan an entire posting tree (because all the tuples contain the key, so all are a potential match). Currently, the cleanup process requires blocking all reads and writes in the posting list. I assume this is only a problem when a key is common enough that its posting list is quite large (how large?). This patch makes a couple changes to improve concurrency: 1. Vacuum leaves first, without removing any pages. Instead, just keep track of pages which are completely empty (more generally, subtrees that contain empty pages). 2. Free the empty pages in those subtrees. That requires blocking reads and writes to that subtree, but not the entire posting list. Blocking writes is easy: acquiring a cleanup lock on the root of any subtree always blocks any writes to that subtree, because the write keeps a pin on all pages in that path. Blocking reads is accomplished by locking the page to be deleted, then locking/unlocking the page one left of that (which may be outside the subtree?). Maybe a basic question, but why is the posting list a btree at all, and not, say a doubly-linked list? Review of the code itself: * Nice and simple, with a net line count of only +45. * Remove commented-out code. * In the README, "leaf-to-right" should be left-to-right; and "takinf" should be "taking". * When you say the leftmost page is never deleted; do you mean the leftmost of the entire posting tree, or leftmost of the subtree that you are removing pages from? * In order to keep out concurrent reads, you need to lock/unlock the left page while holding exclusive lock on the page being deleted, but I didn't see how that happens exactly in the code. Where does that happen? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers