Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On 3/7/17 9:42 PM, Pavan Deolasee wrote: Fair point. I'm not going to "persist" with the idea too long. It seemed like a good, low-risk feature to me which can benefit certain use cases quite reasonably. It's not uncommon to create indexes (or reindex existing indexes to remove index bloats) on extremely large tables and avoiding a second heap scan can hugely benefit such cases. Holding up the patch for something for which we don't even have a proposal yet seemed a bit strange at first, but I see the point. Anyways, for a recently vacuumed table of twice the size of RAM and on a machine with SSDs, the patched CIC version runs about 25% faster. That's probably the best case scenario. I also realised that I'd attached a slightly older version of the patch. So new version attached, but I'll remove the patch from CF if I don't see any more votes in favour of the patch. I have marked this submission "Returned with Feedback". Please feel free to resubmit to a future commitfest or open a discussion on hackers if you develop a new approach. Thanks, -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
* Andres Freund (and...@anarazel.de) wrote: > On 2017-03-07 21:38:40 -0500, Robert Haas wrote: > > > I wonder however, if careful snapshot managment couldn't solve this as > > > well. I have *not* thought a lot about this, but afaics we can easily > > > prevent all-visible from being set in cases it'd bother us by having an > > > "appropriate" xmin / registered snapshot. > > > > Yeah, but that's a tax on the whole system. > > I'm not sure I can buy that argument. CIC *already* holds a snapshot > during each of the two scans. By reducing the amount of time that's held > in the second scan, the systemwide impact is reduced, because it's held > for a shorter amount of time. We need to hold a slightly "more > aggressive" snapshot, that's true, but if you have high xid throughput > those effects should roughly balance each other. For my 2c, at least, this certainly sounds like a potentially promising approach too and I tend to agree with Andres on it not really being an issue to have a more aggressive snapshot for the duration of the CIC. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
* Pavan Deolasee (pavan.deola...@gmail.com) wrote: > On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas wrote: > > > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost wrote: > > > Right, that's what I thought he was getting at and my general thinking > > > was that we would need a way to discover if a CIC is ongoing on the > > > relation and therefore heap_page_prune(), or anything else, would know > > > that it can't twiddle the bits in the VM due to the ongoing CIC. > > > Perhaps a lock isn't the right answer there, but it would have to be > > > some kind of cross-process communication that operates at a relation > > > level.. > > > > Well, I guess that's one option. I lean toward the position already > > taken by Andres and Peter, namely, that it's probably not a great idea > > to pursue this optimization. > > Fair point. I'm not going to "persist" with the idea too long. It seemed > like a good, low-risk feature to me which can benefit certain use cases > quite reasonably. It's not uncommon to create indexes (or reindex existing > indexes to remove index bloats) on extremely large tables and avoiding a > second heap scan can hugely benefit such cases. Holding up the patch for > something for which we don't even have a proposal yet seemed a bit strange > at first, but I see the point. I'm not really sure that I do. CIC's will not be so frequent that not allowing VM updates during their operation should really result in a huge drag on the system, imv. Of course, if the only way to realize a CIC is happening on the table is to perform some expensive operation and we have to do that every time then it's not going to be worth it, but I'm not sure that's really the case here. The issue here, as I understand it at least, is to come up with a way that we can make sure to not do anything which would screw up the CIC while it's running, that we can detect very cheaply so we don't slow things down during normal non-CIC-running periods. I suggested a new lock, in part because anything that's updating the VM is going to have to have some kind of lock anyway, and perhaps going for the "heavier" lock that would conflict with the CIC, in an optimistic manner, would make the normal non-CIC-running case essentially the same speed. If a CIC was running then the attempt to acquire the lock would fail and extra time would be spent acquiring a lower-weight lock, of course, but that's going to happen relatively rarely. > Anyways, for a recently vacuumed table of twice the size of RAM and on a > machine with SSDs, the patched CIC version runs about 25% faster. That's > probably the best case scenario. I agree, that certainly seems like a very nice performance improvement. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On 2017-03-07 21:38:40 -0500, Robert Haas wrote: > > I wonder however, if careful snapshot managment couldn't solve this as > > well. I have *not* thought a lot about this, but afaics we can easily > > prevent all-visible from being set in cases it'd bother us by having an > > "appropriate" xmin / registered snapshot. > > Yeah, but that's a tax on the whole system. I'm not sure I can buy that argument. CIC *already* holds a snapshot during each of the two scans. By reducing the amount of time that's held in the second scan, the systemwide impact is reduced, because it's held for a shorter amount of time. We need to hold a slightly "more aggressive" snapshot, that's true, but if you have high xid throughput those effects should roughly balance each other. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Wed, Mar 8, 2017 at 8:08 AM, Robert Haas wrote: > On Tue, Mar 7, 2017 at 9:12 PM, Andres Freund wrote: > > > > > I wonder however, if careful snapshot managment couldn't solve this as > > well. I have *not* thought a lot about this, but afaics we can easily > > prevent all-visible from being set in cases it'd bother us by having an > > "appropriate" xmin / registered snapshot. > > Yeah, but that's a tax on the whole system. > > May be we can do that only when patched CIC (or any other operation which may not like concurrent VM set) is in progress. We could use what Stephen suggested upthread to find that state. But right now it's hard to think because there is nothing on either side so we don't know what gets impacted by aggressive VM set and how. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Wed, Mar 8, 2017 at 7:33 AM, Robert Haas wrote: > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost wrote: > > Right, that's what I thought he was getting at and my general thinking > > was that we would need a way to discover if a CIC is ongoing on the > > relation and therefore heap_page_prune(), or anything else, would know > > that it can't twiddle the bits in the VM due to the ongoing CIC. > > Perhaps a lock isn't the right answer there, but it would have to be > > some kind of cross-process communication that operates at a relation > > level.. > > Well, I guess that's one option. I lean toward the position already > taken by Andres and Peter, namely, that it's probably not a great idea > to pursue this optimization. Fair point. I'm not going to "persist" with the idea too long. It seemed like a good, low-risk feature to me which can benefit certain use cases quite reasonably. It's not uncommon to create indexes (or reindex existing indexes to remove index bloats) on extremely large tables and avoiding a second heap scan can hugely benefit such cases. Holding up the patch for something for which we don't even have a proposal yet seemed a bit strange at first, but I see the point. Anyways, for a recently vacuumed table of twice the size of RAM and on a machine with SSDs, the patched CIC version runs about 25% faster. That's probably the best case scenario. I also realised that I'd attached a slightly older version of the patch. So new version attached, but I'll remove the patch from CF if I don't see any more votes in favour of the patch. Thanks, Pavan cic_skip_all_visible_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Tue, Mar 7, 2017 at 9:12 PM, Andres Freund wrote: > On 2017-03-07 21:03:53 -0500, Robert Haas wrote: >> On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost wrote: >> > Right, that's what I thought he was getting at and my general thinking >> > was that we would need a way to discover if a CIC is ongoing on the >> > relation and therefore heap_page_prune(), or anything else, would know >> > that it can't twiddle the bits in the VM due to the ongoing CIC. >> > Perhaps a lock isn't the right answer there, but it would have to be >> > some kind of cross-process communication that operates at a relation >> > level.. >> >> Well, I guess that's one option. I lean toward the position already >> taken by Andres and Peter, namely, that it's probably not a great idea >> to pursue this optimization. I'm not totally dead-set on that >> position, but it doesn't seem likely that we can add material >> synchronization overhead to heap_page_prune(), and I'd rather maintain >> the option to set visibility map bits in more places in the future >> than give up that opportunity by optimizing CIC now. It's nice for >> CIC to be fast, but setting all visible bits more aggressively sounds >> nicer. > > Indeed. > > I wonder however, if careful snapshot managment couldn't solve this as > well. I have *not* thought a lot about this, but afaics we can easily > prevent all-visible from being set in cases it'd bother us by having an > "appropriate" xmin / registered snapshot. Yeah, but that's a tax on the whole system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On 2017-03-07 21:03:53 -0500, Robert Haas wrote: > On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost wrote: > > Right, that's what I thought he was getting at and my general thinking > > was that we would need a way to discover if a CIC is ongoing on the > > relation and therefore heap_page_prune(), or anything else, would know > > that it can't twiddle the bits in the VM due to the ongoing CIC. > > Perhaps a lock isn't the right answer there, but it would have to be > > some kind of cross-process communication that operates at a relation > > level.. > > Well, I guess that's one option. I lean toward the position already > taken by Andres and Peter, namely, that it's probably not a great idea > to pursue this optimization. I'm not totally dead-set on that > position, but it doesn't seem likely that we can add material > synchronization overhead to heap_page_prune(), and I'd rather maintain > the option to set visibility map bits in more places in the future > than give up that opportunity by optimizing CIC now. It's nice for > CIC to be fast, but setting all visible bits more aggressively sounds > nicer. Indeed. I wonder however, if careful snapshot managment couldn't solve this as well. I have *not* thought a lot about this, but afaics we can easily prevent all-visible from being set in cases it'd bother us by having an "appropriate" xmin / registered snapshot. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Tue, Mar 7, 2017 at 4:26 PM, Stephen Frost wrote: > Right, that's what I thought he was getting at and my general thinking > was that we would need a way to discover if a CIC is ongoing on the > relation and therefore heap_page_prune(), or anything else, would know > that it can't twiddle the bits in the VM due to the ongoing CIC. > Perhaps a lock isn't the right answer there, but it would have to be > some kind of cross-process communication that operates at a relation > level.. Well, I guess that's one option. I lean toward the position already taken by Andres and Peter, namely, that it's probably not a great idea to pursue this optimization. I'm not totally dead-set on that position, but it doesn't seem likely that we can add material synchronization overhead to heap_page_prune(), and I'd rather maintain the option to set visibility map bits in more places in the future than give up that opportunity by optimizing CIC now. It's nice for CIC to be fast, but setting all visible bits more aggressively sounds nicer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Mar 3, 2017 at 6:06 PM, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > >> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote: > >> > Since VM bits are only set during VACUUM which conflicts with CIC on the > >> > relation lock, I don't see any risk of incorrectly skipping pages that > >> > the > >> > second scan should have scanned. > >> > >> I think that's true currently, but it'd also prevent us from doing that > >> in additional places. Which, in my opinion, we really should (and I > >> believe that's realistically achievable). Thus I really don't want to > >> base the correctness of CIC - a relatively infrequent operation - on the > >> assumption that no VM bits can be set concurrenty due to the SUE lock. > > > > That sounds like we need a lock or similar mechanism to indicate that > > CIC depends on the VM not being changed, rather than saying it shouldn't > > depend on that because it might not always be true that the only other > > operation (VACUUM) sets them and already acquires a conflicting lock. > > I don't really think that would be a useful approach. I think what > Andres is thinking about -- or at least what comes to mind for me -- > is the possibility that heap_page_prune() might someday try to mark > pages all-visible. Then it would become something that happens from > time to time during foreground processing, rather than (as now) > something that only happens from within a maintenance command. Right, that's what I thought he was getting at and my general thinking was that we would need a way to discover if a CIC is ongoing on the relation and therefore heap_page_prune(), or anything else, would know that it can't twiddle the bits in the VM due to the ongoing CIC. Perhaps a lock isn't the right answer there, but it would have to be some kind of cross-process communication that operates at a relation level.. Just to be clear, I wasn't thinking that heap_page_prune() or a foreground process would actually block on the lock, just that it would try to acquire it if it decided that it could twiddle the VM and if it wasn't able to immediately then it would just leave it alone and move on. I'll admit that I'm not super familiar with the CIC or VM internals, frankly, so I may be all wet with this entire line of thinking, but figured I'd at least explain what the idea was. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Fri, Mar 3, 2017 at 6:06 PM, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: >> On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote: >> > Since VM bits are only set during VACUUM which conflicts with CIC on the >> > relation lock, I don't see any risk of incorrectly skipping pages that the >> > second scan should have scanned. >> >> I think that's true currently, but it'd also prevent us from doing that >> in additional places. Which, in my opinion, we really should (and I >> believe that's realistically achievable). Thus I really don't want to >> base the correctness of CIC - a relatively infrequent operation - on the >> assumption that no VM bits can be set concurrenty due to the SUE lock. > > That sounds like we need a lock or similar mechanism to indicate that > CIC depends on the VM not being changed, rather than saying it shouldn't > depend on that because it might not always be true that the only other > operation (VACUUM) sets them and already acquires a conflicting lock. I don't really think that would be a useful approach. I think what Andres is thinking about -- or at least what comes to mind for me -- is the possibility that heap_page_prune() might someday try to mark pages all-visible. Then it would become something that happens from time to time during foreground processing, rather than (as now) something that only happens from within a maintenance command. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On 2017-03-03 15:12:04 -0800, Peter Geoghegan wrote: > On Tue, Feb 28, 2017 at 5:42 AM, Pavan Deolasee > wrote: > > During the second heap scan of CREATE INDEX CONCURRENTLY, we're only > > interested in the tuples which were inserted after the first scan was > > started. All such tuples can only exists in pages which have their VM bit > > unset. So I propose the attached patch which consults VM during second scan > > and skip all-visible pages. We do the same trick of skipping pages only if > > certain threshold of pages can be skipped to ensure OS's read-ahead is not > > disturbed. > > BTW, is there any danger of VACUUM acquiring a lock on the heap > relation (i.e. vacuuming it) after the first CIC transaction ends, but > before the second CIC transaction begins? We hold session level locks to prevent such danger. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Tue, Feb 28, 2017 at 5:42 AM, Pavan Deolasee wrote: > During the second heap scan of CREATE INDEX CONCURRENTLY, we're only > interested in the tuples which were inserted after the first scan was > started. All such tuples can only exists in pages which have their VM bit > unset. So I propose the attached patch which consults VM during second scan > and skip all-visible pages. We do the same trick of skipping pages only if > certain threshold of pages can be skipped to ensure OS's read-ahead is not > disturbed. BTW, is there any danger of VACUUM acquiring a lock on the heap relation (i.e. vacuuming it) after the first CIC transaction ends, but before the second CIC transaction begins? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote: > > Since VM bits are only set during VACUUM which conflicts with CIC on the > > relation lock, I don't see any risk of incorrectly skipping pages that the > > second scan should have scanned. > > I think that's true currently, but it'd also prevent us from doing that > in additional places. Which, in my opinion, we really should (and I > believe that's realistically achievable). Thus I really don't want to > base the correctness of CIC - a relatively infrequent operation - on the > assumption that no VM bits can be set concurrenty due to the SUE lock. That sounds like we need a lock or similar mechanism to indicate that CIC depends on the VM not being changed, rather than saying it shouldn't depend on that because it might not always be true that the only other operation (VACUUM) sets them and already acquires a conflicting lock. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Fri, Mar 3, 2017 at 2:54 PM, Andres Freund wrote: > On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote: >> Since VM bits are only set during VACUUM which conflicts with CIC on the >> relation lock, I don't see any risk of incorrectly skipping pages that the >> second scan should have scanned. > > I think that's true currently, but it'd also prevent us from doing that > in additional places. Which, in my opinion, we really should (and I > believe that's realistically achievable). Thus I really don't want to > base the correctness of CIC - a relatively infrequent operation - on the > assumption that no VM bits can be set concurrenty due to the SUE lock. I agree. FWIW, the extra time that CIC takes over a plain CI is much reduced these days. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
Hi, On 2017-02-28 19:12:03 +0530, Pavan Deolasee wrote: > Since VM bits are only set during VACUUM which conflicts with CIC on the > relation lock, I don't see any risk of incorrectly skipping pages that the > second scan should have scanned. I think that's true currently, but it'd also prevent us from doing that in additional places. Which, in my opinion, we really should (and I believe that's realistically achievable). Thus I really don't want to base the correctness of CIC - a relatively infrequent operation - on the assumption that no VM bits can be set concurrenty due to the SUE lock. Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
On Tue, Feb 28, 2017 at 10:42 PM, Pavan Deolasee wrote: > Hello All, > > During the second heap scan of CREATE INDEX CONCURRENTLY, we're only > interested in the tuples which were inserted after the first scan was > started. All such tuples can only exists in pages which have their VM bit > unset. So I propose the attached patch which consults VM during second scan > and skip all-visible pages. We do the same trick of skipping pages only if > certain threshold of pages can be skipped to ensure OS's read-ahead is not > disturbed. Great. > > The patch obviously shows significant reduction of time for building index > concurrently for very large tables, which are not being updated frequently > and which was vacuumed recently (so that VM bits are set). I can post > performance numbers if there is interest. Please share it. I'm curious. > For tables that are being updated > heavily, the threshold skipping was indeed useful and without that we saw a > slight regression. > > Since VM bits are only set during VACUUM which conflicts with CIC on the > relation lock, I don't see any risk of incorrectly skipping pages that the > second scan should have scanned. Agreed. And the patch looks good to me. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers