Re: [HACKERS] removing PD_ALL_VISIBLE
On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data. Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.

Isn't the visibility map already required for correct results, given that we use it for index-only scans? I think the optimization-only ship has sailed.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
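The dependency Bruce describes is easy to observe on a 9.2-or-later server: EXPLAIN ANALYZE reports how many visibility checks an index-only scan could not answer from the map. A minimal sketch (the table name is made up):

    CREATE TABLE t AS SELECT g AS i FROM generate_series(1, 100000) g;
    CREATE INDEX ON t (i);
    VACUUM t;    -- sets the visibility map bits
    EXPLAIN ANALYZE SELECT i FROM t WHERE i < 1000;
    -- "Heap Fetches: 0" means every visibility check was answered from
    -- the map; a wrongly *set* bit would therefore yield wrong answers.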
Re: [HACKERS] removing PD_ALL_VISIBLE
On 2013-05-31 13:14:13 -0400, Bruce Momjian wrote:
> On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> > Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data. Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.
>
> Isn't the visibility map already required for correct results, given that we use it for index-only scans? I think the optimization-only ship has sailed.

At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] removing PD_ALL_VISIBLE
> > Isn't the visibility map already required for correct results, given that we use it for index-only scans? I think the optimization-only ship has sailed.
>
> At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.

Depends on your definition of corruption, really. But yes, right now, the vismap can lose bits without causing any corruption, and making all-frozen depend on it would eliminate that.

--
Josh Berkus  PostgreSQL Experts Inc.  http://pgexperts.com
Re: [HACKERS] removing PD_ALL_VISIBLE
On Fri, May 31, 2013 at 10:28:12AM -0700, Josh Berkus wrote:
> > At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.
>
> Depends on your definition of corruption, really. But yes, right now, the vismap can lose bits without causing any corruption, and making all-frozen depend on it would eliminate that.

Robert's statement was:

> Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.

Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
Re: [HACKERS] removing PD_ALL_VISIBLE
Bruce,

> Robert's statement was:
>
> > Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.

I fail to be alarmed at this; currently, losing a single page of the clog causes just as widespread corruption (worse, actually, since it's not confined to one table). It does point to the eventual need to checksum these things, though.

> Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

Yes, that's already true. I'm pointing out that if we depend on the vismap for all-frozen, then losing bits *also* causes corruption, so that's something we need to test for. Right now, there is no possible corruption from losing bits; we simply end up scanning more pages than we have to.

--
Josh Berkus  PostgreSQL Experts Inc.  http://pgexperts.com
Re: [HACKERS] removing PD_ALL_VISIBLE
On Fri, May 31, 2013 at 11:00:19AM -0700, Josh Berkus wrote:
> I fail to be alarmed at this; currently, losing a single page of the clog causes just as widespread corruption (worse, actually, since it's not confined to one table). It does point to the eventual need to checksum these things, though.
>
> Yes, that's already true. I'm pointing out that if we depend on the vismap for all-frozen, then losing bits *also* causes corruption, so that's something we need to test for. Right now, there is no possible corruption from losing bits; we simply end up scanning more pages than we have to.

Right, and it is hard to see why losing bits would be any more or less likely than adding them, so it seems we already rely on the visibility map being correct.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
Re: [HACKERS] removing PD_ALL_VISIBLE
* Bruce Momjian (br...@momjian.us) wrote:
> Right, and it is hard to see why losing bits would be any more or less likely than adding them, so it seems we already rely on the visibility map being correct.

Yes, but there's also a way to get *back* to a valid state if things go south with the visibility map. It's similar to an index today- sure, you might get corruption, and that might give you wrong results, but if you detect it, it's easy to correct- rebuild the index, drop the visibility map, etc. With CRCs, it'll be even easier to detect and discover when you need to do such a thing.

Corruption in the heap is certainly bad too- you may lose some records, but you could zero those pages out and move on with only losing 8k or what-have-you. Clog corruption is certainly a bad thing, but if nearly everything in your DB is already frozen, it's not as bad as it *could* be. I wonder if you could rebuild a portion of the clog from the WAL..

There are a lot of different tradeoffs here, certainly. It's certainly nice that single-bit or single-page corruption can often be recovered from by rebuilding an index or rebuilding the visibility map or just nuking a few records from a table. CRCs help with identifying when corruption has happened, but they do nothing for what to *do* when it happens, and "restore completely from backup" isn't a great answer when you've got terabytes of data.

Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times- some might call that paranoid too. ;)

Thanks,

Stephen
Re: [HACKERS] removing PD_ALL_VISIBLE
On 05/31/2013 12:22 PM, Stephen Frost wrote:
> Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times- some might call that paranoid too. ;)

On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) together with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

--
Josh Berkus  PostgreSQL Experts Inc.  http://pgexperts.com
Re: [HACKERS] removing PD_ALL_VISIBLE
* Josh Berkus (j...@agliodbs.com) wrote:
> On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) together with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

That was actually my thinking as well..

Thanks,

Stephen
Re: [HACKERS] removing PD_ALL_VISIBLE
On Fri, May 31, 2013 at 1:44 PM, Bruce Momjian <br...@momjian.us> wrote:
> Robert's statement was:
>
> > Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.
>
> Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

Adding bits could cause tuples that ought to be invisible to be treated as visible. Currently, removing bits is harmless (except to performance), but if we used the VM bit to indicate whether the page was frozen, in lieu of actually freezing it, a cleared bit would potentially cause vacuum to nuke everything on that page.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] removing PD_ALL_VISIBLE
On Fri, May 31, 2013 at 3:45 PM, Josh Berkus <j...@agliodbs.com> wrote:
> On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) together with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

Yeah, I was thinking about that as well. In fact, under either Heikki's proposal or Andres's proposal or my variants of either one of them, anti-wraparound vacuums no longer need to scan all-visible pages. Under Andres's proposal (and variants), all-visible pages are ipso facto frozen and therefore need not be scanned for freezing. But under Heikki's proposal (and variants), anti-wraparound vacuums only need to remove dead tuples; freezing live ones is a no-op. And all-visible pages don't contain any dead tuples, so we're right back in the same place.[1]

Where things diverge a little is what you do when an anti-wraparound scan encounters a page that isn't all-visible and can't be marked all-visible. Under the XID-epoch family of proposals, we need to truncate any dead tuples on the page to line pointers, and that's it. Under the treat-all-visible-as-frozen family of proposals, we *also* need to do old-style freezing on any aged but live tuples on the page. So the XID epoch saves write I/O in this case, because we don't dirty pages or write WAL just because we see a tuple with a high XID age. Only pages that contain actual dead tuples get dirtied.

So on further reflection, I'm not seeing any advantage to combining the two proposals. The XID-epoch approach has the complications of requiring page format changes and consuming space on every page, although I think I may have worked out a variant approach that will avoid that (which no one has commented on, so maybe I'm hallucinating). But in other respects it seems better all around.

[1] I said upthread that Heikki's idea would require a separate freeze map, but now I think I was wrong.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] removing PD_ALL_VISIBLE
On Thu, May 30, 2013 at 12:06 AM, Jeff Davis <pg...@j-davis.com> wrote:
> > AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.
>
> Agreed.

Just to quantify that a bit more, I ran this command a couple of times:

    dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n ;
    sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum'

And also this one:

    dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n ;
    sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum freeze'

In the first one, the vacuum at the end takes about 25 seconds. In the second one, it takes about 15 minutes, during which time there's one CPU core running at about 10%; the remainder of the time is spent waiting for disk I/O. A little follow-up testing shows that the vacuum emits 88MB of WAL, while the vacuum freeze emits 13GB of WAL. This is on the 16-core, 64-thread IBM POWER box with the following non-default configuration settings:

    shared_buffers = 8GB
    maintenance_work_mem = 1GB
    synchronous_commit = off
    checkpoint_segments = 300
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9
    log_line_prefix = '%t [%p] '

Andres' proposal for freezing at the same time we mark pages all-visible relies on emitting FPIs when we mark pages all-visible, but I hope that the test above is convincing evidence that it would be *really* expensive for some users. My proposal to consider all-visible pages as frozen avoids that cost, but as far as I can see, it also requires PD_ALL_VISIBLE to stick around.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
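The WAL volumes quoted above can be measured directly from psql. A minimal sketch, assuming psql 9.3's \gset and the 9.3-era function names (later renamed to pg_current_wal_lsn() and pg_wal_lsn_diff()):

    SELECT pg_current_xlog_location() AS start_lsn \gset
    VACUUM FREEZE;
    SELECT pg_size_pretty(
               pg_xlog_location_diff(pg_current_xlog_location(),
                                     :'start_lsn'));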
Re: [HACKERS] removing PD_ALL_VISIBLE
On 2013-05-30 07:54:38 -0400, Robert Haas wrote:
> Andres' proposal for freezing at the same time we mark pages all-visible relies on emitting FPIs when we mark pages all-visible, but I hope that the test above is convincing evidence that it would be *really* expensive for some users. My proposal to consider all-visible pages as frozen avoids that cost

I think I basically suggested treating all-visible as frozen, didn't I? If not, I had lost sync between my fingers and my thoughts, which happens too often ;). You had noticed that my proposal was lacking a bit around when we omit FPIs for the page while setting all-visible, but we both thought that we might find a workaround for that - which looking at the page-level flag first basically is.

As far as I understand, the trick basically is that we can rely on an FPI being logged when an action unsetting ALL_VISIBLE is performed. That all-visible would then make sure the hint bits marking individual tuples as frozen hit disk. For that we need to add some more work though; consider:

1) write tuples on a page
2) freeze the page by setting ALL_VISIBLE and setting hint bits. Setting ALL_VISIBLE is WAL-logged
3) crash
4) replay ALL_VISIBLE, set it on the page level. The individual tuples are *not* guaranteed to be marked frozen.
5) update a tuple on the page, unsetting all-visible. This emits an FPI which does *not* have the tuples marked as frozen.

Easy enough and fairly cheap to fix by having a function that updates the hint bits on a page when unsetting all-visible, since we can just set them for all pre-existing tuples.

> but as far as I can see, it also requires PD_ALL_VISIBLE to stick around.

Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE to be set while the corresponding visibilitymap bit isn't, there's not much prohibiting us fundamentally from looking in the vm when we need to know whether the page is all-visible, is there?

To the contrary, this actually seems to be a pretty good case for Jeff's proposed behaviour, since it would allow freezing while only writing the vm?

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] removing PD_ALL_VISIBLE
On 30.05.2013 15:12, Andres Freund wrote:
> Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE to be set while the corresponding visibilitymap bit isn't, there's not much prohibiting us fundamentally from looking in the vm when we need to know whether the page is all-visible, is there?

Hmm, so you're suggesting that the visibility map would be *required* to interpret the pages correctly. I.e. if you just zapped the visibility map, you'd lose critical information and the heap would appear to be corrupt.

I guess that's possible, but it makes me quite uneasy. At the moment, it's reassuring to know that it's always safe to just truncate the visibility map in case of emergency.

- Heikki
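For context on what "zapping" means in practice: the map is just a separate fork stored alongside the heap file, and its on-disk path can be looked up from SQL (the table name is borrowed from Heikki's later example):

    SELECT pg_relation_filepath('testtable') || '_vm' AS vm_fork;
    -- e.g. base/16384/16385_vm, relative to the data directory (the
    -- numbers are hypothetical). Removing that file with the server
    -- stopped currently costs only extra page scans, nothing more.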
Re: [HACKERS] removing PD_ALL_VISIBLE
On 2013-05-30 15:34:04 +0300, Heikki Linnakangas wrote:
> Hmm, so you're suggesting that the visibility map would be *required* to interpret the pages correctly. I.e. if you just zapped the visibility map, you'd lose critical information and the heap would appear to be corrupt.
>
> I guess that's possible, but it makes me quite uneasy. At the moment, it's reassuring to know that it's always safe to just truncate the visibility map in case of emergency.

I didn't say it's a good idea, just that I don't think Robert's conclusion is necessarily valid. But requiring only the few kilobytes of the vm to be written instead of all of the heap during freezing (or whatever we would call it) has quite some allure, I admit that. I think that should be a separate project from re-engineering how freezing works, though.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] removing PD_ALL_VISIBLE
On Thu, May 30, 2013 at 8:12 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> As far as I understand, the trick basically is that we can rely on an FPI being logged when an action unsetting ALL_VISIBLE is performed. That all-visible would then make sure the hint bits marking individual tuples as frozen hit disk. For that we need to add some more work though [...]
>
> Easy enough and fairly cheap to fix by having a function that updates the hint bits on a page when unsetting all-visible, since we can just set them for all pre-existing tuples.

Basically, yes, though I would say infomask bits rather than hint bits, since not all of them are only hints, and this case would not be merely a hint.

> Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE to be set while the corresponding visibilitymap bit isn't, there's not much prohibiting us fundamentally from looking in the vm when we need to know whether the page is all-visible, is there?
>
> To the contrary, this actually seems to be a pretty good case for Jeff's proposed behaviour, since it would allow freezing while only writing the vm?

Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data. Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.

There's another thing that's bothering me about this whole discussion, too. If looking at another page for the information we need to make visibility decisions is so cheap that we need not be concerned with it, then why do we need hint bits? I realize that it's not quite the same thing, because CLOG doesn't have as much locality of access as the visibility map; you're guaranteed to find all the information you need for a single heap page on a single VM page. Also, CLOG is per-tuple, not per-page, and we get a decent speed-up from checking once for the whole page rather than for each tuple. Nonetheless, I think the contrast between Jeff's tests, which aren't showing much impact from the increased visibility map traffic, and previous hint-bit removal tests, which have crashed and burned, may be caused in part by the fact that our algorithms and locking regimen for shared_buffers are much more sophisticated than for SLRU. I'm not eager to have our design decisions driven by that gap.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] removing PD_ALL_VISIBLE
On Wed, May 29, 2013 at 1:11 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
> > If we just wanted to reduce read cost, why not just take a simpler approach and give the visibility map an "isfrozen" bit? Then we'd know which pages didn't need rescanning without nearly as much complexity. That would break pg_upgrade, which would have to remove visibility map forks when upgrading. More importantly, it would require another round of complex changes to the write-ahead logging in this area. It's not obvious to me that we'd end up ahead of where we are today, although perhaps I am a pessimist.
>
> If we removed PD_ALL_VISIBLE, then this would be very simple, right? We would just follow normal logging rules for setting the visible or frozen bit.

I don't see how that makes Josh's proposal any simpler. His proposal was to change, in a backward-incompatible fashion, the contents of the visibility map. Getting rid of PD_ALL_VISIBLE will not eliminate that backward-incompatibility. Neither will it eliminate the need to keep the visibility/freeze map in sync with the heap itself. Whether we get rid of PD_ALL_VISIBLE or not, we'll still have to go look at every type of WAL record that clears the visibility map bit and make it clear both the visibility and freeze bits. We'll still need a WAL record to set the visibility map bit, just as we do today, and we'll also need a new WAL record type (or a change to the existing WAL record type) to set the all-frozen bit, when applicable.

Now, independently of Josh's proposal, we could change PD_ALL_VISIBLE to emit FPIs for every heap page it touches. For pages that have been hit by updates or deletes, this would be pretty much free, in 9.3, since the PD_ALL_VISIBLE bit will probably be set at the same time we're setting dead line pointers to unused, which is a WAL-logged operation anyway. However, for pages that have been hit only by inserts, this would emit many extra FPIs.

Again independently of Josh's proposal, we could eliminate PD_ALL_VISIBLE. This would require either surrendering the optimization whereby sequential scans can skip visibility checks on individual tuples within the page, or referring to the visibility map to get the bit that way. I know you tested this and couldn't measure an impact, but with all respect I find that result hard to accept. Contention around buffer locks and pins is very real; why should it matter on other workloads and not matter on this one? It would also require page modifications prior to consistency to clear the visibility map bit unconditionally, instead of only when PD_ALL_VISIBLE is set on the page (though I think it'd be OK to pay that price if it ended there).

AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] removing PD_ALL_VISIBLE
On Wed, 2013-05-29 at 22:46 -0400, Robert Haas wrote:
> Again independently of Josh's proposal, we could eliminate PD_ALL_VISIBLE. This would require either surrendering the optimization whereby sequential scans can skip visibility checks on individual tuples within the page, or referring to the visibility map to get the bit that way. I know you tested this and couldn't measure an impact, but with all respect I find that result hard to accept. Contention around buffer locks and pins is very real; why should it matter on other workloads and not matter on this one?

The number of pins required during a sequential scan without my patch is:

    PAGES_IN_HEAP

The number of pins required during a sequential scan with my patch is:

    PAGES_IN_HEAP + ceil(PAGES_IN_HEAP / HEAP_PAGES_PER_VM_PAGE)

Because HEAP_PAGES_PER_VM_PAGE is huge, the second term only matters when PAGES_IN_HEAP is very small and the ceil() is significant. So I simply elected not to use the VM if the table is less than 32 pages in size. For such small tables, the benefit of using a page-at-a-time visibility check was not apparent in my tests anyway.

> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.

Agreed.

Regards,

Jeff Davis
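To put numbers on Jeff's formula: with 8 kB pages and one map bit per heap page (the 9.3-era layout), a single VM page covers 8192 * 8 = 65536 heap pages, i.e. 512 MB of heap, so the extra term is negligible for any table big enough to matter. A rough illustration in SQL:

    -- Pins per sequential scan under the patch (assumes 8 kB pages and
    -- one VM bit per heap page, i.e. 65536 heap pages per VM page):
    SELECT relname, relpages,
           relpages + ceil(relpages / 65536.0)::int AS pins_with_patch
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY relpages DESC LIMIT 5;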
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, 2013-01-21 at 12:00 +0200, Heikki Linnakangas wrote:
> On 21.01.2013 11:10, Jeff Davis wrote:
> > That confuses me. The testing was to show it didn't hurt other workloads (like scans or inserts/updates/deletes); so the best possible result is that they don't show signs either way.
>
> I went back to look at the initial test results that demonstrated that keeping the pin on the VM buffer mitigated the overhead of pinning the vm page. The obvious next question is, what is the impact when that's inefficient, i.e. when you update pages across different 512 MB sections, so that the vm pin has to be dropped and reacquired repeatedly. I tried to construct a worst case scenario for that:

I confirmed this result in a single connection (no concurrency). I used a shared_buffers of 2GB so that the whole table would fit.

Also, I fixed a bug that I noticed along the way, which was an uninitialized variable. New version attached. FWIW, I'm considering this patch to be rejected; I just didn't want to leave a patch with a bug in it.

Regards,

Jeff Davis

[Attachment: rm-pd-all-visible-20130122.patch.gz]
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, 2013-01-21 at 12:49 +0530, Pavan Deolasee wrote:
> At the minimum, your patch will need to have one additional buffer pinned for every K <= 8192 * 8 heap pages.

I assume it's the same K I referred to when responding to Robert: the max number of heap buffers we read before we unpin and repin the VM buffer. Right now it's unlimited, because there is currently no need to have it at all -- I was just offering the solution in case we did want to do VM page cleanup in the future or something.

> For most cases, the value of K will be large enough to ignore the overhead, but for small values of K, I'm not sure if we can say that with confidence.

It's a constant, and we can easily set it high enough that it wouldn't matter. There's no reason at all to choose a small value for K. Consider that an index root page pin is held for the entire scan, and we don't have a problem with that.

> Of course, for very small tables the real contention might be somewhere else and so this change may not show up anything on the benchmarks. Don't your own tests for 33-page tables confirm that there is indeed contention in this case? I agree it's not huge, but I don't know if there are workloads where it will matter.

That appears to happen because of the fraction of pinned pages being too high (aside: it was fairly challenging to construct a test where that happened). I believe it was mostly solved by adding a threshold to use the VM, and I can probably solve it completely by doing some more experiments and finding a better threshold value.

> > I understand that my patch is probably not going in,
>
> Sorry about that. I know how that feels. But we need some more reasons to justify this change, especially because the performance numbers themselves are not showing any signs either way.

That confuses me. The testing was to show it didn't hurt other workloads (like scans or inserts/updates/deletes); so the best possible result is that they don't show signs either way.

But yes, I see that others are not interested in the benefits offered by the patch, which is why I'm giving up on it. If people are concerned about the costs, then I can fix those; but there's nothing I can do to increase the benefits.

Regards,

Jeff Davis
Re: [HACKERS] Removing PD_ALL_VISIBLE
On 21.01.2013 11:10, Jeff Davis wrote:
> That confuses me. The testing was to show it didn't hurt other workloads (like scans or inserts/updates/deletes); so the best possible result is that they don't show signs either way.

I went back to look at the initial test results that demonstrated that keeping the pin on the VM buffer mitigated the overhead of pinning the vm page. The obvious next question is, what is the impact when that's inefficient, i.e. when you update pages across different 512 MB sections, so that the vm pin has to be dropped and reacquired repeatedly. I tried to construct a worst-case scenario for that:

    create unlogged table testtable (i int4);
    insert into testtable select generate_series(1, 1500);
    insert into testtable select generate_series(1, 1500);
    create index testtable_index on testtable (i);

When you traverse tuples using that index, the tuples will come alternating from low-numbered pages and high-numbered pages, which defeats keeping the last vm page pinned. To test, I ran this:

    set enable_bitmapscan=off;
    set enable_seqscan=off;
    begin;
    delete from testtable where i >= 0;
    rollback;

I repeated it a few times with and without the patch (rm-pd-all-visible-0118.patch). According to \timing, the delete takes about 12.6 seconds without the patch, and 15.3 seconds with it.

This is a worst-case scenario, and the slowdown isn't huge, so maybe it's a worthwhile tradeoff. But it shows that removing PD_ALL_VISIBLE is not completely free.

- Heikki
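A hedged way to verify the page-level state before and after such a test is the pageinspect extension (assuming it is installed); PD_ALL_VISIBLE is the 0x0004 flag in the page header:

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT (flags & 4) <> 0 AS pd_all_visible
    FROM page_header(get_raw_page('testtable', 0));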
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, Jan 21, 2013 at 1:52 AM, Jeff Davis <pg...@j-davis.com> wrote:
> > Of course, there is an argument that this patch will simplify the code, but I'm not sure if that's enough to justify the additional contention, which may or may not show up in the benchmarks we are running, but we know it's there.
>
> What additional contention? How do you know it's there?

We know it's there because every additional page that you access has to be looked up and locked, and the memory that contains it brought into the CPU cache. If you look up and lock more pages, you WILL have more contention - both for memory and for locks - and more runtime cost. Whether or not you can currently detect that contention and/or runtime cost experimentally is another matter. Failure to do so could indicate either that you haven't got the right test case - Heikki seems to have found one that works - or that it's being masked by other things, but might not be always.

To raise an example which I believe is similar to this one, consider Kevin Grittner's work on SSI. He set himself a goal that SSI should impose a performance regression of no more than 2% - and he met that goal, at the time the code was committed. But then my read-scalability work during the 9.2 cycle blew the lid off of read performance for certain workloads, and now SSI is FAR slower on those workloads. His code always had a scalability problem, but you couldn't measure it experimentally before I did that work, because there were equally bad scalability problems elsewhere. Now there aren't, and you can, and I have. We might well have refused to commit that patch with the locking scheme it uses today if my scalability work had been done first - or perhaps we would have decided that the case where the difference is large is too narrow to be worth worrying about, but I think it would have at least provoked discussion.

My biggest concern about the visibility map is that it will act as a focal point for contention. Map half a gigabyte of heap down to 1 page of visibility map and it's easy to think that you could have some situation in which that page gets very, very hot. We know that cache line stealing is a significant cost of ProcArray manipulation, which is why the PGXACT patch makes a significant difference in write throughput. Now your argument seems to be that we can't measure any such effect here, but maybe we're just not measuring the right thing.

Heikki's example reminded me that I was concerned about the cost of repeatedly pinning different VM buffers during an index-only scan, if the relation were very large. That seems easy to justify on the grounds that you're saving so much I/O and memory traffic that the pins will seem cheap by comparison ... but they don't, always. IIRC, an index-only scan on a fully-cached relation is not necessarily faster than a regular index scan, even if the heap is entirely all-visible.

I realize these examples aren't directly applicable to this case, so I'm not sure if they help to advance the discussion or not. I advance them only as evidence that simple tests don't always manage to capture the real costs and benefits of a feature or optimization, and that predicting the system-wide effects of changes in this area is hard. For a large benefit I think we would perhaps bite the bullet and take our chances, but in my (human and imperfect) judgement the benefit here is not large enough to justify it.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, Jan 17, 2013 at 5:50 PM, Jeff Davis <pg...@j-davis.com> wrote:
> If the "without interruption" part becomes a practical problem, it seems fairly easy to fix: drop the pin and pick it up again once every K pages. Unless I'm missing something, this is a minor concern.

I think probably so.

> Test plan:
>
> 1. Take current patch (without the "skip VM check for small tables" optimization mentioned above).
> 2. Create 500 tables each about 1MB.
> 3. VACUUM them all.
> 4. Start 500 connections (one for each table).
> 5. Time the running of a loop that executes a COUNT(*) on that connection's table 100 times.
>
> I think shared_buffers=64MB is probably appropriate. We want some memory pressure so that it has to find and evict pages to satisfy the queries. But we don't want it to be totally exhausted and unable to even pin a new page; that really doesn't tell us much except that max_connections is too high.
>
> Sound reasonable?

Well, it's certainly a data point, but each table in that case has 128 pages, so the effect is still pretty small. The place where you're going to run into trouble is when many of those tables have 4 pages each, or 2 pages each, or 1 page each.

> > All of which is to say that I think this patch is premature. If we adopt something like this, we're likely never going to revert back to the way we do it now, and whatever cache-pressure or other costs this approach carries will be here to stay - so we had better think awfully carefully before we do that.
>
> What about this patch makes it hard to undo/rework in the future?

Well, if you have a bunch of code, and it preserves property X at all times, it is pretty easy to decide that new code need no longer preserve property X. It is essentially a no-op. The reverse, going through a bunch of existing code that does not consistently preserve property X and making it do so, is always much harder, because you've got to audit all the code you've already got. I don't want to undo the work that's been done on this over the last four years without a really good reason, and I'm not seeing one.

> My mistake. I believe that is already fixed, and certainly not a fundamental issue.

It at least calls for a repetition of any performance testing that has already been done.

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
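A minimal sketch of steps 2-3 of that plan, under the assumption that roughly 30,000 int4 rows come to about 1 MB (128 pages) per table; the mb_N names match the ones queried by Jeff's test driver later in the thread:

    DO $$
    BEGIN
        FOR i IN 0..499 LOOP
            EXECUTE format(
                'CREATE TABLE mb_%s AS SELECT g FROM generate_series(1, 30000) g',
                i);
        END LOOP;
    END $$;
    VACUUM;  -- step 3: sets the visibility map bits for every table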
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Fri, Jan 18, 2013 at 3:31 AM, Jeff Davis <pg...@j-davis.com> wrote:
> So, I attached a new version of the patch that doesn't look at the VM for tables with fewer than 32 pages. That's the only change.

That certainly seems worthwhile, but I still don't want to get rid of this code. I'm just not seeing a reason why that's something that desperately needs to be done. I don't think this is a barrier to anything else we want to do, and it might well be that the situations where this patch would hurt us are currently masked by other bottlenecks, but won't be always.

Right now, the vast majority of heap updates don't need to pin the visibility map page; with this change, all of them do. Now, I accept that your test results show that that doesn't matter, but how can that not be an artifact of some kind? Can we really credit that accessing two pages costs no more than accessing one? To what countervailing factor could we plausibly attribute that? Now, even if it costs more in some narrow sense, the difference might not be enough to matter. But without some big gain on the other side, why tinker with it?

--
Robert Haas  EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Removing PD_ALL_VISIBLE
Robert Haas <robertmh...@gmail.com> writes:
> On Fri, Jan 18, 2013 at 3:31 AM, Jeff Davis <pg...@j-davis.com> wrote:
> > So, I attached a new version of the patch that doesn't look at the VM for tables with fewer than 32 pages. That's the only change.
>
> That certainly seems worthwhile, but I still don't want to get rid of this code. I'm just not seeing a reason why that's something that desperately needs to be done.

Yeah, I'm having the same problem. Despite Jeff's test results, I can't help thinking that lack of PD_ALL_VISIBLE *will* hurt us under some workloads, and it's not obvious to me what benefit we get from dropping it.

			regards, tom lane
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, Jan 21, 2013 at 8:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Yeah, I'm having the same problem. Despite Jeff's test results, I can't help thinking that lack of PD_ALL_VISIBLE *will* hurt us under some workloads, and it's not obvious to me what benefit we get from dropping it.

I tend to agree. When I looked at the patch, I thought that since it's removing a WAL record (and associated redo logic), it has some additional value. But that was kind of broken (sorry, I haven't looked at the latest patch to see if Jeff fixed it without having to reinstate the WAL logging). I also thought that bug invalidates the performance numbers reported.

Of course, there is an argument that this patch will simplify the code, but I'm not sure if that's enough to justify the additional contention, which may or may not show up in the benchmarks we are running, but we know it's there.

Thanks,

Pavan

--
Pavan Deolasee  http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Sun, 2013-01-20 at 22:19 -0500, Tom Lane wrote:
> Yeah, I'm having the same problem. Despite Jeff's test results, I can't help thinking that lack of PD_ALL_VISIBLE *will* hurt us under some workloads, and it's not obvious to me what benefit we get from dropping it.

OK. I respectfully disagree with the arguments I've seen so far, but we can all move on. I already had some more test results (again, thanks to Nathan Boley), so I finished them up and attached them to the bottom of this email for the archives (there's always hope, right?).

Regards,

Jeff Davis

Test goal: is 32 an appropriate threshold for using the VM after we remove PD_ALL_VISIBLE?

Test setup: 500 31-page tables and 500 33-page tables. Test a recent build against the patched version, with varying shared_buffers. The first test is 500 connections each doing 20 iterations of COUNT(*) against the 500 31-page tables. The next test is the same, except against the 33-page tables.

Test results: (There were a few outliers I discarded as being too fast. It always happened in the first run, and I strongly suspect the connections began unevenly, leading to lower concurrency. They didn't seem to favor one build over another.)

master, 31-page (first column is shared_buffers, second is range of seconds):
    32MB: 5.8 - 6.1
    64MB: 3.1 - 3.7
    96MB: 1.7 - 2.2
   112MB: 0.6 - 1.1
   128MB: 0.4 - 0.4
   256MB: 0.4 - 0.4

patch, 31-page (doesn't use VM because 31 is below threshold):
    32MB: 5.1 - 5.9
    64MB: 3.4 - 3.8
    96MB: 1.7 - 2.0
   112MB: 0.7 - 1.1
   128MB: 0.4 - 0.5
   256MB: 0.4 - 0.5

master, 33-page:
    32MB: 5.9 - 7.0
    64MB: 4.2 - 4.7
    96MB: 2.4 - 2.8
   112MB: 1.2 - 1.6
   128MB: 0.5 - 0.5
   256MB: 0.4 - 0.5

patch, 33-page (does use VM because 33 is above threshold):
    32MB: 6.2 - 7.2
    64MB: 4.4 - 4.7
    96MB: 2.8 - 3.0
   112MB: 1.7 - 1.8
   128MB: 0.5 - 1.0
   256MB: 0.4 - 0.5

Conclusion: 32 pages is a low enough threshold that skipping the VM is insignificant. When the 500 tables are 33 pages, and the patch does use the VM, we do see a measurable cost under cache pressure. The penalty is fairly small (10% ballpark), and this is a worst case, so I don't think it's a problem. From the last test results, we know it gets back to even by the time the tables are 128 pages (1MB). So it could be that there is a slightly higher threshold (between 32 and 128) that would be slightly better. But given how specific this case is and how small the penalty is, I think 32 is a fine threshold.

Also, to reiterate, I focused my tests almost entirely on scans, though some of the concern was around inserts/updates/deletes. I tried, but was unable to show any difference on those tests. I suspect that the bottleneck is elsewhere.
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, 2013-01-21 at 11:27 +0530, Pavan Deolasee wrote:
> I tend to agree. When I looked at the patch, I thought that since it's removing a WAL record (and associated redo logic), it has some additional value. But that was kind of broken (sorry, I haven't looked at the latest patch to see if Jeff fixed it without having to reinstate the WAL logging). I also thought that bug invalidates the performance numbers reported.

I revalidated the performance numbers after reinstating the WAL logging. No difference (which is unsurprising, since my tests were read-only).

> Of course, there is an argument that this patch will simplify the code, but I'm not sure if that's enough to justify the additional contention, which may or may not show up in the benchmarks we are running, but we know it's there.

What additional contention? How do you know it's there? The design is to keep the VM page pinned, and to read it without requiring locks (like index-only scans already do). So I don't see any obvious additional contention there, unless you're talking about the work the CPU does for cache coherency (which did not show up in any of my tests).

I understand that my patch is probably not going in, but I would like to be clear about what is a known practical problem, what is a theoretical problem that has eluded my testing capabilities, and what is no problem at all.

Regards,

Jeff Davis
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, Jan 21, 2013 at 12:22 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Mon, 2013-01-21 at 11:27 +0530, Pavan Deolasee wrote:
> > Of course, there is an argument that this patch will simplify the code, but I'm not sure if that's enough to justify the additional contention, which may or may not show up in the benchmarks we are running, but we know it's there.
>
> What additional contention? How do you know it's there?

At the minimum, your patch will need to have one additional buffer pinned for every K <= 8192 * 8 heap pages. For most cases, the value of K will be large enough to ignore the overhead, but for small values of K, I'm not sure if we can say that with confidence. Of course, for very small tables the real contention might be somewhere else, and so this change may not show up anything on the benchmarks. Don't your own tests for 33-page tables confirm that there is indeed contention in this case? I agree it's not huge, but I don't know if there are workloads where it will matter.

> I understand that my patch is probably not going in,

Sorry about that. I know how that feels. But we need some more reasons to justify this change, especially because the performance numbers themselves are not showing any signs either way. One could have argued that this saves us a valuable page-level bit, but with pg_upgrade etc. it has become hard to re-utilize page-level bits for other purposes, and we don't yet have a pressing need for more bits.

Thanks,

Pavan

--
Pavan Deolasee  http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 14:53 -0800, Jeff Davis wrote:
> Test plan:
>
> 1. Take current patch (without the "skip VM check for small tables" optimization mentioned above).
> 2. Create 500 tables each about 1MB.
> 3. VACUUM them all.
> 4. Start 500 connections (one for each table).
> 5. Time the running of a loop that executes a COUNT(*) on that connection's table 100 times.

Done, with a few extra variables. Again, thanks to Nathan Boley for lending me the 64-core box. Test program attached.

I did both 1MB tables and 1-tuple tables, but I ended up throwing out the 1-tuple table results. First of all, as I said, that's a pretty easy problem to solve, so not really what I want to test. Second, I had to do so many iterations that I don't think I was testing anything useful. I did see what might have been a couple differences, but I would need to explore in more detail and I don't think it's worth it, so I'm just reporting on the 1MB tables.

For each test, each of 500 connections runs 10 iterations of a COUNT(*) on its own 1MB table (which is vacuumed and has the VM bit set). The query is prepared once. The table has only an int column. The variable is shared_buffers, going from 32MB (near exhaustion for 500 connections) to 2048MB (everything fits). The last column is the time range in seconds. I included the range this time, because there was more variance in the runs, but I still think they are good test results.

master:
    32MB: 16.4 - 18.9
    64MB: 16.9 - 17.3
   128MB: 17.5 - 17.9
   256MB: 14.7 - 15.8
   384MB:  8.1 -  9.3
   448MB:  4.3 -  9.2
   512MB:  1.7 -  2.2
   576MB:  0.6 -  0.6
  1024MB:  0.6 -  0.6
  2048MB:  0.6 -  0.6

patch:
    32MB: 16.8 - 17.6
    64MB: 17.1 - 17.5
   128MB: 17.2 - 18.0
   256MB: 14.8 - 16.2
   384MB:  8.0 - 10.1
   448MB:  4.6 -  7.2
   512MB:  2.0 -  2.6
   576MB:  0.6 -  0.6
  1024MB:  0.6 -  0.6
  2048MB:  0.6 -  0.6

Conclusion: I see about what I expect: a precipitous drop in runtime after everything fits in shared_buffers (500 1MB tables means the inflection point around 512MB makes a lot of sense). There does seem to be a measurable difference right around that inflection point, but it's not much. Considering that this is the worst case that I could devise, I am not too concerned about this.

However, it is interesting to see that there really is a lot of maintenance work being done when we need to move pages in and out of shared buffers. I'm not sure that it's related to the freelists though.

For the extra pins to really be a problem, I think a much higher percentage of the buffers would need to be pinned. Since the case we are worried about involves scans (if it involved indexes, that would already be using more than one pin per scan), that means the only way to get to a high percentage of pinned buffers is by having very small tables. But we don't really need to use the VM when scanning very small tables (the overhead would be elsewhere), so I think we're OK. So, I attached a new version of the patch that doesn't look at the VM for tables with fewer than 32 pages. That's the only change.
Regards,

Jeff Davis

[Attachment: test program]

#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define QSIZE 256

/* Connect, prepare the given query once, and execute it niter times. */
void
test(char *query, int procnum, int niter)
{
	PGconn	   *conn;
	PGresult   *result;
	int			i;

	conn = PQconnectdb("host=/tmp dbname=postgres");
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed!\n");
		exit(1);
	}

	result = PQprepare(conn, "q", query, 0, NULL);
	if (PQresultStatus(result) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "PREPARE failed: %s", PQerrorMessage(conn));
		PQclear(result);
		exit(1);
	}
	PQclear(result);

	for (i = 0; i < niter; i++)
	{
		result = PQexecPrepared(conn, "q", 0, NULL, NULL, NULL, 0);
		if (PQresultStatus(result) != PGRES_TUPLES_OK)
		{
			fprintf(stderr, "EXECUTE PREPARED failed: %s\n",
					PQerrorMessage(conn));
			PQclear(result);
			exit(1);
		}
		PQclear(result);
	}

	PQfinish(conn);
}

int
main(int argc, char *argv[])
{
	int			niter;
	int			nprocs;
	char		query[QSIZE];
	int			i;
	pid_t	   *procs;
	struct timeval tv1, tv2;

	if (argc != 3)
	{
		fprintf(stderr, "expected 3 arguments, got %d\n", argc);
		exit(1);
	}

	nprocs = atoi(argv[1]);
	niter = atoi(argv[2]);
	procs = malloc(sizeof(pid_t) * nprocs);

	gettimeofday(&tv1, NULL);

	/* Fork one child per table; child i runs COUNT(*) against mb_i. */
	for (i = 0; i < nprocs; i++)
	{
		pid_t		pid = fork();

		if (pid == 0)
		{
			snprintf(query, QSIZE, "SELECT COUNT(*) FROM mb_%d;", i);
			test(query, i, niter);
			exit(0);
		}
		else
			procs[i] = pid;
	}

	/* Wait for all children and check their exit status. */
	for (i = 0; i < nprocs; i++)
	{
		int			status;

		waitpid(procs[i], &status, 0);
		if (!WIFEXITED(status))
		{
			fprintf(stderr, "child did not exit!\n");
			exit(1);
		}
		if (WEXITSTATUS(status) != 0)
		{
			fprintf(stderr, "child exited with status %d\n",
					WEXITSTATUS(status));
			exit(1);
		}
	}

	gettimeofday(&tv2, NULL);
	free(procs);

	/*
	 * The archived attachment is truncated at this point; presumably it
	 * printed the elapsed wall-clock time, e.g.:
	 */
	printf("%0.3f seconds\n",
		   (tv2.tv_sec - tv1.tv_sec) +
		   (tv2.tv_usec - tv1.tv_usec) / 1000000.0);
	return 0;
}
Re: [HACKERS] Removing PD_ALL_VISIBLE
On 17 January 2013 03:02, Jeff Davis pg...@j-davis.com wrote: Rebased patch attached. No significant changes. Jeff, can you summarise/collate why we're doing this, what concerns it raises and how you've dealt with them? That will help decide whether to commit. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, Jan 17, 2013 at 2:11 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 January 2013 03:02, Jeff Davis pg...@j-davis.com wrote: Rebased patch attached. No significant changes. Jeff, can you summarise/collate why we're doing this, what concerns it raises and how you've dealt with them? That will help decide whether to commit. +1. On another thread, "Set visibility map bit after HOT prune", Robert mentioned that it's not such a good idea to remove the PD_ALL_VISIBLE flag because it helps us reduce the contention on the VM page, especially when we need to reset the VM bit. Here is an excerpt from Robert's comment on that thread: Sure, but you're zipping rather blithely past the disadvantages of such an approach. Jeff Davis recently proposed getting rid of PD_ALL_VISIBLE, and Tom and I both expressed considerable skepticism about that; this proposal has the same problems. One of the major benefits of PD_ALL_VISIBLE is that, when it isn't set, inserts, updates, and deletes to the page can ignore the visibility map. That means that a server under heavy concurrency is much less likely to encounter contention on the visibility map blocks. Now, maybe that's not really a problem, but I sure haven't seen enough evidence to make me believe it. If it's really true that PD_ALL_VISIBLE needn't fill this role, then Heikki wasted an awful lot of time implementing it, and I wasted an awful lot of time keeping it working when I made the visibility map crash-safe for IOS. That could be true, but I tend to think it isn't. Maybe you've already addressed that concern with the proven performance numbers, but I'm not sure. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
At 2013-01-17 08:41:37 +0000, si...@2ndquadrant.com wrote: Jeff, can you summarise/collate why we're doing this, what concerns it raises and how you've dealt with them? Since I was just looking at the original patch and discussion, and since Pavan has posted an excerpt from one objection to it, here's an excerpt from Jeff's original post, titled "Do we need so many hint bits?": http://www.postgresql.org/message-id/1353026577.14335.91.camel@sussancws0025 Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced in the visibility map patch, apparently as a way to know when to clear the VM bit when doing an update. It was then also used for scans, which showed a significant speedup. But I wonder: why not just use the visibilitymap directly from those places? It can be used for the scan because it is crash safe now (not possible before). And since it's only one lookup per scanned page, then I don't think it would be a measurable performance loss there. Inserts/updates/deletes also do a significant amount of work, so again, I doubt it's a big drop in performance there -- maybe under a lot of concurrency or something. The benefit of removing PD_ALL_VISIBLE would be significantly higher. It's quite common to load a lot of data, and then do some reads for a while (setting hint bits and flushing them to disk), and then do a VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the pages again. Also, if I remember correctly, Robert went to significant effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM bits consistent. Maybe this was all discussed before? There was considerable discussion after this (accessible through the archives link above), which I won't attempt to summarise. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, Jan 17, 2013 at 2:57 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: There was considerable discussion after this (accessible through the archives link above), which I won't attempt to summarise. I thought Robert made those comments after considerable discussion of Jeff's approach. So he probably still stands by his objections, or at least is not satisfied by (or has not seen) the numbers. Now that I look at the patch, I wonder if there is another fundamental issue with the patch. Since the patch removes WAL logging for the VM set operation, this can happen:
1. Vacuum kicks in, clears all dead tuples in a page and decides that it's all-visible.
2. Vacuum WAL-logs the cleanup activity and marks the page dirty.
3. Vacuum sets the visibility bit and marks the VM page dirty.
4. Say the VM page gets written to disk. The heap page has not yet been written, and neither has the WAL corresponding to the cleanup operation.
5. CRASH.
After recovery, the VM bit will remain set because the VM page got written before the crash. But since the heap page's cleanup WAL did not make it to disk, those operations won't be replayed. The heap page will be left with not-all-visible tuples in that case, and that's not a good state to be in. The original code does not have this problem because the VM-set WAL gets written after the heap page cleanup WAL. So it's guaranteed that the VM bit will be set during recovery only if the cleanup WAL is replayed too (there is more magic here than meets the eye, and I think it's not fully documented). Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
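To make that failure ordering concrete, here is a minimal sketch of the sequence described above. Every function name in it is hypothetical shorthand for the corresponding backend operation, not the real API:

/* Hypothetical shorthand, not real backend functions. */
void
vacuum_one_page_unsafe(BlockNumber blk)
{
    prune_dead_tuples(blk);       /* heap page dirtied */
    wal_log_heap_cleanup(blk);    /* cleanup WAL; LSN stamped on the heap page */

    set_vm_bit_without_wal(blk);  /* VM page dirtied, but its LSN is NOT
                                   * advanced past the cleanup record */

    /*
     * Nothing now stops the buffer manager from writing the VM page
     * first.  If it does, and we crash before the heap page and the
     * cleanup WAL reach disk, recovery replays nothing: the VM bit
     * stays set while dead tuples remain on the heap page.
     */
}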
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, Jan 17, 2013 at 3:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: May be you've already addressed that concern with the proven performance numbers, but I'm not sure. It would be nice to hear what Heikki's reasons were for adding PD_ALL_VISIBLE in the first place. Jeff's approach of holding the VM pins for longer certainly mitigates some of the damage, in the sense that it reduces buffer lookups and pin/unpin cycles - and might be worth doing independently of the rest of the patch if we think it's a win. Index-only scans already use a similar optimization, so extending it to inserts, updates, and deletes is surely worth considering. The main question in my mind is whether there are any negative consequences to holding a VM buffer pin for that long without interruption. The usual consideration - namely, blocking vacuum - doesn't apply here because vacuum does not take a cleanup lock on the visibility map page, only the heap page, but I'm not sure if there are others. The other part of the issue is cache pressure. I don't think I can say it better than what Tom already wrote: # I'd be worried about the case of a lot of sessions touching a lot of # unrelated tables. This change implies doubling the number of buffers # that are held pinned by any given query, and the distributed overhead # from that (eg, adding cycles to searches for free buffers) is what you # ought to be afraid of. I agree that we ought to be afraid of that. A pgbench test isn't going to find a problem in this area because there you have a bunch of sessions all accessing the same small group of tables. To find a problem of the type above, you'd need lots of backends accessing lots of different, small tables. That's not the use case we usually benchmark, but I think there are a fair number of people doing things like that - for example, imagine a hosting provider or web application with many databases or schemas on a single instance. AFAICS, Jeff hasn't tried to test this scenario. Now, on the flip side, we should also be thinking about what we would gain from this patch, and what other ways there might be to achieve the same goals. As far as I can see, the main gain is that if you bulk-load a table, don't vacuum it right away, get all the hint bits set by some other mechanism, and then vacuum the table, you'll only read the whole table instead of rewriting the whole table. So ISTM that, for example, if we adopted the idea of making HOT prune set visibility-map bits, most of the benefit of this change evaporates, but whatever costs it may have will remain. There are other possible ways of getting the same benefit as well - for example, I've been thinking for a while now that we should try to find some judicious way of vacuuming insert-only tables, perhaps only in small chunks when there's nothing else going on. That wouldn't as clearly obsolete this patch, but it would address a very similar use case and would also preset hint bits, which would help a lot of people. Some of the ideas we've had about a HEAP_XMIN_FREEZE intersect with this idea, too - if we can do an early freeze without losing forensic information, we can roll setting the hint bit, setting PD_ALL_VISIBLE, and freezing the page into a single write. All of which is to say that I think this patch is premature. 
If we adopt something like this, we're likely never going to revert back to the way we do it now, and whatever cache-pressure or other costs this approach carries will be here to stay - so we had better think awfully carefully before we do that. And, FWIW, I don't believe that there is sufficient time in this release cycle to carefully test this patch and the other alternative designs that aim toward the same ends. Even if there were, this is exactly the sort of thing that should be committed at the beginning of a release cycle, not the end, so as to allow adequate time for discovery of unforeseen consequences before the code ends up out in the wild. Of course, there's another issue here too, which is that as Pavan points out, the patch throws crash-safety out the window, which breaks index-only scans (if you have a crash). HEAP_XLOG_VISIBLE is intended principally to protect the VM bit, not PD_ALL_VISIBLE, but the patch rips it out even though its purpose is to remove the latter, not the former. Removing PD_ALL_VISIBLE eliminates the need to keep the visibility-map bit consistent with PD_ALL_VISIBLE, but it does not eliminate the need to keep the visibility-map bit consistent with the page contents. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On 17.01.2013 16:53, Robert Haas wrote: On Thu, Jan 17, 2013 at 3:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: May be you've already addressed that concern with the proven performance numbers, but I'm not sure. It would be nice to hear what Heikki's reasons were for adding PD_ALL_VISIBLE in the first place. The idea was to avoid clearing the bit in the VM page on every update, when the bit is known to not be set, ie. when the PD_ALL_VISIBLE flag is not set. I assumed the traffic and contention on the VM page would be a killer otherwise. I don't remember if I ever actually tested that though. Maybe I was worrying about nothing and hitting the VM page on every update is ok. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
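For reference, a simplified sketch of the short-circuit Heikki is describing on the write path, shaped like the heapam.c pattern of this era (treat the exact signatures as approximate):

static void
clear_allvisible_sketch(Relation rel, Buffer buf, Buffer vmbuf)
{
    Page    page = BufferGetPage(buf);

    if (PageIsAllVisible(page))
    {
        /* only a page marked all-visible can have its VM bit set,
         * so only then do we need to touch the VM at all */
        PageClearAllVisible(page);
        visibilitymap_clear(rel, BufferGetBlockNumber(buf), vmbuf);
    }
    /* otherwise: no VM access on this insert/update/delete, which is
     * exactly the contention avoidance being described */
}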
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 15:25 +0530, Pavan Deolasee wrote: Now that I look at the patch, I wonder if there is another fundamental issue with the patch. Since the patch removes WAL logging for the VM set operation, this can happen: Thank you. I think I was confused by this comment here: When we *set* a visibility map during VACUUM, we must write WAL. This may seem counterintuitive, since the bit is basically a hint: if it is clear, it may still be the case that every tuple on the page is visible to all transactions; we just don't know that for certain. The difficulty is that there are two bits which are typically set together: the PD_ALL_VISIBLE bit on the page itself, and the visibility map bit. If a crash occurs after the visibility map page makes it to disk and before the updated heap page makes it to disk, redo must set the bit on the heap page. Otherwise, the next insert, update, or delete on the heap page will fail to realize that the visibility map bit must be cleared, possibly causing index-only scans to return wrong answers. Which led me to believe that I could just rip out the WAL-related code if PD_ALL_VISIBLE goes away, which is incorrect. But the incorrectness doesn't have to do with the WAL directly; it's because the VM page's LSN is not bumped past the LSN of the related heap page cleanup, so it can be written too early. Of course, the way to bump the LSN is to write WAL for the visibilitymap_set operation. But that would be a very simple WAL routine, rather than the complex one that exists without the patch. I suppose we could even try to bump the LSN without writing WAL somehow, but it doesn't seem worth reasoning through that (setting a VM bit is rare enough). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
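A sketch of what that simple WAL routine amounts to: the essential point is only that the LSN returned by the WAL insertion gets stamped on the VM page, so the buffer manager's WAL-before-data rule keeps the VM page from reaching disk ahead of the heap cleanup record. set_vm_bit() and log_vm_set_record() are hypothetical stand-ins, not the real backend functions:

static void
visibilitymap_set_sketch(Relation rel, BlockNumber heapBlk, Buffer vmbuf)
{
    Page        vmpage = BufferGetPage(vmbuf);
    XLogRecPtr  recptr;

    LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE);

    set_vm_bit(vmpage, heapBlk);              /* hypothetical */
    MarkBufferDirty(vmbuf);

    recptr = log_vm_set_record(rel, heapBlk); /* hypothetical XLogInsert wrapper */
    PageSetLSN(vmpage, recptr);               /* the LSN bump that makes this safe:
                                               * the page cannot be written until
                                               * WAL up to recptr is flushed */

    LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
}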
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 10:39 -0800, Jeff Davis wrote: On Thu, 2013-01-17 at 15:25 +0530, Pavan Deolasee wrote: Now that I look at the patch, I wonder if there is another fundamental issue with the patch. Since the patch removes WAL logging for the VM set operation, this can happen: Thank you. New patch attached with simple WAL logging. Regards, Jeff Davis rm-pd-all-visible-20130117.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On 17 January 2013 15:14, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 17.01.2013 16:53, Robert Haas wrote: On Thu, Jan 17, 2013 at 3:49 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: May be you've already addressed that concern with the proven performance numbers, but I'm not sure. It would be nice to hear what Heikki's reasons were for adding PD_ALL_VISIBLE in the first place. The idea was to avoid clearing the bit in the VM page on every update, when the bit is known to not be set, ie. when the PD_ALL_VISIBLE flag is not set. I assumed the traffic and contention on the VM page would be a killer otherwise. I don't remember if I ever actually tested that though. Maybe I was worrying about nothing and hitting the VM page on every update is ok. Presumably we remember the state of the VM so we can skip the re-visit after every write? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 17:14 +0200, Heikki Linnakangas wrote: I don't remember if I ever actually tested that though. Maybe I was worrying about nothing and hitting the VM page on every update is ok. I tried, but was unable to show really anything at all, even without keeping the VM page pinned. I think the bottleneck is elsewhere; although I am keeping the page pinned in this patch to prevent it from becoming a bottleneck. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 19:58 +, Simon Riggs wrote: Presumably we remember the state of the VM so we can skip the re-visit after every write? That was not a part of my patch, although I remember that you mentioned that previously and I thought it could be a good way to mitigate a problem if it ever came up. However, the tests I did didn't show any problem there. The tests were somewhat noisy, so perhaps I was doing something wrong, but it didn't appear that looking at the VM page for every update was a bottleneck. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Thu, 2013-01-17 at 09:53 -0500, Robert Haas wrote: The main question in my mind is whether there are any negative consequences to holding a VM buffer pin for that long without interruption. The usual consideration - namely, blocking vacuum - doesn't apply here because vacuum does not take a cleanup lock on the visibility map page, only the heap page, but I'm not sure if there are others.

If the "without interruption" part becomes a practical problem, it seems fairly easy to fix: drop the pin and pick it up again once every K pages. Unless I'm missing something, this is a minor concern.

The other part of the issue is cache pressure. I don't think I can say it better than what Tom already wrote:

# I'd be worried about the case of a lot of sessions touching a lot of
# unrelated tables. This change implies doubling the number of buffers
# that are held pinned by any given query, and the distributed overhead
# from that (eg, adding cycles to searches for free buffers) is what you
# ought to be afraid of.

I agree that we ought to be afraid of that.

It's a legitimate concern, but I think being afraid goes too far (more below).

A pgbench test isn't going to find a problem in this area because there you have a bunch of sessions all accessing the same small group of tables. To find a problem of the type above, you'd need lots of backends accessing lots of different, small tables. That's not the use case we usually benchmark, but I think there are a fair number of people doing things like that - for example, imagine a hosting provider or web application with many databases or schemas on a single instance. AFAICS, Jeff hasn't tried to test this scenario.

The concern here is over a lot of different, small tables (e.g. multi-tenancy or something similar), as you say. If we're talking about nearly empty tables, that's easy to fix: just don't use the VM on tables with fewer than N pages. You could say that small tables are really 1-10MB each, and you could have a zillion of those. I will try to create a worst case here and see what numbers come out. Perhaps the extra time to look for a free buffer does add up.

Test plan:
1. Take current patch (without the "skip VM check for small tables" optimization mentioned above).
2. Create 500 tables, each about 1MB.
3. VACUUM them all.
4. Start 500 connections (one for each table).
5. Time the running of a loop that executes a COUNT(*) on that connection's table 100 times.

I think shared_buffers=64MB is probably appropriate. We want some memory pressure so that it has to find and evict pages to satisfy the queries. But we don't want it to be totally exhausted and unable to even pin a new page; that really doesn't tell us much except that max_connections is too high. Sound reasonable?

Now, on the flip side, we should also be thinking about what we would gain from this patch, and what other ways there might be to achieve the same goals.

One thing to keep in mind is that the current code to maintain a crash-safe PD_ALL_VISIBLE and VM bit is quite complex and doesn't play by the normal rules. If you want to talk about distributed costs, that code has some very real distributed costs in terms of development effort. For instance, my checksums patch took me extra time to write (despite the patch being the simplest checksums design on the table) and will take others extra time to review. So, if the only things keeping that code in place are theoretical fears, let's take them one by one and see if they are real problems or not.

All of which is to say that I think this patch is premature.
If we adopt something like this, we're likely never going to revert back to the way we do it now, and whatever cache-pressure or other costs this approach carries will be here to stay - so we had better think awfully carefully before we do that. What about this patch makes it hard to undo/rework in the future? Even if there were, this is exactly the sort of thing that should be committed at the beginning of a release cycle, not the end, so as to allow adequate time for discovery of unforeseen consequences before the code ends up out in the wild. I'm concerned that such a comment at this stage will cut review early, which could also prevent it from being committed early in 9.4. Of course, there's another issue here too, which is that as Pavan points out, the patch throws crash-safety out the window. My mistake. I believe that is already fixed, and certainly not a fundamental issue. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
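Both mitigations Jeff mentions are small in code terms. A sketch of each, where SKIP_VM_THRESHOLD mirrors the 32-page cutoff that later appeared in the updated patch, and K, vmbuffer, and pages_since_repin are placeholders:

#define SKIP_VM_THRESHOLD   32      /* pages; matches the updated patch */

static bool
scan_should_use_vm(Relation rel)
{
    /* tiny tables: checking the VM costs more than it could save */
    return RelationGetNumberOfBlocks(rel) >= SKIP_VM_THRESHOLD;
}

/* ...and, inside the scan loop, to bound how long any one pin is held: */
if (++pages_since_repin >= K)       /* K is a placeholder interval */
{
    ReleaseBuffer(vmbuffer);        /* drop the pin... */
    vmbuffer = InvalidBuffer;       /* ...and let the next VM check re-pin */
    pages_since_repin = 0;
}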
Re: [HACKERS] Removing PD_ALL_VISIBLE
Rebased patch attached. No significant changes. Regards, Jeff Davis rm-pd-all-visible-20130116.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Fri, 2012-11-30 at 13:16 -0800, Jeff Davis wrote: I tried for quite a while to show any kind of performance difference between checking the VM and checking PD_ALL_VISIBLE on a 12-core box (24 if you count HT). Three patches in question:
1. Current unpatched master
2. A patch that naively always checks the VM page, pinning and unpinning each time
3. Same as #2, but tries to keep buffers pinned (I had to fix a bug in this patch though -- new version forthcoming)

New patch attached. Nathan Boley kindly lent me access to a 64-core box, and that shows a much more interesting result. The previous test (on the 12-core) basically showed no difference between any of the patches. Now, I see why on the 64-core box: the interesting region seems to be around 32 concurrent connections. The left column is the concurrency, and the right is the runtime. This test was for concurrent scans of a 350MB table (each process did 4 scans and quit). Test program attached.

Patch 1 (scan test):
001 004.299533
002 004.434378
004 004.708533
008 004.518470
012 004.487033
016 004.513915
024 004.765459
032 006.425780
048 007.089146
064 007.908850
072 009.461419
096 013.098646
108 015.278592
128 019.797206

Patch 2 (scan test):
001 004.385206
002 004.596340
004 004.616684
008 004.832248
012 004.858336
016 004.689959
024 005.016797
032 006.857642
048 012.049407
064 025.774772
072 032.680710
096 059.147500
108 083.654806
128 120.350200

Patch 3 (scan test):
001 004.464991
002 004.95
004 004.562364
008 004.649633
012 004.628159
016 004.518748
024 004.768348
032 004.834177
048 007.003305
064 008.242714
072 009.732261
096 013.231056
108 014.996977
128 020.488570

As you can see, patch #2 starts to show a difference at around 32 and completely falls over by 48 connections. This is expected because it's the naive approach that pins the VM page every time it needs it. Patch #1 and #3 are effectively the same; subsequent runs (and with more measurements around concurrency 32) show that the differences are just noise (which seems to be greater around the inflection point of 32). All of the numbers that seem to show any difference can end up with patch #1 better or patch #3 better, depending on the run.

I tried the delete test, too, but I still couldn't see any difference. (I neglected to mention in my last email: I aborted after each delete so that it would be repeatable.) The inflection point there is significantly lower, so I assume it must be contending over something else. I tried making the table unlogged to see if that would change things, but it didn't change much. This test only scales linearly to about 8 or so. Or, there could be something wrong with my test.

So, I conclude that contention is certainly a problem for scans for patch #2, but patch #3 seems to fix that completely by holding the buffer pins. The deletes are somewhat inconclusive, but I just can't see a difference. Holding more pins does have a distributed cost in theory, as Tom points out, but I don't know where to begin testing that.

We'll have to make a decision between (a) maintaining the extra complexity and doing the extra page writes involved with PD_ALL_VISIBLE; or (b) holding onto one extra pin per table being scanned. Right now, if PD_ALL_VISIBLE did not exist, it would be pretty hard to justify putting it in as far as I can tell.
Regards, Jeff Davis

#include <libpq-fe.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void
test(char *query, int niter)
{
    PGconn   *conn;
    PGresult *result;
    int       i;

    conn = PQconnectdb("host=/tmp dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed!\n");
        exit(1);
    }

    result = PQprepare(conn, "", query, 0, NULL);
    if (PQresultStatus(result) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "PREPARE failed: %s", PQerrorMessage(conn));
        PQclear(result);
        exit(1);
    }
    PQclear(result);

    for (i = 0; i < niter; i++)
    {
        result = PQexecPrepared(conn, "", 0, NULL, NULL, NULL, 0);
        if (PQresultStatus(result) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "EXECUTE PREPARED failed: %s\n", PQerrorMessage(conn));
            PQclear(result);
            exit(1);
        }
        PQclear(result);
    }

    PQfinish(conn);
}

int
main(int argc, char *argv[])
{
    int     niter;
    int     nprocs;
    char   *query = "SELECT COUNT(*) FROM foo;";
    int     i;
    pid_t  *procs;
    struct timeval tv1, tv2;

    if (argc != 3)
    {
        fprintf(stderr, "expected 3 arguments, got %d\n", argc);
        exit(1);
    }

    nprocs = atoi(argv[1]);
    niter = atoi(argv[2]);

    procs = malloc(sizeof(pid_t) * nprocs);

    gettimeofday(&tv1, NULL);

    for (i = 0; i < nprocs; i++)
    {
        pid_t pid = fork();

        if (pid == 0)
        {
            test(query, niter);
            exit(0);
        }
        else
        {
            procs[i] = pid;
        }
    }

    for (i = 0; i < nprocs; i++)
    {
        int status;

        waitpid(procs[i], &status, 0);
        if (!WIFEXITED(status))
        {
            fprintf(stderr, "child did not exit!\n");
            exit(1);
        }
        if (WEXITSTATUS(status) != 0)
        {
            fprintf(stderr, "child exited with status %d\n", WEXITSTATUS(status));
            exit(1);
        }
    }

    gettimeofday(&tv2, NULL);
    free(procs);

    /* print "nprocs elapsed-seconds.microseconds", matching the results above */
    if (tv2.tv_usec < tv1.tv_usec)
    {
        tv2.tv_sec -= 1;
        tv2.tv_usec += 1000000;
    }
    printf("%03d %03ld.%06ld\n", nprocs,
           (long) (tv2.tv_sec - tv1.tv_sec),
           (long) (tv2.tv_usec - tv1.tv_usec));

    return 0;
}
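For context, the buffer-pin caching that separates patch #3 from patch #2 boils down to reusing one pinned VM buffer for the whole scan: visibilitymap_pin() only swaps buffers at VM-page boundaries, and a single VM page covers roughly 500MB of heap, so swaps are rare. A hedged sketch of the pattern, using the visibility-map signatures of this era (rel and nblocks are assumed to be in scope):

Buffer      vmbuffer = InvalidBuffer;
BlockNumber blkno;

for (blkno = 0; blkno < nblocks; blkno++)
{
    /* cheap when vmbuffer already covers blkno; otherwise releases
     * the old pin and pins the VM page for this block */
    visibilitymap_pin(rel, blkno, &vmbuffer);

    if (visibilitymap_test(rel, blkno, &vmbuffer))
    {
        /* page is all-visible: per-tuple visibility checks can be skipped */
    }
    /* ...process the heap page... */
}

if (BufferIsValid(vmbuffer))
    ReleaseBuffer(vmbuffer);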
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, 2012-11-26 at 16:55 -0600, Merlin Moncure wrote: Based on previous measurements, I think there *is* contention pinning the root of an index. Currently, I believe it's largely overwhelmed by contention from other sources, such as the buffer manager lwlocks and the very-evil ProcArrayLock. However, I believe that as we fix those problems, this will start to percolate up towards the top of the heap. Yup -- it (buffer pin contention on high-traffic buffers) has been caught in the wild -- just maintaining the pin count was enough to do it in at least one documented case. Pathological workloads demonstrate contention today, and there's no good reason to assume it's limited to index root nodes -- I'm strongly suspicious that buffer spinlock issues are behind some other malfeasance we've seen recently.

I tried for quite a while to show any kind of performance difference between checking the VM and checking PD_ALL_VISIBLE on a 12-core box (24 if you count HT). Three patches in question:
1. Current unpatched master
2. A patch that naively always checks the VM page, pinning and unpinning each time
3. Same as #2, but tries to keep buffers pinned (I had to fix a bug in this patch though -- new version forthcoming)

I tested from 1 to 64 concurrent connections. One test was just concurrent scans of a ~400MB table. The other test was a "DELETE FROM foo WHERE ctid BETWEEN '(N,0)' AND '(N,256)'", where N is the worker number in the test program. The table here is only about 2MB. The idea is that the scan will happen quickly, leading to many quick deletes, but the deletes won't actually touch the same pages (aside from the VM page). So, this was designed to be uncontended except for pinning the VM page.

On the scan test, it was really hard to see any difference through the test noise, but I may have seen about a 3-4% degradation between patch #1 and patch #2 at higher concurrencies. It was difficult for me to reproduce this result -- it usually wouldn't show up. I didn't see any difference between patch #1 and patch #3. On the delete test I detected no difference between #1 and #2 at all.

I think someone with access to a larger box may need to test this. Or, if someone has a more specific suggestion about how I can create a worst case, then let me know. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Sun, 2012-11-25 at 22:30 -0500, Tom Lane wrote: I'd be worried about the case of a lot of sessions touching a lot of unrelated tables. This change implies doubling the number of buffers that are held pinned by any given query, and the distributed overhead from that (eg, adding cycles to searches for free buffers) is what you ought to be afraid of. That's a good point. Doubling might be an exaggeration if indexes are involved, but it's still a concern. The cost of this might be difficult to measure though. Another possibly important point is that reducing the number of pin/unpin cycles for a given VM page might actually hurt the chances of it being found in shared buffers, because IIRC the usage_count is bumped once per pin/unpin. That algorithm is based on the assumption that buffer pins are not drastically different in lifespan, but I think you just broke that for pins on VM pages. If doing a bunch of simple key lookups using an index, then the index's root page is only pinned once per query, but we expect that to stay in shared buffers. I see the VM page as about the same: one pin per query (or maybe a couple for large tables). I don't see how the lifetime of the pin matters a whole lot in this case; it's more about the rate of pins/unpins, right? I'm not particularly concerned about devising solutions for these problems, though, because I think this idea is a loser from the get-go; the increase in contention for VM pages is alone going to destroy any possible benefit. Your intuition here is better than mine, but I am still missing something here. If we keep the buffer pinned, then there will be very few pin/unpin cycles here, so I don't see where the contention would come from (any more than there is contention pinning the root of an index). Do you still think I need a shared lock here or something? If so, then index-only scans have a pretty big problem right now, too. I'll try to quantify some of these effects you've mentioned, and see how the numbers turn out. I'm worried that I'll need more than 4 cores to show anything though, so perhaps someone with a many-core box would be interested to test this out? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, Nov 26, 2012 at 3:29 PM, Jeff Davis pg...@j-davis.com wrote: Your intuition here is better than mine, but I am still missing something here. If we keep the buffer pinned, then there will be very few pin/unpin cycles here, so I don't see where the contention would come from (any more than there is contention pinning the root of an index). Based on previous measurements, I think there *is* contention pinning the root of an index. Currently, I believe it's largely overwhelmed by contention from other sources, such as the buffer manager lwlocks and the very-evil ProcArrayLock. However, I believe that as we fix those problems, this will start to percolate up towards the top of the heap. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
Jeff Davis pg...@j-davis.com writes: On Sun, 2012-11-25 at 22:30 -0500, Tom Lane wrote: Another possibly important point is that reducing the number of pin/unpin cycles for a given VM page might actually hurt the chances of it being found in shared buffers, because IIRC the usage_count is bumped once per pin/unpin. If doing a bunch of simple key lookups using an index, then the index's root page is only pinned once per query, but we expect that to stay in shared buffers. I see the VM page as about the same: one pin per query (or maybe a couple for large tables). Hmmm ... that seems like a valid analogy. I may be worried about nothing as far as this point goes. Do you still think I need a shared lock here or something? If so, then index-only scans have a pretty big problem right now, too. There's still the issue of whether the IOS code is safe on machines with weak memory ordering. I think that it probably is safe, but the argument for it in the current code comment is wrong; most likely, a correct argument has to depend on read/write barriers associated with taking snapshots. I think what you ought to do is work through that, fix the existing comment, and then see whether the same argument works for what you want to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, 2012-11-26 at 16:10 -0500, Tom Lane wrote: There's still the issue of whether the IOS code is safe on machines with weak memory ordering. I think that it probably is safe, but the argument for it in the current code comment is wrong; most likely, a correct argument has to depend on read/write barriers associated with taking snapshots. I think what you ought to do is work through that, fix the existing comment, and then see whether the same argument works for what you want to do. As a part of the patch, I did change the comment, and here's what I came up with:

 * Note on Memory Ordering Effects: visibilitymap_test does not lock
 * the visibility map buffer, and therefore the result we read here
 * could be slightly stale. However, it can't be stale enough to
 * matter.
 *
 * We need to detect clearing a VM bit due to an insert right away,
 * because the tuple is present in the index page but not visible. The
 * reading of the TID by this scan (using a shared lock on the index
 * buffer) is serialized with the insert of the TID into the index
 * (using an exclusive lock on the index buffer). Because the VM bit is
 * cleared before updating the index, and locking/unlocking of the
 * index page acts as a full memory barrier, we are sure to see the
 * cleared bit if we see a recently-inserted TID.
 *
 * Deletes do not update the index page (only VACUUM will clear out the
 * TID), so the clearing of the VM bit by a delete is not serialized
 * with this test below, and we may see a value that is significantly
 * stale. However, we don't care about the delete right away, because
 * the tuple is still visible until the deleting transaction commits or
 * the statement ends (if it's our transaction). In either case, the
 * lock on the VM buffer will have been released (acting as a write
 * barrier) after clearing the bit. And for us to have a snapshot that
 * includes the deleting transaction (making the tuple invisible), we
 * must have acquired ProcArrayLock after that time, acting as a read
 * barrier.
 *
 * It's worth going through this complexity to avoid needing to lock
 * the VM buffer, which could cause significant contention.

And I updated the comment in visibilitymap.c as well (reformatted for this email): To test a bit in the visibility map, most callers should have a pin on the VM buffer, and at least a shared lock on the data buffer. Any process that clears the VM bit must have an exclusive lock on the data buffer, so that will serialize access to the appropriate bit. Because lock acquisition and release are full memory barriers, there is no danger of seeing the state of the bit before it was last cleared. Callers that don't have the data buffer yet, such as an index-only scan or a VACUUM that is skipping pages, must handle the concurrency as appropriate. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
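In code form, the pattern that comment defends looks roughly like the index-only scan check of this era; all_visible_sketch() is an illustrative wrapper, not the actual executor code:

/* illustrative wrapper, not the actual executor code */
static bool
all_visible_sketch(Relation heaprel, ItemPointer tid, Buffer *vmbuffer)
{
    /*
     * No lock is taken on the VM buffer.  Per the reasoning above, a
     * stale answer is safe: a spurious "false" merely forces the
     * normal heap fetch and snapshot check, while a stale "true" is
     * ruled out for inserts by index-page locking and is harmless for
     * deletes for the snapshot reasons given in the comment.
     */
    return visibilitymap_test(heaprel,
                              ItemPointerGetBlockNumber(tid),
                              vmbuffer);
}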
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Mon, Nov 26, 2012 at 3:03 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 26, 2012 at 3:29 PM, Jeff Davis pg...@j-davis.com wrote: Your intuition here is better than mine, but I am still missing something here. If we keep the buffer pinned, then there will be very few pin/unpin cycles here, so I don't see where the contention would come from (any more than there is contention pinning the root of an index). Based on previous measurements, I think there *is* contention pinning the root of an index. Currently, I believe it's largely overwhelmed by contention from other sources, such as the buffer manager lwlocks and the very-evil ProcArrayLock. However, I believe that as we fix those problems, this will start to percolate up towards the top of the heap. Yup -- it (buffer pin contention on high-traffic buffers) has been caught in the wild -- just maintaining the pin count was enough to do it in at least one documented case. Pathological workloads demonstrate contention today, and there's no good reason to assume it's limited to index root nodes -- I'm strongly suspicious that buffer spinlock issues are behind some other malfeasance we've seen recently. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing PD_ALL_VISIBLE
On Wed, 2012-11-21 at 18:25 -0800, Jeff Davis wrote: Follow up to discussion: http://archives.postgresql.org/pgsql-hackers/2012-11/msg00817.php I worked out a patch that replaces PD_ALL_VISIBLE with calls to visibilitymap_test. It rips out a lot of complexity, with a net drop of about 300 lines (not a lot, but some of that code is pretty complex). Updated patch attached. Now it tries to keep the VM buffer pinned during scans, inserts, updates, and deletes. This should avoid increased contention pinning the VM pages, but performance tests are required. For updates, it currently only tries to hold a pin on the VM buffer for the page of the original tuple. For HOT updates, that's always the same as the new buffer anyway. For cold updates, we could also try to keep a pin on the buffer for the new tuple, but right now I don't see an obvious need for that complexity. It may plausibly be a problem when doing a bulk update on a freshly-loaded table. It occurred to me that it might be difficult to test this patch without a fairly large test case. A big assumption of my patch is that there will be locality of access (and the VM page you already have a pin on is likely to be needed the next time), which is obvious during a scan but not so obvious during I/U/D. But a single 8K VM page represents some 60K pages, or about 500MB of data. So anything less than that means that there is only one VM page, and locality is trivial... it seems like any test on a table less than 5GB would not be fair. Then again, if a 5GB table is being randomly accessed, an extra pin is unlikely to matter. Also, without locality, the contention would not be nearly as bad either. I'm still pretty unclear what the worst case for this patch is supposed to look like. Regards, Jeff Davis rm-pd-all-visible-20121125.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
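The coverage arithmetic in that paragraph, worked out as a standalone check (the exact figure ignores the few bits lost to the VM page header, which is why the email rounds down to "some 60K pages, or about 500MB"):

#include <stdio.h>

int
main(void)
{
    const long page_bytes = 8192;            /* heap and VM page size */
    const long heap_pages = page_bytes * 8;  /* one VM bit per heap page */
    const long heap_bytes = heap_pages * page_bytes;

    /* prints: 65536 heap pages, 512 MB of heap per VM page */
    printf("%ld heap pages, %ld MB of heap per VM page\n",
           heap_pages, heap_bytes / (1024 * 1024));
    return 0;
}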
Re: [HACKERS] Removing PD_ALL_VISIBLE
Jeff Davis pg...@j-davis.com writes: Now it tries to keep the VM buffer pinned during scans, inserts, updates, and deletes. This should avoid increased contention pinning the VM pages, but performance tests are required. ... Then again, if a 5GB table is being randomly accessed, an extra pin is unlikely to matter. Also, without locality, the contention would not be nearly as bad either. I'm still pretty unclear what the worst case for this patch is supposed to look like. I'd be worried about the case of a lot of sessions touching a lot of unrelated tables. This change implies doubling the number of buffers that are held pinned by any given query, and the distributed overhead from that (eg, adding cycles to searches for free buffers) is what you ought to be afraid of. Another possibly important point is that reducing the number of pin/unpin cycles for a given VM page might actually hurt the chances of it being found in shared buffers, because IIRC the usage_count is bumped once per pin/unpin. That algorithm is based on the assumption that buffer pins are not drastically different in lifespan, but I think you just broke that for pins on VM pages. I'm not particularly concerned about devising solutions for these problems, though, because I think this idea is a loser from the get-go; the increase in contention for VM pages is alone going to destroy any possible benefit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
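For readers unfamiliar with the replacement detail Tom refers to, a simplified sketch of the clock-sweep rule follows; this is illustrative C, not the actual freelist.c implementation, and BufDescSketch is a made-up type:

typedef struct
{
    int refcount;       /* current pins */
    int usage_count;    /* bumped on each pin, capped at a small max */
} BufDescSketch;

/* returns the index of a victim buffer to evict */
static int
clock_sweep_sketch(BufDescSketch *buffers, int nbuffers)
{
    static int hand = 0;

    for (;;)
    {
        int            victim = hand;
        BufDescSketch *buf = &buffers[victim];

        hand = (hand + 1) % nbuffers;

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return victim;      /* evict this one */
            buf->usage_count--;     /* frequently pinned buffers survive */
        }
        /* pinned buffers (refcount > 0) are skipped, but usage_count is
         * bumped only once per pin -- so a page held by one long-lived
         * pin looks "cold" to the sweep once released, which is the
         * lifespan imbalance Tom is pointing at */
    }
}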
[HACKERS] Removing PD_ALL_VISIBLE
Follow up to discussion: http://archives.postgresql.org/pgsql-hackers/2012-11/msg00817.php I worked out a patch that replaces PD_ALL_VISIBLE with calls to visibilitymap_test. It rips out a lot of complexity, with a net drop of about 300 lines (not a lot, but some of that code is pretty complex). The patch is quite rough, and I haven't yet added the code to keep the VM buffer pins around longer, but I still don't see any major problems. I'm fairly certain that scans will not be adversely affected, and I think that inserts/updates/deletes should be fine as well. The main worry I have with inserts/updates/deletes is that there will be less spatial locality for allocating new buffers or for modifying existing buffers. So keeping a pinned VM buffer might not help as much, because it might need to pin a different one, anyway. Adding to January commitfest, as it's past 11/15. Regards, Jeff Davis rm-pd-all-visible-20121121.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers