Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
David Fetter wrote:
>> Right. There were two basic approaches to handling a page that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format. I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time. Plus the script would not rewrite every page, just certain pages that required it.
>
> Please forgive me for barging in here, but that approach simply is untenable if it requires that the database be down while those pages are being found, marked, moved around, etc. The data volumes that really concern people who need an in-place upgrade are such that even dd if=$PGDATA of=/dev/null bs=8192 # (or whatever the optimal block size would be) would require *much* more time than such people would accept as a down time window, and while that's a lower bound, it's not a reasonable lower bound on the time.

Well, you can say it is unacceptable, but if there are no other options then that is all we can offer. My main point is that we should consider writing old format pages only when we have no choice (page size might expand), and even then, we might decide to have a pre-migration script because the code impact of writing the old format would be too great. This is all hypothetical until we have a real use-case.

> If this re-jiggering could kick off in the background at start and work on a running PostgreSQL, the whole objection goes away. A problem that arises for any in-place upgrade system we do is that if someone's at 99% storage capacity, we can pretty well guarantee some kind of catastrophic failure. Could we create some way to get an estimate of space needed, given that the system needs to stay up while that's happening?

Yea, the database would expand and hopefully have full transaction semantics.
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
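As a rough illustration of David's `dd` point, the back-of-the-envelope arithmetic behind his lower bound can be sketched as follows; the 2 TB cluster size and 200 MB/s throughput are assumptions for illustration, not figures from the thread.

```python
# Lower bound on the downtime window David describes: just sequentially
# reading every block of the cluster once, before any finding, marking,
# or moving of tuples. The size and throughput are illustrative.

BLOCK_SIZE = 8192  # bytes; PostgreSQL's default page size, as in the dd example

def scan_time_seconds(db_bytes: int, read_mb_per_sec: float) -> float:
    """Time to read every block once at a sustained sequential rate."""
    return db_bytes / (read_mb_per_sec * 1024 * 1024)

# An assumed 2 TB cluster at an assumed 200 MB/s sustained read rate:
hours = scan_time_seconds(2 * 1024**4, 200) / 3600  # roughly 2.9 hours
```

Even this optimistic floor, which ignores all the page-fixing work itself, is hours of downtime at the data volumes being discussed.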
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Tue, Dec 1, 2009 at 11:45 PM, Bruce Momjian br...@momjian.us wrote:
> Robert Haas wrote:
>> The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space. http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php
>
> Right. There were two basic approaches to handling a page that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format. I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time. Plus the script would not rewrite every page, just certain pages that required it.

While I'm always willing to be proven wrong, I think it's a complete dead-end to believe that it's going to be easier to reserve space for page expansion using the upgrade-from version rather than the upgrade-to version. I am firmly of the belief that the NEW pg version must be able to operate on an unmodified heap migrated from the OLD pg version.

>> After this set of patches was rejected, Zdenek actually proposed an alternate patch that would have allowed space reservation, and it was rejected precisely because there was no clear certainty that it would solve any hypothetical future problem.
>
> Does it need to write the old version, and if it does, does it have to carry around the old format structures all over the backend?

That was the unclear part. I think it needs partial write support for the old version. If the page is not expanding, then you can probably just replace pages in place. But if the page is expanding, then you need to be able to move individual tuples[1]. Since you want to be up and running while that's happening, I think you probably need to be able to update xmax and probably set hint bits. But you don't need to be able to add tuples to the old page format, and I don't think you need complete vacuum support, since you don't plan to reuse the dead space - you'll just recycle the whole page once the tuples are all dead. As for carrying it around the whole backend, I'm not sure how much of the backend really needs to know. It would only be anything that looks at pages, rather than, say, tuples, but I don't really know how much code that touches. I suppose that's one of the things we need to figure out.

[1] Unless, of course, you use a pre-upgrade utility. But this is about how to make it work WITHOUT a pre-upgrade utility.

> True. It was solving a problem we didn't have, yet.

Well, that's sort of a circular argument. If you're going to reserve space with a pre-upgrade utility, you're going to need to put the pre-upgrade utility into the version you want to upgrade FROM. If we wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we would have had to put the utility into 8.4. The problem I'm referring to is that there is no guarantee that you would be able to predict how much space to reserve. In a case like CRCs, it may be as simple as 4 bytes. But what if, say, we switch to a different compression algorithm for inline toast? Some pages will contract, others will expand, but there's no saying by how much - and therefore no fixed amount of reserved space is guaranteed to be adequate. It's true that we might never want to do that particular thing, but I don't think we can say categorically that we'll NEVER want to do anything that expands pages by an unpredictable amount. So it might be quite complex to figure out how much space to reserve on any given page.
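The "partial write support" idea above can be sketched as a whitelist of operations per page-layout version. The version numbers and operation names below are illustrative stand-ins, not actual backend symbols.

```python
# Hypothetical whitelist of page operations by page-layout version:
# old (v3) pages get only the minimal write support described above
# (update xmax, set hint bits), while current (v4) pages support
# everything, including adding tuples and reusing dead space.

OLD_PAGE_VERSION = 3
NEW_PAGE_VERSION = 4

ALLOWED_OPS = {
    OLD_PAGE_VERSION: {"read", "update_xmax", "set_hint_bits"},
    NEW_PAGE_VERSION: {"read", "update_xmax", "set_hint_bits",
                       "add_tuple", "vacuum_reuse_space"},
}

def op_allowed(page_version: int, op: str) -> bool:
    """Check whether an operation is permitted on a page of this version."""
    return op in ALLOWED_OPS.get(page_version, set())
```

Notably absent from the old-version set are adding tuples and dead-space reuse: once all tuples on an old-format page are dead, the page is simply recycled whole.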
If we can find a way to make that the NEW PG version's problem, it's still complicated, but at least it's not complicated stuff that has to be backpatched.

Another problem with a pre-upgrade utility is - how do you verify, when you fire up the new cluster, that the pre-upgrade utility has done its thing? If the new PG version requires 4 bytes of space reserved on each page, what happens when you get halfway through upgrading your 1TB database and find a page with only 2 bytes available? There aren't a lot of good options. The old PG version could try to mark the DB in some way to indicate whether it successfully completed, but what if there's a bug and something was missed? Then you have this scenario:

1. Run the pre-upgrade script.
2. pg_migrator.
3. Fire up new version.
4. Discover that pre-upgrade script forgot to reserve enough space on some page.
5. Report a bug.
6. Bug fixed, new version of pre-upgrade script is now available.
7. ???

If all the logic is in the new server, you may still be in hot water when you discover that it can't deal with a particular case. But hopefully the problem would be confined to that page, or that relation, and you could use the rest of your database. And
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, 2009-12-02 at 10:48 -0500, Robert Haas wrote:
> Well, that's sort of a circular argument. If you're going to reserve space with a pre-upgrade utility, you're going to need to put the pre-upgrade utility into the version you want to upgrade FROM. If we wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we would have had to put the utility into 8.4.

Don't see any need to reserve space at all. If this is really needed, we first run a script to prepare the 8.4 database for conversion to 8.5. The script would move things around if it finds a block that would have difficulty after upgrade. We may be able to do that simply, using fillfactor, or it may need to be more complex. Either way, it's still easy to do this when required.

-- Simon Riggs www.2ndQuadrant.com
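A minimal sketch of the preparation script Simon describes, under the assumption that the new format needs a fixed number of extra bytes per block; the 4-byte expansion figure is invented for illustration.

```python
# Sketch of a prepare-for-upgrade pass: only blocks that would "have
# difficulty after upgrade" (too little free space for the new format)
# need tuples moved; everything else is left untouched.

def needs_preparation(free_bytes: int, expansion_bytes: int) -> bool:
    """True if tuples must be moved off this block before upgrading."""
    return free_bytes < expansion_bytes

def blocks_to_fix(free_space_map, expansion_bytes=4):
    """Block numbers the script would have to rewrite; the rest pass as-is."""
    return [blk for blk, free in enumerate(free_space_map)
            if needs_preparation(free, expansion_bytes)]
```

In a table with a non-trivial fillfactor most blocks would already pass this test, which is the sense in which the script could "do that simply, using fillfactor."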
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 11:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, 2009-12-02 at 10:48 -0500, Robert Haas wrote:
>> Well, that's sort of a circular argument. If you're going to reserve space with a pre-upgrade utility, you're going to need to put the pre-upgrade utility into the version you want to upgrade FROM. If we wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we would have had to put the utility into 8.4.
>
> Don't see any need to reserve space at all. If this is really needed, we first run a script to prepare the 8.4 database for conversion to 8.5. The script would move things around if it finds a block that would have difficulty after upgrade. We may be able to do that simply, using fillfactor, or it may need to be more complex. Either way, it's still easy to do this when required.

I discussed the problems with this, as I see them, in the same email you just quoted. You don't have to agree with my analysis, of course.

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Robert Haas wrote:
> The problem I'm referring to is that there is no guarantee that you would be able to predict how much space to reserve. In a case like CRCs, it may be as simple as 4 bytes. But what if, say, we switch to a different compression algorithm for inline toast?

Upthread, you made a perfectly sensible suggestion: use the CRC addition as a test case to confirm you can build something useful that allowed slightly more complicated in-place upgrades than are supported now. This requires some new code to do tuple shuffling, communicate reserved space, etc. All things that seem quite sensible to have available, useful steps toward a more comprehensive solution, and an achievable goal you wouldn't even have to argue about.

Now, you're wandering us back down the path where we have to solve a "migrate TOAST changes"-level problem in order to make progress. Starting with presuming you have to solve the hardest possible issue around is the documented path to failure here. We've seen multiple such solutions before, and they all had trade-offs deemed unacceptable: either a performance loss for everyone (not just people upgrading), or unbearable code complexity. There's every reason to believe your reinvention of the same techniques will suffer the same fate. When someone has such a change to be made, maybe you could bring this back up again and gain some traction.

One of the big lessons I took from the 8.4 development's lack of progress on this class of problem: no work to make upgrades easier will get accepted unless there is such an upgrade on the table that requires it. You need a test case to make sure the upgrade approach a) works as expected, and b) is code you must commit now or in-place upgrade is lost. Anything else will be deferred; I don't think there's any interest left at this point in solving a speculative future problem, given that it will be code we can't even prove will work.
> Another problem with a pre-upgrade utility is - how do you verify, when you fire up the new cluster, that the pre-upgrade utility has done its thing?

Some additional catalog support was suggested to mark what the pre-upgrade utility had processed. I'm sure I could find the messages about it again if I had to.

> If all the logic is in the new server, you may still be in hot water when you discover that it can't deal with a particular case.

If you can't design a pre-upgrade script without showstopper bugs, what makes you think the much more complicated code in the new server (which will be carrying around an ugly mess of old and new engine parts) will work as advertised? I think we'll be lucky to get the simplest possible scheme implemented, and that any of these more complicated ones will die under the weight of their own complexity.

Also, your logic seems to presume that no backports are possible to the old server. A bug-fix to the pre-upgrade script is a completely reasonable and expected candidate for backporting, because it will be such a targeted piece of code that adjusting it shouldn't impact anything else. The same will not be even remotely true if there's a bug fix needed in a more complicated system that lives in a regularly traversed code path. Having such a tightly targeted chunk of code makes pre-upgrade *more* likely to get bug-fix backports, because you won't be touching code executed by regular users at all.

The potential code impact of backporting fixes to the more complicated approaches here is another major obstacle to adopting one of them. That's an issue that we didn't even get to the last time, because showstopper issues popped up first. That problem was looming, though, had work continued down that path.
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 1:08 PM, Greg Smith g...@2ndquadrant.com wrote:
> Robert Haas wrote:
>> The problem I'm referring to is that there is no guarantee that you would be able to predict how much space to reserve. In a case like CRCs, it may be as simple as 4 bytes. But what if, say, we switch to a different compression algorithm for inline toast?
>
> Upthread, you made a perfectly sensible suggestion: use the CRC addition as a test case to confirm you can build something useful that allowed slightly more complicated in-place upgrades than are supported now. This requires some new code to do tuple shuffling, communicate reserved space, etc. All things that seem quite sensible to have available, useful steps toward a more comprehensive solution, and an achievable goal you wouldn't even have to argue about.
>
> Now, you're wandering us back down the path where we have to solve a "migrate TOAST changes"-level problem in order to make progress. Starting with presuming you have to solve the hardest possible issue around is the documented path to failure here. We've seen multiple such solutions before, and they all had trade-offs deemed unacceptable: either a performance loss for everyone (not just people upgrading), or unbearable code complexity. There's every reason to believe your reinvention of the same techniques will suffer the same fate.

Just to set the record straight, I don't intend to work on this problem at all (unless paid, of course). And I'm perfectly happy to go with whatever workable solution someone else comes up with. I'm just offering opinions on what I see as the advantages and disadvantages of different approaches, and anyone who is working on this is more than free to ignore them.

> Some additional catalog support was suggested to mark what the pre-upgrade utility had processed. I'm sure I could find the messages about it again if I had to.
And that's a perfectly sensible solution, except that adding a catalog column to 8.4 at this point would force initdb, so that's a non-starter. I suppose we could shoehorn it into the reloptions.

> Also, your logic seems to presume that no backports are possible to the old server.

The problem on the table at the moment is that the proposed CRC feature will expand every page by a uniform amount - so in this case a fixed-space-per-page reservation utility would be completely adequate. Does anyone think this is a realistic thing to backport to 8.4?

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Robert Haas wrote:
>> Some additional catalog support was suggested to mark what the pre-upgrade utility had processed. I'm sure I could find the messages about it again if I had to.
>
> And that's a perfectly sensible solution, except that adding a catalog column to 8.4 at this point would force initdb, so that's a non-starter. I suppose we could shoehorn it into the reloptions.

There's no reason the associated catalog support had to ship with the old version. You can always modify the catalog after initdb, but before running the pre-upgrade utility. pg_migrator might make that change for you.

> The problem on the table at the moment is that the proposed CRC feature will expand every page by a uniform amount - so in this case a fixed-space-per-page reservation utility would be completely adequate. Does anyone think this is a realistic thing to backport to 8.4?

I believe the main problem here is making sure that the server doesn't turn around and fill pages right back up again. The logic that needs to show up here has two parts:

1) Don't fill new pages completely up, save the space that will be needed in the new version
2) Find old pages that are filled and free some space on them

The pre-upgrade utility we've been talking about does (2), and that's easy to imagine implementing as an add-on module rather than a backport. I don't know how (1) can be done in a way such that it's easily backported to 8.4.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
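Part (1) above is the piece that has to live inside the old server: an insertion test that treats some bytes of every page as off-limits. A toy version of that check, with an assumed reservation size rather than a real GUC:

```python
# Sketch of part (1): never fill a new page completely up; refuse any
# tuple placement that would eat into the bytes reserved for the new
# page format. The 4-byte reservation is an assumed example value.

RESERVED_FOR_UPGRADE = 4  # bytes to keep free for the new page format

def tuple_fits(page_free_bytes: int, tuple_len: int,
               reserve: int = RESERVED_FOR_UPGRADE) -> bool:
    """True if placing the tuple still leaves the reservation intact."""
    return page_free_bytes - tuple_len >= reserve
```

Part (2), by contrast, can be an external scan for pages whose free space is already below the reservation, which is why it fits an add-on module while (1) is the awkward backport.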
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 1:56 PM, Greg Smith g...@2ndquadrant.com wrote:
> Robert Haas wrote:
>>> Some additional catalog support was suggested to mark what the pre-upgrade utility had processed. I'm sure I could find the messages about it again if I had to.
>>
>> And that's a perfectly sensible solution, except that adding a catalog column to 8.4 at this point would force initdb, so that's a non-starter. I suppose we could shoehorn it into the reloptions.
>
> There's no reason the associated catalog support had to ship with the old version. You can always modify the catalog after initdb, but before running the pre-upgrade utility. pg_migrator might make that change for you.

Uh, really? I don't think that's possible at all.

>> The problem on the table at the moment is that the proposed CRC feature will expand every page by a uniform amount - so in this case a fixed-space-per-page reservation utility would be completely adequate. Does anyone think this is a realistic thing to backport to 8.4?
>
> I believe the main problem here is making sure that the server doesn't turn around and fill pages right back up again. The logic that needs to show up here has two parts:
>
> 1) Don't fill new pages completely up, save the space that will be needed in the new version
> 2) Find old pages that are filled and free some space on them
>
> The pre-upgrade utility we've been talking about does (2), and that's easy to imagine implementing as an add-on module rather than a backport. I don't know how (1) can be done in a way such that it's easily backported to 8.4.

Me neither.

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Robert Haas wrote:
> On Wed, Dec 2, 2009 at 1:56 PM, Greg Smith g...@2ndquadrant.com wrote:
>> There's no reason the associated catalog support had to ship with the old version. You can always modify the catalog after initdb, but before running the pre-upgrade utility. pg_migrator might make that change for you.
>
> Uh, really? I don't think that's possible at all.

Worst case just to get this bootstrapped: you install a new table with the added bits. Old version page upgrader accounts for itself there. pg_migrator dumps that data and then loads it into its new, correct home on the newer version. There's already stuff like that being done anyway--dumping things from the old catalog and inserting into the new one--and if the origin is actually an add-on rather than an original catalog page it doesn't really matter. As long as the new version can see the info it needs in its catalog it doesn't matter how it got there; that's the one that needs to check the migration status before it can access things outside of the catalog.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 2:27 PM, Greg Smith g...@2ndquadrant.com wrote:
> Robert Haas wrote:
>>> There's no reason the associated catalog support had to ship with the old version. You can always modify the catalog after initdb, but before running the pre-upgrade utility. pg_migrator might make that change for you.
>>
>> Uh, really? I don't think that's possible at all.
>
> Worst case just to get this bootstrapped: you install a new table with the added bits. Old version page upgrader accounts for itself there. pg_migrator dumps that data and then loads it into its new, correct home on the newer version. There's already stuff like that being done anyway--dumping things from the old catalog and inserting into the new one--and if the origin is actually an add-on rather than an original catalog page it doesn't really matter. As long as the new version can see the info it needs in its catalog it doesn't matter how it got there; that's the one that needs to check the migration status before it can access things outside of the catalog.

That might work. I think that in order to get a fixed OID for the new catalog you would need to run a backend in bootstrap mode, which might (not sure) require shutting down the database first. But it sounds doable. There remains the issue of whether it is reasonable to think about backpatching such a thing, and whether doing so is easier/better than dealing with page expansion in the new server.

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote:
>> Also, your logic seems to presume that no backports are possible to the old server.
>
> The problem on the table at the moment is that the proposed CRC feature will expand every page by a uniform amount - so in this case a fixed-space-per-page reservation utility would be completely adequate. Does anyone think this is a realistic thing to backport to 8.4?

This whole discussion is based on assumptions which do not match my recollection of the old discussion. I would suggest people go back and read the emails, but it's clear at least some people have, so it seems people get different things out of those old emails. My recollection of Tom and Heikki's suggestions for Zdenek were as follows:

1) When 8.9.0 comes out we also release an 8.8.x which contains a new guc which says to prepare for an 8.9 update. If that guc is set then any new pages are guaranteed to have enough space for 8.9.0, which could be as simple as guaranteeing there are x bytes of free space. In the case of the CRC it's actually *not* a uniform amount of free space if we go with Tom's design of having a variable chunk which moves around, but it's still just simple arithmetic to determine if there's enough free space on the page for a new tuple, so it would be simple enough to backport.

2) When you want to prepare a database for upgrade you run the precheck script which first of all makes sure you're running 8.8.x and that the flag is set. Then it checks the free space on every page to ensure it's satisfactory. If not then it can do a noop update to any tuple on the page, which the new free space calculation would guarantee would go to a new page. Then you have to wait long enough and vacuum.

3) Then you run pg_migrator which swaps in the new catalog files.

4) Then you shut down and bring up 8.9.0 which on reading any page *immediately* converts it to 8.9.0 format.
5) You would eventually also need some program which processes every page and guarantees to write it back out in the new format. Otherwise there will be pages that you never stop reconverting every time they're read.

-- greg
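Step (2) above is the only part of this recipe with any real logic in it; it can be sketched schematically as follows, with pages modeled as plain dicts and an assumed per-page free-space requirement (nothing here is real backend code).

```python
# Sketch of the precheck pass from step (2): with the hypothetical
# prepare-for-upgrade guc set, walk every page; where the free space is
# unsatisfactory, a no-op UPDATE of some tuple on the page will (under
# the new free-space rule) relocate that tuple to a page with room.

REQUIRED_FREE = 8  # assumed free bytes per page needed by the new version

def precheck(pages):
    """Return per-page actions: 'ok' or 'noop-update' (move a tuple off)."""
    actions = []
    for page in pages:
        if page["free"] >= REQUIRED_FREE:
            actions.append("ok")
        else:
            actions.append("noop-update")
    return actions
```

After the no-op updates you still have to "wait long enough and vacuum," as step (2) says, so the superseded tuple versions on the tight pages actually die and free their space.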
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Greg Stark gsst...@mit.edu writes:
> This whole discussion is based on assumptions which do not match my recollection of the old discussion. I would suggest people go back and read the emails but it's clear at least some people have so it seems people get different things out of those old emails. My recollection of Tom and Heikki's suggestions for Zdenek were as follows:
> 1) When 8.9.0 comes out we also release an 8.8.x which contains a new guc which says to prepare for an 8.9 update.

Yeah, I think the critical point is not to assume that the behavior of the old system is completely set in stone. We can insist that you must update to at least point release .N before beginning the migration process. That gives us a chance to backpatch code that makes adjustments to the behavior of the old server, so long as the backpatch isn't invasive enough to raise stability concerns.

regards, tom lane
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Wed, Dec 2, 2009 at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Stark gsst...@mit.edu writes:
>> This whole discussion is based on assumptions which do not match my recollection of the old discussion. I would suggest people go back and read the emails but it's clear at least some people have so it seems people get different things out of those old emails. My recollection of Tom and Heikki's suggestions for Zdenek were as follows:
>> 1) When 8.9.0 comes out we also release an 8.8.x which contains a new guc which says to prepare for an 8.9 update.
>
> Yeah, I think the critical point is not to assume that the behavior of the old system is completely set in stone. We can insist that you must update to at least point release .N before beginning the migration process. That gives us a chance to backpatch code that makes adjustments to the behavior of the old server, so long as the backpatch isn't invasive enough to raise stability concerns.

If we have consensus on that approach, I'm fine with it. I just don't want one of the people who wants this CRC feature to go to a lot of trouble to develop a space reservation system that has to be backpatched to 8.4, and then have the patch rejected as too potentially destabilizing.

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
> What happened to the work that was being done to allow a page to be upgraded on the fly when it was read in from disk?

There were no page level changes between 8.3 and 8.4.

-- greg
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Greg Stark wrote:
> On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
>> What happened to the work that was being done to allow a page to be upgraded on the fly when it was read in from disk?
>
> There were no page level changes between 8.3 and 8.4.

Yea, we have the idea of how to do it (in cases where the page size doesn't increase), but no need to implement it in 8.3 to 8.4.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Tue, Dec 1, 2009 at 5:15 PM, Greg Stark gsst...@mit.edu wrote:
> On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
>> What happened to the work that was being done to allow a page to be upgraded on the fly when it was read in from disk?
>
> There were no page level changes between 8.3 and 8.4.

That's true, but I don't think it's the full and complete answer to the question. Zdenek submitted a patch for CF 2008-11 which attempted to add support for multiple page versions. I guess we're on v4 right now, and he was attempting to add support for v3 pages, which would have allowed reading in pages from old PG versions. To put it bluntly, the code wasn't anything I would have wanted to deploy, but the reason why Zdenek gave up on fixing it was because several community members considerably senior to myself provided negative feedback on the concept.

...Robert
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
Robert Haas wrote:
> On Tue, Dec 1, 2009 at 5:15 PM, Greg Stark gsst...@mit.edu wrote:
>> On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
>>> What happened to the work that was being done to allow a page to be upgraded on the fly when it was read in from disk?
>>
>> There were no page level changes between 8.3 and 8.4.
>
> That's true, but I don't think it's the full and complete answer to the question. Zdenek submitted a patch for CF 2008-11 which attempted to add support for multiple page versions. I guess we're on v4 right now, and he was attempting to add support for v3 pages, which would have allowed reading in pages from old PG versions. To put it bluntly, the code wasn't anything I would have wanted to deploy, but the reason why Zdenek gave up on fixing it was because several community members considerably senior to myself provided negative feedback on the concept.

Well, there were quite a number of open issues relating to page conversion:

  o  Do we write the old version or just convert on read?
  o  How do we write pages that get larger on conversion to the new format?

As I remember, the patch allowed read/write of old versions, which greatly increased its code impact.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)
On Tue, Dec 1, 2009 at 9:31 PM, Bruce Momjian br...@momjian.us wrote:
> Robert Haas wrote:
>> That's true, but I don't think it's the full and complete answer to the question. Zdenek submitted a patch for CF 2008-11 which attempted to add support for multiple page versions. I guess we're on v4 right now, and he was attempting to add support for v3 pages, which would have allowed reading in pages from old PG versions. To put it bluntly, the code wasn't anything I would have wanted to deploy, but the reason why Zdenek gave up on fixing it was because several community members considerably senior to myself provided negative feedback on the concept.
>
> Well, there were quite a number of open issues relating to page conversion:
>
>   o  Do we write the old version or just convert on read?
>   o  How do we write pages that get larger on conversion to the new format?
>
> As I remember, the patch allowed read/write of old versions, which greatly increased its code impact.

Oh, for sure there were plenty of issues with the patch, starting with the fact that the way it was set up led to unacceptable performance and code complexity trade-offs. Some of my comments from the time:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php

But the point is that the concept, I think, is basically the right one: you have to be able to read and make sense of the contents of old page versions. There is room, at least in my book, for debate about which operations we should support on old pages. Totally read only? Set hint bits? Kill old tuples? Add new tuples?
The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space.

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

I don't like that solution. If the pre-upgrade utility is something that has to be run while the database is off-line, then it defeats the point of an in-place upgrade. If it can be run while the database is up, I fear it will need to be deeply integrated into the server. And since we can't know the requirements for how much space to reserve (and it needn't be a constant) until we design the new feature, this will likely mean backpatching a rather large chunk of complex code, which, to put it mildly, is not the sort of thing we normally would even consider.

I think a better approach is to support reading tuples from old pages, but to write all new tuples into new pages. A full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be used to propel everything to the new version, with the usual tricks for people who need to rewrite the table a piece at a time. But, this is not religion for me. I'm fine with some other design; I just can't presently see how to make it work.

I think the present discussion of CRC checks is an excellent test-case for any and all ideas about how to solve this problem. If someone can get a patch committed that can convert the 8.4 page format to an 8.5 format with the hint bits shuffled around and a (hopefully optional) CRC added, I think that'll become the de facto standard for how to handle page format upgrades.

...Robert
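To make the read-old/write-new scheme above concrete, here is a toy Python sketch of it. Nothing here is PostgreSQL code: the names (`Page`, `rewrite_table`) and the version numbers are invented for illustration only.

```python
# Toy model of the proposal: the new server can *read* old-version
# pages, but every page it *writes* is in the current format. A full
# rewrite (UPDATE foo SET x = x, CLUSTER, ...) therefore migrates the
# whole table as a side effect of rewriting every tuple.

CURRENT_VERSION = 4          # hypothetical: the new server's layout version
READABLE_VERSIONS = {3, 4}   # v3 pages may be read but never written

class Page:
    def __init__(self, version, tuples):
        self.version = version
        self.tuples = list(tuples)

def read_tuples(page):
    """Reading is allowed for any supported old version."""
    if page.version not in READABLE_VERSIONS:
        raise ValueError("unsupported page version %d" % page.version)
    return list(page.tuples)

def write_page(tuples):
    """Writing always produces a current-format page."""
    return Page(CURRENT_VERSION, tuples)

def rewrite_table(pages):
    """Full-table rewrite: read everything, write everything anew."""
    all_tuples = [t for p in pages for t in read_tuples(p)]
    return [write_page(all_tuples)]  # simplified: one output page

old_table = [Page(3, ["a", "b"]), Page(4, ["c"])]
new_table = rewrite_table(old_table)
assert all(p.version == CURRENT_VERSION for p in new_table)
```

The point of the sketch is that no pre-upgrade tool is needed: old pages simply drain away as tables get rewritten by ordinary operations.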
Robert Haas wrote: Well, there were quite a number of open issues relating to page conversion:

o Do we write the old version or just convert on read?
o How do we write pages that get larger on conversion to the new format?

As I remember, the patch allowed read/write of old versions, which greatly increased its code impact.

Oh, for sure there were plenty of issues with the patch, starting with the fact that the way it was set up led to unacceptable performance and code complexity trade-offs. Some of my comments from the time:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php

But the point is that the concept, I think, is basically the right one: you have to be able to read and make sense of the contents of old page versions. There is room, at least in my book, for debate about which operations we should support on old pages. Totally read only? Set hint bits? Kill old tuples? Add new tuples?

I think part of the problem is there was no agreement before the patch was coded and submitted, and there didn't seem to be much desire from the patch author to adjust it, nor demand from the community because we didn't need it yet.

The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space.

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

Right. There were two basic approaches to handling a patch that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format. I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time.
Plus the script would not rewrite every page, just certain pages that required it.

I don't like that solution. If the pre-upgrade utility is something that has to be run while the database is off-line, then it defeats the point of an in-place upgrade. If it can be run while the database is up, I fear it will need to be deeply integrated into the server. And since we can't know the requirements for how much space to reserve (and it needn't be a constant) until we design the new feature, this will likely mean backpatching a rather large chunk of complex code, which, to put it mildly, is not the sort of thing we normally would even consider. I think a better approach is to support reading tuples from old pages, but to write all new tuples into new pages. A full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be used to propel everything to the new version, with the usual tricks for people who need to rewrite the table a piece at a time. But, this is not religion for me. I'm fine with some other design; I just can't presently see how to make it work.

Well, perhaps the text I wrote above will clarify that the upgrade script is only for page expansion --- it is not to rewrite every page into the new format.

I think the present discussion of CRC checks is an excellent test-case for any and all ideas about how to solve this problem. If someone can get a patch committed that can convert the 8.4 page format to an 8.5 format with the hint bits shuffled around and a (hopefully optional) CRC added, I think that'll become the de facto standard for how to handle page format upgrades.

Well, yea, the idea would be that the 8.5 server would either convert the page to the new format on read (assuming there is enough free space, perhaps requiring a pre-upgrade script), or have the server write the page in the old 8.4 format and not do CRC checks on the page. My guess is the former.
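The convert-on-read decision Bruce describes can be sketched as a toy Python model. The 4-byte CRC growth and every name here are assumptions for illustration, not the actual 8.5 design:

```python
# Toy model of convert-on-read: an old-format page is upgraded in
# place when it has room for the new per-page CRC field; otherwise it
# stays in the old format and the CRC check is simply skipped.

OLD, NEW = 4, 5
CRC_BYTES = 4  # hypothetical growth of the page header

def on_read(page):
    """Return (page, crc_checked) after the version dispatch."""
    if page["version"] == NEW:
        return page, True                  # normal path: verify CRC
    if page["free_space"] >= CRC_BYTES:    # room to grow: convert now
        page = dict(page, version=NEW,
                    free_space=page["free_space"] - CRC_BYTES)
        return page, True
    return page, False                     # old format: no CRC check

page, checked = on_read({"version": OLD, "free_space": 16})
assert page["version"] == NEW and checked
page, checked = on_read({"version": OLD, "free_space": 0})
assert page["version"] == OLD and not checked
```

The second case is exactly where a pre-upgrade script would help: by guaranteeing every page has `CRC_BYTES` free, the never-converted branch disappears.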
-- Bruce Momjian br...@momjian.us http://momjian.us
Robert Haas wrote: If the pre-upgrade utility is something that has to be run while the database is off-line, then it defeats the point of an in-place upgrade. If it can be run while the database is up, I fear it will need to be deeply integrated into the server. And since we can't know the requirements for how much space to reserve (and it needn't be a constant) until we design the new feature, this will likely mean backpatching a rather large chunk of complex code, which, to put it mildly, is not the sort of thing we normally would even consider.

You're wandering into the sort of overdesign that isn't really needed yet. For now, presume it's a constant amount of overhead, and that the release notes for the new version will say "configure the pre-upgrade utility and tell it you need X bytes of space reserved". That's sufficient for the CRC case, right? It needs a few more bytes per page, and the 8.5 release notes could say exactly how much. Solve that before making things more complicated by presuming you need to solve the variable-size increase problem, too. We'll be lucky to get the absolute simplest approach committed; you really need to have a big smoking gun to justify feature creep in this area.

(If I had to shoot from the hip and design for the variable case, why not just make the thing that determines how much space a given page needs reserved a function the user can re-install with a smarter version?)

I think a better approach is to support reading tuples from old pages, but to write all new tuples into new pages. A full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be used to propel everything to the new version, with the usual tricks for people who need to rewrite the table a piece at a time.

I think you're oversimplifying the operational difficulty of the usual tricks. This is a painful approach for the exact people who need this the most: people with a live multi-TB installation they can't really afford to add too much load to.
The beauty of the in-place upgrade tool just converting pages as it scans through looking for them is that you can dial up its intensity to exactly how much overhead you can stand, and let it loose until it's done.

-- Greg Smith, 2ndQuadrant, Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
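Greg's two ideas above — a per-page reservation amount, replaceable by a smarter function for the variable case, applied by a scan that only touches pages that fall short — can be sketched together in a toy Python model. All names and sizes are invented for illustration; real reservation logic would live inside the server:

```python
# Toy model of the pre-upgrade tool: a (replaceable) function decides
# how many bytes each page must keep free, and the scan moves tuples
# off only those pages that lack the reserve. Tuples are modeled as
# their sizes in bytes; the destination of moved tuples is elided.

def constant_reserve(page):
    return 8  # e.g. release notes say "reserve 8 bytes per page"

def pre_upgrade_scan(pages, reserve_fn=constant_reserve):
    """Return the page numbers that needed tuples moved."""
    touched = []
    for n, page in enumerate(pages):
        need = reserve_fn(page)
        if page["free"] >= need:
            continue                   # most pages need no rewrite
        while page["free"] < need and page["tuples"]:
            moved = page["tuples"].pop()   # relocate one tuple
            page["free"] += moved
        touched.append(n)
    return touched

pages = [{"free": 64, "tuples": [16, 16]},   # already has the reserve
         {"free": 2,  "tuples": [16, 16]}]   # must move one tuple
assert pre_upgrade_scan(pages) == [1]
assert all(p["free"] >= 8 for p in pages)
```

Swapping `reserve_fn` for a smarter version is how the variable-size case would be handled, and throttling the loop is where the "dial up its intensity" knob would go.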
On Tue, Dec 1, 2009 at 10:34 PM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: Well, there were quite a number of open issues relating to page conversion:

o Do we write the old version or just convert on read?
o How do we write pages that get larger on conversion to the new format?

As I remember, the patch allowed read/write of old versions, which greatly increased its code impact.

Oh, for sure there were plenty of issues with the patch, starting with the fact that the way it was set up led to unacceptable performance and code complexity trade-offs. Some of my comments from the time:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php

But the point is that the concept, I think, is basically the right one: you have to be able to read and make sense of the contents of old page versions. There is room, at least in my book, for debate about which operations we should support on old pages. Totally read only? Set hint bits? Kill old tuples? Add new tuples?

I think part of the problem is there was no agreement before the patch was coded and submitted, and there didn't seem to be much desire from the patch author to adjust it, nor demand from the community because we didn't need it yet.

Could be. It's water under the bridge at this point.

The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space.

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

Right. There were two basic approaches to handling a patch that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format.
I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time. Plus the script would not rewrite every page, just certain pages that required it.

While I'm always willing to be proven wrong, I think it's a complete dead-end to believe that it's going to be easier to reserve space for page expansion using the upgrade-from version rather than the upgrade-to version. I am firmly of the belief that the NEW pg version must be able to operate on an unmodified heap migrated from the OLD pg version. After this set of patches was rejected, Zdenek actually proposed an alternate patch that would have allowed space reservation, and it was rejected precisely because there was no clear certainty that it would solve any hypothetical future problem.

...Robert
On Tue, Dec 1, 2009 at 10:45 PM, Greg Smith g...@2ndquadrant.com wrote: Robert Haas wrote: If the pre-upgrade utility is something that has to be run while the database is off-line, then it defeats the point of an in-place upgrade. If it can be run while the database is up, I fear it will need to be deeply integrated into the server. And since we can't know the requirements for how much space to reserve (and it needn't be a constant) until we design the new feature, this will likely mean backpatching a rather large chunk of complex code, which to put it mildly, is not the sort of thing we normally would even consider.

You're wandering into the sort of overdesign that isn't really needed yet. For now, presume it's a constant amount of overhead, and that the release notes for the new version will say configure the pre-upgrade utility and tell it you need x bytes of space reserved. That's sufficient for the CRC case, right? Needs a few more bytes per page, 8.5 release notes could say exactly how much. Solve that before making things more complicated by presuming you need to solve the variable-size increase problem, too. We'll be lucky to get the absolute simplest approach committed, you really need to have a big smoking gun to justify feature creep in this area.

Well, I think the best way to solve the problem is to design the system in a way that makes it unnecessary to have a pre-upgrade tool at all, by making the new PG version capable of handling page expansion where needed. I don't understand how putting that functionality into the OLD PG version can be better. But I may be misunderstanding something.

(If I had to shoot from the hip and design for the variable case, why not just make the thing that determines how much space a given page needs reserved a function the user can re-install with a smarter version?)

That's a pretty good idea.
I have no love of this pre-upgrade concept, but if we're going to do it that way, then allowing someone to load in a function to compute the required amount of free space to reserve is a good thought.

I think a better approach is to support reading tuples from old pages, but to write all new tuples into new pages. A full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be used to propel everything to the new version, with the usual tricks for people who need to rewrite the table a piece at a time.

I think you're oversimplifying the operational difficulty of the usual tricks. This is a painful approach for the exact people who need this the most: people with a live multi-TB installation they can't really afford to add too much load to. The beauty of the in-place upgrade tool just converting pages as it scans through looking for them is that you can dial up its intensity to exactly how much overhead you can stand, and let it loose until it's done.

Fair enough.

...Robert
Robert Haas wrote: The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space.

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

Right. There were two basic approaches to handling a patch that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format. I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time. Plus the script would not rewrite every page, just certain pages that required it.

While I'm always willing to be proven wrong, I think it's a complete dead-end to believe that it's going to be easier to reserve space for page expansion using the upgrade-from version rather than the upgrade-to version. I am firmly of the belief that the NEW pg version must be able to operate on an unmodified heap migrated from the OLD pg version. After this set of patches was rejected, Zdenek actually

Does it need to write the old version, and if it does, does it have to carry around the old-format structures all over the backend? That was the unclear part.

proposed an alternate patch that would have allowed space reservation, and it was rejected precisely because there was no clear certainty that it would solve any hypothetical future problem.

True. It was solving a problem we didn't have, yet.

-- Bruce Momjian br...@momjian.us http://momjian.us
On Tue, Dec 01, 2009 at 10:34:11PM -0500, Bruce Momjian wrote: Robert Haas wrote: The key issue, as I think Heikki identified at the time, is to figure out how you're eventually going to get rid of the old pages. He proposed running a pre-upgrade utility on each page to reserve the right amount of free space. http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

Right. There were two basic approaches to handling a patch that would expand when upgraded to the new version --- either allow the system to write the old format, or have a pre-upgrade script that moved tuples so there was guaranteed enough free space in every page for the new format. I think we agreed that the latter was better than the former, and it was easy because we don't have any need for that at this time. Plus the script would not rewrite every page, just certain pages that required it.

Please forgive me for barging in here, but that approach simply is untenable if it requires that the database be down while those pages are being found, marked, moved around, etc. The data volumes that really concern people who need an in-place upgrade are such that even

dd if=$PGDATA of=/dev/null bs=8192 # (or whatever the optimal block size would be)

would require *much* more time than such people would accept as a down time window, and while that's a lower bound, it's not a reasonable lower bound on the time.

If this re-jiggering could kick off in the background at start and work on a running PostgreSQL, the whole objection goes away. A problem that arises for any in-place upgrade system we do is that if someone's at 99% storage capacity, we can pretty well guarantee some kind of catastrophic failure. Could we create some way to get an estimate of space needed, given that the system needs to stay up while that's happening?

Cheers, David.
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
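David's dd lower bound is easy to put numbers on. The cluster size and sequential-read rate below are assumed figures for illustration, not from the thread:

```python
# Back-of-envelope version of the downtime argument: merely *reading*
# a large $PGDATA, as the dd command would, already exceeds a tolerable
# window -- and any page-moving tool must do at least that much I/O.

pgdata_bytes = 5 * 10**12    # assume a 5 TB cluster
read_rate = 200 * 10**6      # assume 200 MB/s sequential reads

seconds = pgdata_bytes / read_rate
hours = seconds / 3600
assert round(hours, 1) == 6.9   # roughly seven hours just to scan it
```

Writes, index passes, and random I/O from tuple moves only push the real figure higher, which is why the thread converges on doing this work in the background against a running server.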