The long-lost pg_upgrade (was: Re: [HACKERS] 8.2 features status)
On Friday 04 August 2006 02:20, Josh Berkus wrote:
> *grin* Aren't I, the marketing geek, supposed to be the one whining
> about this?
> [snip]
> * In-place upgrades (pg_upgrade)
> BTW, I may get Sun to contribute an engineer for this; will get you
> posted.

Long time no post. This statement really caught my attention; bravo if true upgrading can happen, and someone can be put on it and do it right. As Tom said, a little farther down the thread, we have talked over this many times. I specifically remember, oh, about a dozen times I personally have 'gadflied' this issue.

As one who now has a, let's see:

[EMAIL PROTECTED] ~]# du /var/lib/pgsql/data -s
16668528	/var/lib/pgsql/data
[EMAIL PROTECTED] ~]#

Yes, a 16GB inventory database, with in-database large object images.

Anyway, as one who does not look forward to migrating this the old-fashioned way (I can just imagine how fas^H^H^Hslow a restore of all those large objects is going to be; backup is slow enough (about 50 minutes on this Xeon 2.4GHz box)), in-place upgrade would cut this considerably; the database is not a complex one, just a largish one. It's, let's see, only holding a little less than 5,000 items with associated lo images (due to many factors, this is handled through ODBC from Microsoft Access; it is a kludge, and a big one, but it works very smoothly from the users' points of view, where item images are literally 'dragged and dropped' from the digital camera straight to the database).

So, anyway, looking forward to seeing some progress in this department... :-)

--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
Tom Lane [EMAIL PROTECTED] writes:
> I don't think there is very much hope of an in-place upgrade for
> scenarios involving changes in contents of user tables.  In particular,
> what of a change that requires more space than before, such as adding a
> locale indicator to text fields?  There's no guarantee that the data on
> an old page will still fit, and there's certainly no hope of something
> operating at the xlog level being able to move tuples across pages ---
> if nothing else, because it's not in a position to compute new index
> entries.
>
> I don't see this working for page-at-a-time updates even in a full
> backend environment; again, indexes are the killer consideration.  I
> don't see how to get sane behavior from an index containing some
> old-style entries and some new-style ones for a changed datatype.
>
> As you mentioned, the scenarios that look practical for in-place
> upgrade are the ones where only system catalog contents need to change.
> We've already discussed this (many times) and agreed that we could live
> with restricting user-table changes to happen only once every few
> releases.

What of having support in the backend for two heap formats in each version? That is, 8.3 would be able to support 8.2 heaps as well as 8.3's. There could be a flag in pg_class indicating the heap format for that heap.

Then you would be able to upgrade without rewriting all your tables, and the only requirement is that sometime prior to the next upgrade you issue a per-table ALTER TABLE that involves rewriting the table, such as CLUSTER or ALTER COLUMN TYPE USING. That still requires 2x space, but only for a single table at a time, which is much better than 2x space for the entire database. It also lets you schedule that job for some point in time when you can arrange to have the extra space.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
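The per-table rewrite Greg describes is ordinary DDL that already forces a full rewrite of the heap; a sketch using made-up table, column, and index names (and the pre-8.3 CLUSTER syntax):

```sql
-- A column type change rewrites every tuple in this era, so the table
-- comes out in the current heap format; only this one table's worth of
-- extra disk space is needed at a time.
ALTER TABLE inventory ALTER COLUMN note TYPE text USING note::text;

-- CLUSTER likewise rewrites the entire heap, ordered by the named index.
CLUSTER inventory_pkey ON inventory;
```

Either command, under the proposal, would also clear the hypothetical per-table format flag in pg_class once the rewrite completes.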
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
Martijn van Oosterhout kleptog@svana.org writes:
> [ concerning handling a change in a single datatype's representation ]
> 1. Under old system, munge the system catalog to add code for new inet
>    type with new OID.  Probably needs a shared lib (if you could create
>    type input/output functions with pl/pgsql it would help here).
> 2. Execute ALTER COLUMN TYPE USING to switch to the new type.
> 3. Shutdown old server
> 4. Complete catalog changes
> 5. Startup new version

Probably easier and more flexible just to include both versions of the datatype in the new release, and tell people they'd better ALTER COLUMN TYPE sometime before updating to release N+2.

			regards, tom lane
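Under Tom's scheme, the user-visible step would just be one ALTER per affected column sometime during release N+1. A sketch, where `inet_old` and the table/column names are hypothetical (the thread does not specify how the retained old representation would be named):

```sql
-- Hypothetical: release N+1 ships both representations, the old one
-- under the name inet_old.  Sometime before upgrading to N+2, each
-- affected column gets rewritten into the new representation; routing
-- through text is a safe conversion path, since both representations
-- share the same input/output syntax.
ALTER TABLE hosts
    ALTER COLUMN addr TYPE inet USING addr::text::inet;
```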
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
On Mon, Aug 07, 2006 at 08:37:14AM -0400, Tom Lane wrote:
> Probably easier and more flexible just to include both versions of the
> datatype in the new release, and tell people they'd better ALTER COLUMN
> TYPE sometime before updating to release N+2.

Right, but I wasn't sure if that'd be considered acceptable. So, in a nutshell: if all the other issues of pg_upgrade get ironed out and the only blocker is a change in the on-disk format of a type, including both the old and the new versions in the next major release would be an option seriously considered. Well, that's got to make it easier.

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
On Mon, Aug 07, 2006 at 06:05:30AM -0400, Gregory Stark wrote:
> Tom Lane [EMAIL PROTECTED] writes:
> > I don't think there is very much hope of an in-place upgrade for
> > scenarios involving changes in contents of user tables.  In
> > particular, what of a change that requires more space than before,
> > such as adding a locale indicator to text fields?  There's no
> > guarantee that the data on an old page will still fit, and there's
> > certainly no hope of something operating at the xlog level being
> > able to move tuples across pages --- if nothing else, because it's
> > not in a position to compute new index entries.
> >
> > I don't see this working for page-at-a-time updates even in a full
> > backend environment; again, indexes are the killer consideration.
> > I don't see how to get sane behavior from an index containing some
> > old-style entries and some new-style ones for a changed datatype.
> >
> > As you mentioned, the scenarios that look practical for in-place
> > upgrade are the ones where only system catalog contents need to
> > change.  We've already discussed this (many times) and agreed that
> > we could live with restricting user-table changes to happen only
> > once every few releases.
>
> What of having support in the backend for two heap formats in each
> version?  That is, 8.3 would be able to support 8.2 heaps as well as
> 8.3's.  There could be a flag in pg_class indicating the heap format
> for that heap.  Then you would be able to upgrade without rewriting
> all your tables, and the only requirement is that sometime prior to
> the next upgrade you issue a per-table ALTER TABLE that involves
> rewriting the table, such as CLUSTER or ALTER COLUMN TYPE USING.
> That still requires 2x space, but only for a single table at a time,
> which is much better than 2x space for the entire database.  It also
> lets you schedule that job for some point in time when you can
> arrange to have the extra space.

Unless you make it so that the old-version pages get changed to the new version on-the-fly as they get dirtied.

And please folks, yes, it'd be insane to make the backend deal with more than one old version at a time. I think it's perfectly acceptable to tell folks that if they want to jump 4 major versions at once they'll have to do it some other way.

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
Hi,

> > [ concerning handling a change in a single datatype's representation ]
> > 1. Under old system, munge the system catalog to add code for new
> >    inet type with new OID.  Probably needs a shared lib (if you could
> >    create type input/output functions with pl/pgsql it would help
> >    here).
> > 2. Execute ALTER COLUMN TYPE USING to switch to the new type.
> > 3. Shutdown old server
> > 4. Complete catalog changes
> > 5. Startup new version
>
> Probably easier and more flexible just to include both versions of the
> datatype in the new release, and tell people they'd better ALTER COLUMN
> TYPE sometime before updating to release N+2.

That sounds very nice to me! Add an easy way to see which columns are still in the old format, and upgrading will be so much easier.

- Sander
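The check Sander asks for could be a plain catalog query. Assuming the old representation is exposed as a distinctly named type (here `inet_old`, a hypothetical name), something like:

```sql
-- List user-table columns still declared with the old-format type
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       a.attname AS column_name
  FROM pg_attribute a
  JOIN pg_class c     ON c.oid = a.attrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_type t      ON t.oid = a.atttypid
 WHERE t.typname = 'inet_old'   -- hypothetical old-format type name
   AND c.relkind = 'r'          -- ordinary tables only
   AND a.attnum > 0             -- skip system columns
   AND NOT a.attisdropped;
```

An empty result would mean every column has been rewritten and the database is ready for the next major upgrade.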
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
Rick,

> The objective is to smoothly upgrade to the new version with minimal
> downtime.

Thanks for jumping in.

> The different proposals as far as I can see are as follows:
>
> Proposal A - the big one-time reformatting
> 1) shutdown the db
> 2) run a command that upgrades the data directory to the new format
> 3) start up the new postgres version with the new data dir
>
> Pros: only pg_upgrade (or whatever it's called) needs to know about
> the old and new formats; each version of postgres knows about its own
> format and that's it.  The postgres code stays clean.
>
> Cons: your database is down while the upgrade takes place.  This sucks
> because the people who need this are the same people who are trying to
> avoid downtime.  It's faster than a dump/reload but it doesn't
> completely solve the problem, it just mitigates it.

Yes, but it mitigates it *considerably*. The general idea is that doing an in-place binary upgrade should take 80% less time, and require only 10% (or so) extra disk space. This would be sufficient for most users with large databases; they can afford to be down from midnight to 3 am, but not to be down for the whole weekend (as dump/reload requires for a 3 TB database), nor do they have 100%+ extra disk space and 20% extra CPU on the machine (as upgrade-by-replication requires).

Plus, there are versions (like 8.2 looks to be) where there is *no* change in the file format. For these, pg_upgrade would need just to bootstrap the system, swap the system tables and header files, and restart. Seconds instead of hours. As PostgreSQL matures further, I predict that there will be more releases with no change in the file format, making this even more useful.

> Proposal B - the gradual upgrade
> 1) shutdown the db
> 2) start it back up with the new version of postgres
> 3) the new postgres version upgrades things in place as needed

This is impractical, because the number of version-to-version compatibility libraries required will grow geometrically with time. We don't have to accommodate just the last version, but the last 4+ versions.

> Proposal C - PITR with on-the-fly disk upgrades
> 1) setup PITR
> 2) run pg_upgrade on your latest backed-up data directories
> 3) start up the new pg on that data directory in restartable recovery /
>    read-only / hot-standby mode
> 4) update the recovery log importer so that it can update the log files
>    on the fly as it applies them
> 5) failover to the hot standby as you normally would

I don't think this is practical. Simon?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
Josh Berkus josh@agliodbs.com writes:
> > Proposal C - PITR with on-the-fly disk upgrades
> > 1) setup PITR
> > 2) run pg_upgrade on your latest backed-up data directories
> > 3) start up the new pg on that data directory in restartable
> >    recovery / read-only / hot-standby mode
> > 4) update the recovery log importer so that it can update the log
> >    files on the fly as it applies them
> > 5) failover to the hot standby as you normally would
>
> I don't think this is practical. Simon?

I don't think there is very much hope of an in-place upgrade for scenarios involving changes in contents of user tables. In particular, what of a change that requires more space than before, such as adding a locale indicator to text fields? There's no guarantee that the data on an old page will still fit, and there's certainly no hope of something operating at the xlog level being able to move tuples across pages --- if nothing else, because it's not in a position to compute new index entries.

I don't see this working for page-at-a-time updates even in a full backend environment; again, indexes are the killer consideration. I don't see how to get sane behavior from an index containing some old-style entries and some new-style ones for a changed datatype.

As you mentioned, the scenarios that look practical for in-place upgrade are the ones where only system catalog contents need to change. We've already discussed this (many times) and agreed that we could live with restricting user-table changes to happen only once every few releases.

			regards, tom lane
pg_upgrade (was: [HACKERS] 8.2 features status)
On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:
> * In-place upgrades (pg_upgrade)
> BTW, I may get Sun to contribute an engineer for this; will get you
> posted.

How would such a thing handle changes to page formats?

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
* Jim C. Nasby ([EMAIL PROTECTED]) wrote:
> On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:
> > * In-place upgrades (pg_upgrade)
> > BTW, I may get Sun to contribute an engineer for this; will get you
> > posted.
>
> How would such a thing handle changes to page formats?

Couldn't this be done by converting a table/partial-table at a time? It wouldn't be something which could run while the system is live, but it'd probably take less time than dump/restore and wouldn't require double the disk space of the whole database... no?

Thanks,

Stephen
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
On Fri, Aug 04, 2006 at 02:12:16PM -0400, Stephen Frost wrote:
> * Jim C. Nasby ([EMAIL PROTECTED]) wrote:
> > On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:
> > > * In-place upgrades (pg_upgrade)
> > > BTW, I may get Sun to contribute an engineer for this; will get
> > > you posted.
> >
> > How would such a thing handle changes to page formats?
>
> Couldn't this be done by converting a table/partial-table at a time?
> It wouldn't be something which could run while the system is live, but
> it'd probably take less time than dump/restore and wouldn't require
> double the disk space of the whole database... no?

True, but if you're going to go about creating code that can deal with 2 different versions of on-disk data, why not go one better: put that code into the database itself, so that pages are converted on-the-fly as they're dirtied. That way you have *no* downtime (or almost no, anyway).

--
Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
* Jim C. Nasby ([EMAIL PROTECTED]) wrote:
> On Fri, Aug 04, 2006 at 02:12:16PM -0400, Stephen Frost wrote:
> > * Jim C. Nasby ([EMAIL PROTECTED]) wrote:
> > > How would such a thing handle changes to page formats?
> >
> > Couldn't this be done by converting a table/partial-table at a time?
> > It wouldn't be something which could run while the system is live,
> > but it'd probably take less time than dump/restore and wouldn't
> > require double the disk space of the whole database... no?
>
> True, but if you're going to go about creating code that can deal with
> 2 different versions of on-disk data, why not go one better: put that
> code into the database itself, so that pages are converted on-the-fly
> as they're dirtied.  That way you have *no* downtime (or almost no,
> anyway).

Certainly a good idea, but I'm really not sure that: a) we'd want the core server to carry what is essentially cruft code, and b) it'd be anywhere near simple to make this actually work in the core system. Supporting two versions of the page format when the only goal is changing the formatting is quite a different thing from fully supporting two on-disk formats throughout the backend.

Additionally, if you have something to convert the database wholesale, then it really just needs to support 2 versions at a time; if it's piecemeal, then you could end up in a situation where you have to support 3, 4, 5, however many prior page formats existed. If not, then you have to have a way to force all the pages to be upgraded, and you're unlikely to be able to do a whole lot more with the system while that's happening, thus adding a lot of extra complication just to end up having everyone follow the 'pg_upgrade'-style path anyway...

In the end though, I think once it's been done in pg_upgrade this could be revisited and considered, and if enough people are interested in doing it they could probably reuse some of the pg_upgrade code to make it happen.

Thanks,

Stephen
Re: pg_upgrade (was: [HACKERS] 8.2 features status)
I had a few thoughts on this issue:

The objective is to smoothly upgrade to the new version with minimal downtime. The different proposals as far as I can see are as follows:

Proposal A - the big one-time reformatting
1) shutdown the db
2) run a command that upgrades the data directory to the new format
3) start up the new postgres version with the new data dir

Pros: only pg_upgrade (or whatever it's called) needs to know about the old and new formats; each version of postgres knows about its own format and that's it. The postgres code stays clean.

Cons: your database is down while the upgrade takes place. This sucks because the people who need this are the same people who are trying to avoid downtime. It's faster than a dump/reload but it doesn't completely solve the problem, it just mitigates it.

Proposal B - the gradual upgrade
1) shutdown the db
2) start it back up with the new version of postgres
3) the new postgres version upgrades things in place as needed

Pros: very short downtime; only the time to shut down the old postgres version and start up the new one.

Cons: postgres code gets filled with cruft. Each version has to know about the old versions' on-disk data format and how to upgrade it. Until it is finished, you will be left with a database that is part old format, part new format. This could introduce bugs for people who never needed the feature in the first place.

Here is another proposal that I haven't heard anyone else suggest. My apologies in advance if it's obviously not workable or has already been discussed.

Proposal C - PITR with on-the-fly disk upgrades
1) setup PITR
2) run pg_upgrade on your latest backed-up data directories
3) start up the new pg on that data directory in restartable recovery / read-only / hot-standby mode
4) update the recovery log importer so that it can update the log files on the fly as it applies them
5) failover to the hot standby as you normally would

Pros: essentially no downtime, just any incidental time needed for the failover to occur. Cruft in the postgres main codebase is minimized; it's limited to the log importer. All other parts of postgres are unaffected.

Cons: requires another server, or double the disk space on the original server. Is this a problem for people with databases so large that a dump/reload is unacceptable? Perhaps there are technical issues with postgres that I don't understand that would make this too hard. Maybe it would take too long to update each log file as it's applied, so it wouldn't be able to catch up.

Oh yeah, there's another way:

Proposal D - Use Slony

But I figured since that's been working for a long time, if Slony solved their problem then they wouldn't be looking for something else.

I have no need for this feature, as a dump/reload is not a problem for me. I've always wondered, though, if that was a feasible answer to this problem. Each time it crops up, people propose solutions A and B but never C.

On Aug 4, 2006, at 1:30 PM, Jim C. Nasby wrote:
> On Fri, Aug 04, 2006 at 02:12:16PM -0400, Stephen Frost wrote:
> > * Jim C. Nasby ([EMAIL PROTECTED]) wrote:
> > > On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:
> > > > * In-place upgrades (pg_upgrade)
> > > > BTW, I may get Sun to contribute an engineer for this; will get
> > > > you posted.
> > >
> > > How would such a thing handle changes to page formats?
> >
> > Couldn't this be done by converting a table/partial-table at a time?
> > It wouldn't be something which could run while the system is live,
> > but it'd probably take less time than dump/restore and wouldn't
> > require double the disk space of the whole database... no?
>
> True, but if you're going to go about creating code that can deal with
> 2 different versions of on-disk data, why not go one better: put that
> code into the database itself, so that pages are converted on-the-fly
> as they're dirtied.  That way you have *no* downtime (or almost no,
> anyway).