Re: [HACKERS] [WIP] In-place upgrade

2008-11-27 Thread Zdenek Kotala
Robert Haas wrote: 1. htup and bufpage API clean up 2. HeapTuple version extension + code cleanup 3. In-place online upgrade 4. Extending pg_class info + more flexible TOAST chunk size big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but

Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Zdenek Kotala
Robert, big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but part of it is not necessary now. #2, #3 and #4 you can move to return with feedback section. Thanks Zdenek Robert Haas wrote: Zdenek - I am a bit murky on

Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Robert Haas
1. htup and bufpage API clean up 2. HeapTuple version extension + code cleanup 3. In-place online upgrade 4. Extending pg_class info + more flexible TOAST chunk size big thanks for your review. I think #1 is still partially valid, because it contains general cleanups, but part of it is not

Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Alvaro Herrera
Robert Haas wrote: With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because: - I'm not sure whether it's close enough to being finished for a review to be a good use of time. - I'm not sure how much you and Heikki

Re: [HACKERS] [WIP] In-place upgrade

2008-11-26 Thread Zdenek Kotala
Alvaro Herrera wrote: Robert Haas wrote: With respect to #4, I know that Alvaro submitted a draft patch, but I'm not clear on whether that needs to be reviewed, because: - I'm not sure whether it's close enough to being finished for a review to be a good use of time. - I'm not sure how

Re: [HACKERS] [WIP] In-place upgrade

2008-11-25 Thread Robert Haas
Zdenek - I am a bit murky on where we stand with upgrade-in-place in terms of reviewing. Initially, you had submitted four patches for this commitfest: 1. htup and bufpage API clean up 2. HeapTuple version extension + code cleanup 3. In-place online upgrade 4. Extending pg_class info + more

Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Matthew T. O'Connor
Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that. Of course they

Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Joshua D. Drake
On Mon, 2008-11-10 at 09:14 -0500, Matthew T. O'Connor wrote: Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't

Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Zdenek Kotala
Decibel! wrote: Unless I'm mistaken, there are only two cases we care about for additional space: per-page and per-tuple. Yes. And maybe special space indexes could be extended, but it is covered in per-page setting. Those requirements could also vary for different types of pg_class

Re: [HACKERS] [WIP] In-place upgrade

2008-11-10 Thread Jeff
On Nov 9, 2008, at 11:09 PM, Joshua D. Drake wrote: I think it's time for people to stop asking for the moon and realize that if we don't constrain this feature pretty darn tightly, we will have *nothing at all* for 8.4. Again. Gotta go with Tom on this one. The idea that we would somehow

Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Decibel!
On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote: 3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read pages that are 8.4-and-a-half or better, and 8.4 only knows how to
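
A rough illustration of the multi-release question quoted above, not something posted in the thread: if each release can only read pages written by its immediate predecessor, an 8.3-to-8.6 jump becomes a chain of single-step upgrades. The release list follows the thread's own example, and the helper name `upgrade_path` is an assumption made for this sketch.

```python
# Hypothetical sketch: release names follow the thread's example only.
RELEASES = ["8.3", "8.4", "8.5", "8.6"]

def upgrade_path(current, target, releases=RELEASES):
    """Return the single-step upgrades needed to get from current to target."""
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("in-place downgrade is not modelled in this sketch")
    # Only hops between adjacent releases are allowed.
    return [(releases[i], releases[i + 1]) for i in range(start, end)]

path = upgrade_path("8.3", "8.6")
print(path)
```

Under this assumption every intermediate release has to be installed and run at least long enough to convert all pages, which is the cost the posters are weighing against dump/reload.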

Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes: I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that. Of course they can do that --- they

Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Joshua D. Drake
On Sun, 2008-11-09 at 20:02 -0500, Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they

Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala
Heikki Linnakangas wrote: Tom Lane wrote: I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to

Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Adding catalog columns seems rather complicated, and not back-patchable. Agreed, we'd not be able to make them retroactively appear in 8.3. I imagined that you would have just a single cluster-wide variable, a GUC perhaps,

Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Zdenek Kotala
Tom Lane wrote: I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the

Re: [HACKERS] [WIP] In-place upgrade

2008-11-07 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes: Tom Lane wrote: * Add a format serial number column to pg_class, and probably also pg_database. Rather like the frozenxid columns, this would have the semantics that all pages in a relation or database are known to have at least the specified
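
One way to picture the "format serial number" idea quoted above is a per-relation marker that only ever moves forward, much as the frozenxid columns do. This is a minimal sketch under assumptions: `Relation`, `page_versions` and `min_format` are made-up names standing in for a table, its pages' layout versions, and the proposed pg_class column.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    name: str
    page_versions: list   # layout version of each page (hypothetical model)
    min_format: int = 0   # stand-in for the proposed pg_class column

def refresh_min_format(rel):
    """Advance the marker to the oldest page format still present in the relation."""
    if rel.page_versions:
        # The marker never moves backwards, like a frozenxid-style column.
        rel.min_format = max(rel.min_format, min(rel.page_versions))
    return rel.min_format

def database_min_format(relations):
    """A database-wide marker is the minimum over its relations."""
    return min(rel.min_format for rel in relations)

orders = Relation("orders", [3, 4, 4])
print(refresh_min_format(orders))
```

With a marker like this, the next release could refuse an in-place upgrade until every relation reports the required minimum format.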

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes: To spell this out in more detail: Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and F. We examine the page and determine that if we convert this to a V4 page, only five tuples will fit. So we need to get rid of one of the tuples. We
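
The "six tuples on page 123, only five fit after conversion" case quoted above can be sketched as a simple fit check. The page size, header sizes and tuple sizes are illustrative numbers chosen to reproduce the example, and `tuples_to_move` is a hypothetical helper; the sketch models header growth only and ignores line pointers and any change in the tuples' own size.

```python
def tuples_to_move(tuple_sizes, usable_bytes):
    """Return indexes of tuples that no longer fit, taken from the end of the page."""
    kept = 0
    used = 0
    for size in tuple_sizes:
        if used + size > usable_bytes:
            break
        used += size
        kept += 1
    return list(range(kept, len(tuple_sizes)))

PAGE_SIZE = 8192          # illustrative
OLD_HEADER = 20           # illustrative header size for the old layout
NEW_HEADER = 24           # illustrative header size for the new layout
sizes = [1362] * 6        # tuples A..F, made-up sizes that exactly fill the old page

print(tuples_to_move(sizes, PAGE_SIZE - OLD_HEADER))
print(tuples_to_move(sizes, PAGE_SIZE - NEW_HEADER))
```

Whatever the second call returns has to be relocated to another page before the page itself can be converted, which is the step the rest of the thread argues about.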

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Zdenek Kotala
Tom Lane wrote: Robert Haas [EMAIL PROTECTED] writes: To spell this out in more detail: Suppose page 123 is a V3 page containing 6 tuples A, B, C, D, E, and F. We examine the page and determine that if we convert this to a V4 page, only five tuples will fit. So we need to get rid of

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid. I don't understand this comment at all. Unless

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I envision a similar system where we have utilities to guarantee all pages have enough free space, and all pages are the current version, before allowing an upgrade-in-place to the next version. Such a consistent API will make the

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
An external utility doesn't seem like the right way to approach it. For example, given the need to ensure X amount of free space in each page, the only way to guarantee that would be to shut down the database while you run the utility over all the pages --- otherwise somebody might fill some

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Robert Haas wrote: The second point could probably be addressed with a GUC but the first one certainly can't. 3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Heikki Linnakangas
Tom Lane wrote: I think we can have a notion of pre-upgrade maintenance, but it would have to be integrated into normal operations. For instance, if conversion to 8.4 requires extra free space, we'd make late releases of 8.3.x not only be able to force that to occur, but also tweak the normal

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Bruce Momjian
Robert Haas wrote: That's all fine and dandy, except that it presumes that you can perform SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that A-E aren't there until they get converted. Which is exactly the overhead we were looking to avoid. I don't understand this

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
And almost guarantee that the job will never be completed, or tested fully. Remember that in-place upgrades would be pretty painless so doing multiple major upgrades should not be a difficult requirement, or they can dump/reload their data to skip it. Regardless of what design is chosen,

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: I envision a similar system where we have utilities to guarantee all pages have enough free space, and all pages are the current version, before allowing an upgrade-in-place to the next version. Such a consistent API will make the job for users easier

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes: That means, in essence, that the earliest possible version that could be in-place upgraded would be an 8.4 system - we are giving up completely on in-place upgrade to 8.4 from any earlier version (which personally I thought was the whole point of this

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Adding catalog columns seems rather complicated, and not back-patchable. Agreed, we'd not be able to make them retroactively appear in 8.3. I imagined that you would have just a single cluster-wide variable, a GUC perhaps, indicating how much

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Greg Smith
On Thu, 6 Nov 2008, Tom Lane wrote: Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance. I was just staring at that code as you wrote this

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Robert Haas
The idea that you're going to get in-place upgrade all the way back to 8.2 without taking the database down for even a little bit to run such a utility is hard to pull off, and it's impressive that Zdenek and everyone else involved has gotten so close to doing it. I think we should at least

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes: On Thu, 6 Nov 2008, Tom Lane wrote: Another thought here is that I don't think we are yet committed to any changes that require extra space between 8.3 and 8.4, are we? The proposed addition of CRC words could be put off to 8.5, for instance. I was just

Re: [HACKERS] [WIP] In-place upgrade

2008-11-06 Thread Greg Smith
On Thu, 6 Nov 2008, Tom Lane wrote: -Is it worth considering making CRCs an optional compile-time feature, and that (for now at least) you couldn't get them and the in-place upgrade at the same time? Hmm ... might be better than not offering them in 8.4 at all, but the thing is that then you

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be surprised to find out that large swathes of their database are

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala
Tom Lane wrote: I concur that I don't want to see this patch adding more than the absolute unavoidable minimum of overhead for data that meets the current layout definition. I'm disturbed by the proposal to stick overhead into tuple header access, for example. OK. I agree that it is

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Greg Stark
I don't think this really qualifies as in place upgrade since it would mean creating a whole second copy of all your data. And it's only online for read-only queries too. I think we need a way to upgrade the pages in place and deal with any overflow data as exceptional cases or else

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala
Greg Stark wrote: I don't think this really qualifies as in place upgrade since it would mean creating a whole second copy of all your data. And it's only online for read-only queries too. I think we need a way to upgrade the pages in place and deal with any overflow data as exceptional

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala
Heikki Linnakangas wrote: Zdenek Kotala wrote: We've talked about this many times before, so I'm sure you know what my opinion is. Let me phrase it one more time: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Martijn van Oosterhout
On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote: Greg Stark wrote: It is an exceptional case between V3 and V4, and only on the heap, because you save space in varlena. But between V4 and V5 we will lose another 4 bytes in the page header - the page header will be 28 bytes long but tuple size

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Zdenek Kotala
Martijn van Oosterhout wrote: On Wed, Nov 05, 2008 at 03:04:42PM +0100, Zdenek Kotala wrote: Greg Stark wrote: It is an exceptional case between V3 and V4, and only on the heap, because you save space in varlena. But between V4 and V5 we will lose another 4 bytes in the page header - the page header will

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: Is this really such a big deal? You do the null-update on the last tuple of the page and then you do have enough room. So Phase one moves a few tuples to make room. Phase 2 actually converts the pages inplace. Problem

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
The problem is how to move a tuple from one page to another and keep indexes in sync. One solution is to perform something like an update operation on the tuple. But you need an exclusive lock on the page and the pin count has to be zero. And the question is where it is a safe operation. But doesn't this problem

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes: The problem is how to move a tuple from one page to another and keep indexes in sync. One solution is to perform something like an update operation on the tuple. But you need an exclusive lock on the page and the pin count has to be zero. And the question is where it is safe

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Martijn van Oosterhout
On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote: Robert Haas [EMAIL PROTECTED] writes: The problem is how to move a tuple from one page to another and keep indexes in sync. One solution is to perform something like an update operation on the tuple. But you need an exclusive lock on the

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Gregory Stark
Martijn van Oosterhout [EMAIL PROTECTED] writes: On Wed, Nov 05, 2008 at 09:41:52PM +, Gregory Stark wrote: Robert Haas [EMAIL PROTECTED] writes: The problem is how to move a tuple from one page to another and keep indexes in sync. One solution is to perform something like an update operation

Re: [HACKERS] [WIP] In-place upgrade

2008-11-05 Thread Robert Haas
The problem is how to move a tuple from one page to another and keep indexes in sync. One solution is to perform something like an update operation on the tuple. But you need an exclusive lock on the page and the pin count has to be zero. And the question is where it is a safe operation. But

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Zdenek Kotala
Robert Haas wrote: Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the appropriate V3-aware code (if

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
OK. The original idea was to do convert-on-read, which has several problems with no easy solution. One is that the new data may not fit on the page, and the second big problem is how to convert TOAST table data. Another, more general problem is how to convert indexes... Convert on read has

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Zdenek Kotala
Robert Haas wrote: OK. The original idea was to do convert-on-read, which has several problems with no easy solution. One is that the new data may not fit on the page, and the second big problem is how to convert TOAST table data. Another, more general problem is how to convert indexes...

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
I see. But VACUUM and other internal functions access heap pages directly without ExecStoreTuple. Right. I don't think there's any getting around the fact that any function which accesses heap pages directly is going to need modification. The key is to make those modifications as non-invasive

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Heikki Linnakangas
Zdenek Kotala wrote: Robert Haas wrote: Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
We've talked about this many times before, so I'm sure you know what my opinion is. Let me phrase it one more time: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually, whether it's during VACUUM, whenever a

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
That's sane *if* you can guarantee that only negligible overhead is added for accessing data that is in the up-to-date format. I don't think that will be the case if we start putting version checks into every tuple access macro. Yes, the point is that you'll read the page as V3 or V4,

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes: Well, I just proposed an approach that doesn't work this way, so I guess I'll have to put myself in the disagree category, or anyway yet to be convinced. As long as you can move individual tuples onto new pages, you can eventually empty V3 pages and

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
Well, I just proposed an approach that doesn't work this way, so I guess I'll have to put myself in the disagree category, or anyway yet to be convinced. As long as you can move individual tuples onto new pages, you can eventually empty V3 pages and reinitialize them as new, empty V4 pages.

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes: We've talked about this many times before, so I'm sure you know what my opinion is. Let me phrase it one more time: 1. You *will* need a function to convert a page from old format to new format. We do want to get rid of the old format pages eventually,

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes: Well, I just proposed an approach that doesn't work this way, so I guess I'll have to put myself in the disagree category, or anyway yet to be convinced. As long as you can move individual tuples onto new pages, you can eventually empty V3 pages and

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
Maybe. The difference is that I'm talking about converting tuples, not pages, so "What happens when the data doesn't fit on the new page?" is a meaningless question. No it's not, because as you pointed out you still need a way for the user to force it to happen sometime. Unless you're going

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes: Maybe. The difference is that I'm talking about converting tuples, not pages, so "What happens when the data doesn't fit on the new page?" is a meaningless question. No it's not, because as you pointed out you still need a way for the user to force it to

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Robert Haas
No, that's not what I'm suggesting. My thought was that any V3 page would be treated as if it were completely full, with the exception of a completely empty page which can be reinitialized as a V4 page. So you would never add any tuples to a V3 page, but you would need to update xmax, hint
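
The policy described above (never add tuples to a V3 page, and reinitialize a V3 page as V4 once it is completely empty) can be sketched roughly as follows. The `Page` class, its `capacity` field and both function names are assumptions for illustration only; real pages track free space in bytes, not slots.

```python
from dataclasses import dataclass

CURRENT_VERSION = 4

@dataclass
class Page:
    version: int
    tuples: list
    capacity: int = 5     # made-up tuple capacity for the sketch

def free_slots(page):
    """Old-format pages report no free space, so inserts never target them."""
    if page.version < CURRENT_VERSION:
        return 0
    return page.capacity - len(page.tuples)

def maybe_reinitialize(page):
    """An empty old-format page can simply be reinitialized in the current format."""
    if page.version < CURRENT_VERSION and not page.tuples:
        page.version = CURRENT_VERSION
        return True
    return False

page = Page(version=3, tuples=["A"])
print(free_slots(page), maybe_reinitialize(page))
```

In-place changes such as setting xmax or hint bits would still have to work on the old layout, which is the part of the proposal the quoted message goes on to raise.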

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes: No, that's not what I'm suggesting. My thought was that any V3 page would be treated as if it were completely full, with the exception of a completely empty page which can be reinitialized as a V4 page. So you would never add any tuples to a V3 page,

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Joshua D. Drake
Gregory Stark wrote: Robert Haas [EMAIL PROTECTED] writes: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new features. Just as an example, DBAs may be

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes: Gregory Stark wrote: Robert Haas [EMAIL PROTECTED] writes: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't support those new

Re: [HACKERS] [WIP] In-place upgrade

2008-11-04 Thread Joshua D. Drake
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Gregory Stark wrote: Robert Haas [EMAIL PROTECTED] writes: An old page which never goes away. New page formats are introduced for a reason -- to support new features. An old page lying around indefinitely means some pages can't

Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Zdenek Kotala
Big thanks for the review. Robert Haas wrote: I tried to apply this patch to CVS HEAD and it blew up all over the place. It doesn't seem to be intended to apply against CVS HEAD; for example, I don't have backend/access/heap/htup.c at all, so can't apply changes to that file. You need to

Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Robert Haas
You also need to apply two other patches, which are located here: http://wiki.postgresql.org/wiki/CommitFestInProgress#Upgrade-in-place_and_related_issues I moved one related patch from another category here to correct place. Just to confirm, which two?

Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes: Really, what I'd ideally like to see here is a system where the V3 code is in essence error-recovery code. Everything should be V4-only unless you detect a V3 page, and then you error out (if in-place upgrade is not enabled) or jump to the appropriate
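
The "V3 code as error-recovery code" idea quoted above amounts to a fast path for current-format pages with everything else pushed into a rarely taken branch. A minimal sketch, assuming pages are modelled as dictionaries and that `read_v3_tuple` stands in for whatever V3-aware code the patch would provide:

```python
class UnsupportedPageVersion(Exception):
    """Raised when an old-format page is found and in-place upgrade is disabled."""

def read_v3_tuple(page, index):
    # Hypothetical stand-in: real code would translate the old tuple layout here.
    return page["tuples"][index]

def read_tuple(page, index, inplace_upgrade_enabled):
    if page["version"] == 4:
        # Fast path: current-format pages pay only for this one comparison.
        return page["tuples"][index]
    if page["version"] == 3 and inplace_upgrade_enabled:
        return read_v3_tuple(page, index)
    raise UnsupportedPageVersion("page version %d" % page["version"])

print(read_tuple({"version": 4, "tuples": ["x"]}, 0, False))
```

The design point is that the version test happens once per page access rather than inside every tuple access macro, which is the overhead concern raised elsewhere in the thread.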

Re: [HACKERS] [WIP] In-place upgrade

2008-11-03 Thread Robert Haas
We already do check the page version on read-in --- see PageHeaderIsValid. Right, but the only place this is called is in ReadBuffer_common, which doesn't seem like a suitable place to deal with the possibility of a V3 page since you don't yet know what you plan to do with it. I'm not quite sure

Re: [HACKERS] [WIP] In-place upgrade

2008-11-02 Thread Robert Haas
I tried to apply this patch to CVS HEAD and it blew up all over the place. It doesn't seem to be intended to apply against CVS HEAD; for example, I don't have backend/access/heap/htup.c at all, so can't apply changes to that file. I was able to clone the GIT repository with the following