Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Bruce Momjian
David Fetter wrote:
  Right.  There were two basic approaches to handling a page that
  would expand when upgraded to the new version --- either allow the
  system to write the old format, or have a pre-upgrade script that
  moved tuples so there was guaranteed enough free space in every page
  for the new format.  I think we agreed that the latter was better
  than the former, and it was easy because we don't have any need for
  that at this time.  Plus the script would not rewrite every page,
  just certain pages that required it.
 
 Please forgive me for barging in here, but that approach simply is
 untenable if it requires that the database be down while those pages
 are being found, marked, moved around, etc.
 
 The data volumes that really concern people who need an in-place
 upgrade are such that even 
 
 dd if=$PGDATA of=/dev/null bs=8192  # (or whatever the optimal block size would be)
 
 would require *much* more time than such people would accept as a down
 time window, and while that's a lower bound, it's not a reasonable
 lower bound on the time.

Well, you can say it is unacceptable, but if there are no other options
then that is all we can offer.  My main point is that we should consider
writing old format pages only when we have no choice (page size might
expand), and even then, we might decide to have a pre-migration script
because the code impact of writing the old format would be too great. 
This is all hypothetical until we have a real use-case.

 If this re-jiggering could kick off in the background at start and
 work on a running PostgreSQL, the whole objection goes away.
 
 A problem that arises for any in-place upgrade system we do is that if
 someone's at 99% storage capacity, we can pretty well guarantee some
 kind of catastrophic failure.  Could we create some way to get an
 estimate of space needed, given that the system needs to stay up while
 that's happening?

Yea, the database would expand and hopefully have full transaction
semantics.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Tue, Dec 1, 2009 at 11:45 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  The key issue, as I think Heikki identified at the time, is to figure
  out how you're eventually going to get rid of the old pages.  He
  proposed running a pre-upgrade utility on each page to reserve the
  right amount of free space.
 
  http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php
 
  Right.  There were two basic approaches to handling a page that would
  expand when upgraded to the new version --- either allow the system to
  write the old format, or have a pre-upgrade script that moved tuples so
  there was guaranteed enough free space in every page for the new format.
  I think we agreed that the latter was better than the former, and it was
  easy because we don't have any need for that at this time.  Plus the
  script would not rewrite every page, just certain pages that required
  it.

 While I'm always willing to be proven wrong, I think it's a complete
 dead-end to believe that it's going to be easier to reserve space for
 page expansion using the upgrade-from version rather than the
 upgrade-to version.  I am firmly of the belief that the NEW pg version
 must be able to operate on an unmodified heap migrated from the OLD pg
 version.  After this set of patches was rejected, Zdenek actually

 Does it need to write the old version, and if it does, does it have to
 carry around the old-format structures all over the backend?  That was
 the unclear part.

I think it needs partial write support for the old version.  If the
page is not expanding, then you can probably just replace pages in
place.  But if the page is expanding, then you need to be able to move
individual tuples[1].  Since you want to be up and running while
that's happening, I think you probably need to be able to update xmax
and probably set hint bits.  But you don't need to be able to add
tuples to the old page format, and I don't think you need complete
vacuum support, since you don't plan to reuse the dead space - you'll
just recycle the whole page once the tuples are all dead.

As for carrying it around the whole backend, I'm not sure how much of
the backend really needs to know.  It would only be anything that
looks at pages, rather than, say, tuples, but I don't really know how
much code that touches.  I suppose that's one of the things we need to
figure out.

[1] Unless, of course, you use a pre-upgrade utility.  But this is
about how to make it work WITHOUT a pre-upgrade utility.

 proposed an alternate patch that would have allowed space reservation,
 and it was rejected precisely because there was no clear certainty
 that it would solve any hypothetical future problem.

 True.  It was solving a problem we didn't have, yet.

Well, that's sort of a circular argument.  If you're going to reserve
space with a pre-upgrade utility, you're going to need to put the
pre-upgrade utility into the version you want to upgrade FROM.  If we
wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we
would have had to put the utility into 8.4.

The problem I'm referring to is that there is no guarantee that you
would be able to predict how much space to reserve.  In a case like CRCs,
it may be as simple as 4 bytes.  But what if, say, we switch to a
different compression algorithm for inline toast?  Some pages will
contract, others will expand, but there's no saying by how much - and
therefore no fixed amount of reserved space is guaranteed to be
adequate.  It's true that we might never want to do that particular
thing, but I don't think we can say categorically that we'll NEVER
want to do anything that expands pages by an unpredictable amount.  So
it might be quite complex to figure out how much space to reserve on
any given page.  If we can find a way to make that the NEW PG
version's problem, it's still complicated, but at least it's not
complicated stuff that has to be backpatched.

Another problem with a pre-upgrade utility is - how do you verify,
when you fire up the new cluster, that the pre-upgrade utility has
done its thing?  If the new PG version requires 4 bytes of space
reserved on each page, what happens when you get halfway through
upgrading your 1TB database and find a page with only 2 bytes
available?  There aren't a lot of good options.  The old PG version
could try to mark the DB in some way to indicate whether it
successfully completed, but what if there's a bug and something was
missed?  Then you have this scenario:

1. Run the pre-upgrade script.
2. pg_migrator.
3. Fire up new version.
4. Discover that pre-upgrade script forgot to reserve enough space on some page.
5. Report a bug.
6. Bug fixed, new version of pre-upgrade script is now available.
7. ???

If all the logic is in the new server, you may still be in hot water
when you discover that it can't deal with a particular case.  But
hopefully the problem would be confined to that page, or that
relation, and you could use the rest of your database.  And 

Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Simon Riggs
On Wed, 2009-12-02 at 10:48 -0500, Robert Haas wrote:
 Well, that's sort of a circular argument.  If you're going to reserve
 space with a pre-upgrade utility, you're going to need to put the
 pre-upgrade utility into the version you want to upgrade FROM.  If we
 wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we
 would have had to put the utility into 8.4.

Don't see any need to reserve space at all.

If this is really needed, we first run a script to prepare the 8.4
database for conversion to 8.5. The script would move things around if
it finds a block that would have difficulty after upgrade. We may be
able to do that simply, using fillfactor, or it may need to be more
complex. Either way, it's still easy to do this when required. 

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 11:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2009-12-02 at 10:48 -0500, Robert Haas wrote:
 Well, that's sort of a circular argument.  If you're going to reserve
 space with a pre-upgrade utility, you're going to need to put the
 pre-upgrade utility into the version you want to upgrade FROM.  If we
 wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we
 would have had to put the utility into 8.4.

 Don't see any need to reserve space at all.

 If this is really needed, we first run a script to prepare the 8.4
 database for conversion to 8.5. The script would move things around if
 it finds a block that would have difficulty after upgrade. We may be
 able to do that simply, using fillfactor, or it may need to be more
 complex. Either way, it's still easy to do this when required.

I discussed the problems with this, as I see them, in the same email
you just quoted.  You don't have to agree with my analysis, of course.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Greg Smith

Robert Haas wrote:

The problem I'm referring to is that there is no guarantee that you
would be able to predict how much space to reserve.  In a case like CRCs,
it may be as simple as 4 bytes.  But what if, say, we switch to a
different compression algorithm for inline toast?
Upthread, you made a perfectly sensible suggestion:  use the CRC 
addition as a test case to confirm you can build something useful that 
allowed slightly more complicated in-place upgrades than are supported 
now.  This requires some new code to do tuple shuffling, communicate 
reserved space, etc.  All things that seem quite sensible to have 
available, useful steps toward a more comprehensive solution, and an 
achievable goal you wouldn't even have to argue about.


Now, you're wandering us back down the path where we have to solve a 
"migrate TOAST changes"-level problem in order to make progress.  
Starting with presuming you have to solve the hardest possible issue 
around is the documented path to failure here.  We've seen multiple such 
solutions before, and they all had trade-offs deemed unacceptable:  
either a performance loss for everyone (not just people upgrading), or 
unbearable code complexity.  There's every reason to believe your 
reinvention of the same techniques will suffer the same fate.


When someone has such a change to be made, maybe you could bring this 
back up again and gain some traction.  One of the big lessons I took 
from the 8.4 development's lack of progress on this class of problem:  
no work to make upgrades easier will get accepted unless there is such 
an upgrade on the table that requires it.  You need a test case to make 
sure the upgrade approach a) works as expected, and b) is code you must 
commit now or in-place upgrade is lost.  Anything else will be deferred; 
I don't think there's any interest in solving a speculative future 
problem left at this point, given that it will be code we can't even 
prove will work.



Another problem with a pre-upgrade utility is - how do you verify,
when you fire up the new cluster, that the pre-upgrade utility has
done its thing?
Some additional catalog support was suggested to mark what the 
pre-upgrade utility had processed.   I'm sure I could find the messages 
about it again if I had to.



If all the logic is in the new server, you may still be in hot water
when you discover that it can't deal with a particular case.
If you can't design a pre-upgrade script without showstopper bugs, what 
makes you think the much more complicated code in the new server (which 
will be carrying around an ugly mess of old and new engine parts) will 
work as advertised?  I think we'll be lucky to get the simplest possible 
scheme implemented, and that any of these more complicated ones will die 
under the weight of their own complexity.


Also, your logic seems to presume that no backports are possible to the 
old server.  A bug-fix to the pre-upgrade script is a completely 
reasonable and expected candidate for backporting, because it will be 
such a targeted piece of code that adjusting it shouldn't impact 
anything else.  The same will not be even remotely true if there's a bug 
fix needed in a more complicated system that lives in a regularly 
traversed code path.  Having such a tightly targeted chunk of code makes 
pre-upgrade *more* likely to get bug-fix backports, because you won't be 
touching code executed by regular users at all.


The potential code impact of backporting fixes to the more complicated 
approaches here is another major obstacle to adopting one of them.  
That's an issue that we didn't even get to the last time, because 
showstopper issues popped up first.  That problem would have loomed had 
work continued down that path, though.


--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 1:08 PM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 The problem I'm referring to is that there is no guarantee that you
 would be able to predict how much space to reserve.  In a case like CRCs,
 it may be as simple as 4 bytes.  But what if, say, we switch to a
 different compression algorithm for inline toast?

 Upthread, you made a perfectly sensible suggestion:  use the CRC addition as
 a test case to confirm you can build something useful that allowed slightly
 more complicated in-place upgrades than are supported now.  This requires
 some new code to do tuple shuffling, communicate reserved space, etc.  All
 things that seem quite sensible to have available, useful steps toward a
 more comprehensive solution, and an achievable goal you wouldn't even have
 to argue about.

 Now, you're wandering us back down the path where we have to solve a
 "migrate TOAST changes"-level problem in order to make progress.  Starting
 with presuming you have to solve the hardest possible issue around is the
 documented path to failure here.  We've seen multiple such solutions before,
 and they all had trade-offs deemed unacceptable:  either a performance loss
 for everyone (not just people upgrading), or unbearable code complexity.
  There's every reason to believe your reinvention of the same techniques
 will suffer the same fate.

Just to set the record straight, I don't intend to work on this
problem at all (unless paid, of course).  And I'm perfectly happy to
go with whatever workable solution someone else comes up with.  I'm
just offering opinions on what I see as the advantages and
disadvantages of different approaches, and anyone who is working on
this is more than free to ignore them.

 Some additional catalog support was suggested to mark what the pre-upgrade
 utility had processed.   I'm sure I could find the messages about again if I
 had to.

And that's a perfectly sensible solution, except that adding a catalog
column to 8.4 at this point would force initdb, so that's a
non-starter.  I suppose we could shoehorn it into the reloptions.

 Also, your logic seems to presume that no backports are possible to the old
 server.

The problem on the table at the moment is that the proposed CRC
feature will expand every page by a uniform amount - so in this case a
fixed-space-per-page reservation utility would be completely adequate.
 Does anyone think this is a realistic thing to backport to 8.4?

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Greg Smith

Robert Haas wrote:

Some additional catalog support was suggested to mark what the pre-upgrade
utility had processed.   I'm sure I could find the messages about again if I
had to.


And that's a perfectly sensible solution, except that adding a catalog
column to 8.4 at this point would force initdb, so that's a
non-starter.  I suppose we could shoehorn it into the reloptions.
  
There's no reason the associated catalog support had to ship with the 
old version.  You can always modify the catalog after initdb, but before 
running the pre-upgrade utility.  pg_migrator might make that change for 
you.



The problem on the table at the moment is that the proposed CRC
feature will expand every page by a uniform amount - so in this case a
fixed-space-per-page reservation utility would be completely adequate.
 Does anyone think this is a realistic thing to backport to 8.4?
  
I believe the main problem here is making sure that the server doesn't 
turn around and fill pages right back up again.  The logic that needs to 
show up here has two parts:


1) Don't fill new pages completely up, save the space that will be 
needed in the new version

2) Find old pages that are filled and free some space on them

The pre-upgrade utility we've been talking about does (2), and that's 
easy to imagine implementing as an add-on module rather than a 
backport.  I don't know how (1) can be done in a way such that it's 
easily backported to 8.4. 






Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 1:56 PM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:
 Some additional catalog support was suggested to mark what the
 pre-upgrade
 utility had processed.   I'm sure I could find the messages about again
 if I
 had to.
 And that's a perfectly sensible solution, except that adding a catalog
 column to 8.4 at this point would force initdb, so that's a
 non-starter.  I suppose we could shoehorn it into the reloptions.
 There's no reason the associated catalog support had to ship with the old
 version.  You can always modify the catalog after initdb, but before running
 the pre-upgrade utility.  pg_migrator might make that change for you.

Uh, really?  I don't think that's possible at all.

 The problem on the table at the moment is that the proposed CRC
 feature will expand every page by a uniform amount - so in this case a
 fixed-space-per-page reservation utility would be completely adequate.
  Does anyone think this is a realistic thing to backport to 8.4?

 I believe the main problem here is making sure that the server doesn't turn
 around and fill pages right back up again.  The logic that needs to show up
 here has two parts:

 1) Don't fill new pages completely up, save the space that will be needed in
 the new version
 2) Find old pages that are filled and free some space on them

 The pre-upgrade utility we've been talking about does (2), and that's easy
 to imagine implementing as an add-on module rather than a backport.  I don't
 know how (1) can be done in a way such that it's easily backported to 8.4.

Me neither.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Greg Smith

Robert Haas wrote:

On Wed, Dec 2, 2009 at 1:56 PM, Greg Smith g...@2ndquadrant.com wrote:
  

There's no reason the associated catalog support had to ship with the old
version.  You can always modify the catalog after initdb, but before running
the pre-upgrade utility.  pg_migrator might make that change for you.



Uh, really?  I don't think that's possible at all.
  
Worst case just to get this bootstrapped:  you install a new table with 
the added bits.  Old version page upgrader accounts for itself there.  
pg_migrator dumps that data and then loads it into its new, correct home 
on the newer version.  There's already stuff like that being done 
anyway--dumping things from the old catalog and inserting into the new 
one--and if the origin is actually an add-on rather than an original 
catalog page it doesn't really matter.  As long as the new version can 
see the info it needs in its catalog, it doesn't matter how it got 
there; that's the one that needs to check the migration status before it 
can access things outside of the catalog.





Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 2:27 PM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 On Wed, Dec 2, 2009 at 1:56 PM, Greg Smith g...@2ndquadrant.com wrote:


 There's no reason the associated catalog support had to ship with the old
 version.  You can always modify the catalog after initdb, but before running
 the pre-upgrade utility.  pg_migrator might make that change for you.


 Uh, really?  I don't think that's possible at all.


 Worst case just to get this bootstrapped:  you install a new table with the
 added bits.  Old version page upgrader accounts for itself there.
 pg_migrator dumps that data and then loads it into its new, correct home on
 the newer version.  There's already stuff like that being done
 anyway--dumping things from the old catalog and inserting into the new
 one--and if the origin is actually an add-on rather than an original catalog
 page it doesn't really matter.  As long as the new version can see the info
 it needs in its catalog, it doesn't matter how it got there; that's the
 one that needs to check the migration status before it can access things
 outside of the catalog.

That might work.  I think that in order to get a fixed OID for the new
catalog you would need to run a backend in bootstrap mode, which might
(not sure) require shutting down the database first.  But it sounds
doable.

There remains the issue of whether it is reasonable to think about
backpatching such a thing, and whether doing so is easier/better than
dealing with page expansion in the new server.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Greg Stark
On Wed, Dec 2, 2009 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote:
 Also, your logic seems to presume that no backports are possible to the old
 server.

 The problem on the table at the moment is that the proposed CRC
 feature will expand every page by a uniform amount - so in this case a
 fixed-space-per-page reservation utility would be completely adequate.
  Does anyone think this is a realistic thing to backport to 8.4?

This whole discussion is based on assumptions which do not match my
recollection of the old discussion. I would suggest people go back and
read the emails, but it's clear at least some people have, and they
seem to get different things out of those old emails. My recollection
of Tom and Heikki's suggestions for Zdenek was as follows:

1) When 8.9.0 comes out we also release an 8.8.x which contains a new
GUC that says to prepare for an 8.9 upgrade. If that GUC is set, then
any new pages are guaranteed to have enough space for 8.9.0, which
could be as simple as guaranteeing there are x bytes of free space. In
the case of the CRC it's actually *not* a uniform amount of free space
if we go with Tom's design of having a variable chunk which moves
around, but it's still just simple arithmetic to determine whether
there's enough free space on the page for a new tuple, so it would be
simple enough to backport.

2) When you want to prepare a database for upgrade, you run the
precheck script, which first of all makes sure you're running 8.8.x and
that the flag is set. Then it checks the free space on every page to
ensure it's satisfactory. If not, it can do a no-op update to any tuple
on the page, which the new free-space calculation would guarantee goes
to a new page. Then you have to wait long enough and vacuum.

3) Then you run pg_migrator which swaps in the new catalog files.

4) Then you shut down and bring up 8.9.0 which on reading any page
*immediately* converts it to 8.9.0 format.

5) You would eventually also need some program which processes every
page and guarantees to write it back out in the new format. Otherwise
there will be pages that you never stop reconverting every time
they're read.

-- 
greg



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 This whole discussion is based on assumptions which do not match my
 recollection of the old discussion. I would suggest people go back and
 read the emails but it's clear at least some people have so it seems
 people get different things out of those old emails. My recollection
 of Tom and Heikki's suggestions for Zdenek were as follows:

 1) When 8.9.0 comes out we also release an 8.8.x which contains a new
 guc which says to prepare for an 8.9 update.

Yeah, I think the critical point is not to assume that the behavior of
the old system is completely set in stone.  We can insist that you must
update to at least point release .N before beginning the migration
process.  That gives us a chance to backpatch code that makes
adjustments to the behavior of the old server, so long as the backpatch
isn't invasive enough to raise stability concerns.

regards, tom lane



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 This whole discussion is based on assumptions which do not match my
 recollection of the old discussion. I would suggest people go back and
 read the emails but it's clear at least some people have so it seems
 people get different things out of those old emails. My recollection
 of Tom and Heikki's suggestions for Zdenek were as follows:

 1) When 8.9.0 comes out we also release an 8.8.x which contains a new
 guc which says to prepare for an 8.9 update.

 Yeah, I think the critical point is not to assume that the behavior of
 the old system is completely set in stone.  We can insist that you must
 update to at least point release .N before beginning the migration
 process.  That gives us a chance to backpatch code that makes
 adjustments to the behavior of the old server, so long as the backpatch
 isn't invasive enough to raise stability concerns.

If we have consensus on that approach, I'm fine with it.  I just don't
want one of the people who wants this CRC feature to go to a lot of
trouble to develop a space reservation system that has to be
backpatched to 8.4, and then have the patch rejected as too
potentially destabilizing.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
 What happened to the work that was being done to allow a page to be upgraded
 on the fly when it was read in from disk?

There were no page level changes between 8.3 and 8.4.


-- 
greg



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Bruce Momjian
Greg Stark wrote:
 On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
  What happened to the work that was being done to allow a page to be upgraded
  on the fly when it was read in from disk?
 
 There were no page level changes between 8.3 and 8.4.

Yea, we have the idea of how to do it (in cases where the page size
doesn't increase), but there was no need to implement it for 8.3 to 8.4.




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 5:15 PM, Greg Stark gsst...@mit.edu wrote:
 On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
 What happened to the work that was being done to allow a page to be upgraded
 on the fly when it was read in from disk?

 There were no page level changes between 8.3 and 8.4.

That's true, but I don't think it's the full and complete answer to
the question.  Zdenek submitted a patch for CF 2008-11 which attempted
to add support for multiple page versions.  I guess we're on v4 right
now, and he was attempting to add support for v3 pages, which would
have allowed reading in pages from old PG versions.  To put it
bluntly, the code wasn't anything I would have wanted to deploy, but
the reason why Zdenek gave up on fixing it was because several
community members considerably senior to myself provided negative
feedback on the concept.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Bruce Momjian
Robert Haas wrote:
 On Tue, Dec 1, 2009 at 5:15 PM, Greg Stark gsst...@mit.edu wrote:
  On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
  What happened to the work that was being done to allow a page to be 
  upgraded
  on the fly when it was read in from disk?
 
  There were no page level changes between 8.3 and 8.4.
 
 That's true, but I don't think it's the full and complete answer to
 the question.  Zdenek submitted a patch for CF 2008-11 which attempted
 to add support for multiple page versions.  I guess we're on v4 right
 now, and he was attempting to add support for v3 pages, which would
 have allowed reading in pages from old PG versions.  To put it
 bluntly, the code wasn't anything I would have wanted to deploy, but
 the reason why Zdenek gave up on fixing it was because several
 community members considerably senior to myself provided negative
 feedback on the concept.

Well, there were quite a number of open issues relating to page
conversion:

o  Do we write the old version or just convert on read?
o  How do we write pages that get larger on conversion to the
   new format?

As I remember, the patch allowed read/write of old versions, which greatly
increased its code impact.
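To make the first question concrete, here is a toy sketch of convert-on-read. The dict-based pages and the 4-byte header growth are invented for illustration; this is not PostgreSQL's actual page layout or code.

```python
# Toy sketch of "convert on read": a page found in the old format is
# rewritten into the new one as it is read in.  Structures and the
# 4-byte growth figure are invented, not PostgreSQL internals.

EXTRA = 4  # assumed extra header bytes needed by the new format

def convert_on_read(page):
    """Return the page in the current (v4) format, upgrading if needed."""
    if page["version"] == 4:
        return page                       # already current
    if page["free"] < EXTRA:
        # This is exactly the second question above: the page is too
        # full to grow in place, so something else has to give.
        raise RuntimeError("page too full to convert in place")
    return {"version": 4,
            "free": page["free"] - EXTRA,
            "tuples": page["tuples"]}

old = {"version": 3, "free": 100, "tuples": ["a", "b"]}
new = convert_on_read(old)
assert new["version"] == 4 and new["free"] == 96
```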

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 9:31 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Tue, Dec 1, 2009 at 5:15 PM, Greg Stark gsst...@mit.edu wrote:
  On Tue, Dec 1, 2009 at 9:58 PM, decibel deci...@decibel.org wrote:
  What happened to the work that was being done to allow a page to be
  upgraded on the fly when it was read in from disk?
 
  There were no page level changes between 8.3 and 8.4.

 That's true, but I don't think it's the full and complete answer to
 the question.  Zdenek submitted a patch for CF 2008-11 which attempted
 to add support for multiple page versions.  I guess we're on v4 right
 now, and he was attempting to add support for v3 pages, which would
 have allowed reading in pages from old PG versions.  To put it
 bluntly, the code wasn't anything I would have wanted to deploy, but
 the reason why Zdenek gave up on fixing it was because several
 community members considerably senior to myself provided negative
 feedback on the concept.

 Well, there were quite a number of open issues relating to page
 conversion:

        o  Do we write the old version or just convert on read?
        o  How do we write pages that get larger on conversion to the
           new format?

 As I remember, the patch allowed read/write of old versions, which greatly
 increased its code impact.

Oh, for sure there were plenty of issues with the patch, starting with
the fact that the way it was set up led to unacceptable performance
and code complexity trade-offs.  Some of my comments from the time:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php

But the point is that the concept, I think, is basically the right
one: you have to be able to read and make sense of the contents of old
page versions.  There is room, at least in my book, for debate about
which operations we should support on old pages.  Totally read only?
Set hint bits?  Kill old tuples?  Add new tuples?

The key issue, as I think Heikki identified at the time, is to figure
out how you're eventually going to get rid of the old pages.  He
proposed running a pre-upgrade utility on each page to reserve the
right amount of free space.

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

I don't like that solution.  If the pre-upgrade utility is something
that has to be run while the database is off-line, then it defeats the
point of an in-place upgrade.  If it can be run while the database is
up, I fear it will need to be deeply integrated into the server.  And
since we can't know the requirements for how much space to reserve
(and it needn't be a constant) until we design the new feature, this
will likely mean backpatching a rather large chunk of complex code,
which to put it mildly, is not the sort of thing we normally would
even consider.  I think a better approach is to support reading tuples
from old pages, but to write all new tuples into new pages.  A
full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be
used to propel everything to the new version, with the usual tricks
for people who need to rewrite the table a piece at a time.  But, this
is not religion for me.  I'm fine with some other design; I just can't
presently see how to make it work.
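The read-old/write-new scheme described above can be illustrated with a toy model. Everything here is invented (version numbers, page structures, capacities); the full-table rewrite merely stands in for UPDATE foo SET x = x, and none of this is PostgreSQL code.

```python
# Toy model of the scheme: readers understand both page versions, but
# writers only ever produce current-version pages, so a full-table
# rewrite leaves nothing but new-format pages behind.

CURRENT_VERSION = 4
TUPLES_PER_PAGE = 10  # arbitrary toy capacity

def read_tuples(page):
    """Reads must cope with any supported on-disk version."""
    if page["version"] in (3, 4):
        return page["tuples"]
    raise ValueError("unknown page version %d" % page["version"])

def write_tuple(pages, tup):
    """Writes go only to current-version pages."""
    for page in pages:
        if (page["version"] == CURRENT_VERSION
                and len(page["tuples"]) < TUPLES_PER_PAGE):
            page["tuples"].append(tup)
            return
    pages.append({"version": CURRENT_VERSION, "tuples": [tup]})

def rewrite_table(pages):
    """Stand-in for UPDATE foo SET x = x: re-insert every tuple."""
    new_pages = []
    for page in pages:
        for tup in read_tuples(page):
            write_tuple(new_pages, tup)
    return new_pages

pages = [{"version": 3, "tuples": ["a", "b"]},
         {"version": 4, "tuples": ["c"]}]
pages = rewrite_table(pages)
assert all(p["version"] == CURRENT_VERSION for p in pages)
```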

I think the present discussion of CRC checks is an excellent test-case
for any and all ideas about how to solve this problem.  If someone can
get a patch committed that can convert the 8.4 page format to an 8.5
format with the hint bits shuffled around and a (hopefully optional) CRC
added, I think that'll become the de facto standard for how to handle
page format upgrades.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Bruce Momjian
Robert Haas wrote:
  Well, there were quite a number of open issues relating to page
  conversion:
 
         o  Do we write the old version or just convert on read?
         o  How do we write pages that get larger on conversion to the
            new format?
 
  As I remember, the patch allowed read/write of old versions, which greatly
  increased its code impact.
 
 Oh, for sure there were plenty of issues with the patch, starting with
 the fact that the way it was set up led to unacceptable performance
 and code complexity trade-offs.  Some of my comments from the time:
 
 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php
 
 But the point is that the concept, I think, is basically the right
 one: you have to be able to read and make sense of the contents of old
 page versions.  There is room, at least in my book, for debate about
 which operations we should support on old pages.  Totally read only?
 Set hint bits?  Kill old tuples?  Add new tuples?

I think part of the problem is there was no agreement before the patch
was coded and submitted, and there didn't seem to be much desire from
the patch author to adjust it, nor demand from the community because we
didn't need it yet.

 The key issue, as I think Heikki identified at the time, is to figure
 out how you're eventually going to get rid of the old pages.  He
 proposed running a pre-upgrade utility on each page to reserve the
 right amount of free space.
 
 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

Right.  There were two basic approaches to handling a page that would
expand when upgraded to the new version --- either allow the system to
write the old format, or have a pre-upgrade script that moved tuples so
there was guaranteed enough free space in every page for the new format.
I think we agreed that the latter was better than the former, and it was
an easy decision because we don't have any need for that at this time.
Plus the script would not rewrite every page, just certain pages that
required it.

 I don't like that solution.  If the pre-upgrade utility is something
 that has to be run while the database is off-line, then it defeats the
 point of an in-place upgrade.  If it can be run while the database is
 up, I fear it will need to be deeply integrated into the server.  And
 since we can't know the requirements for how much space to reserve
 (and it needn't be a constant) until we design the new feature, this
 will likely mean backpatching a rather large chunk of complex code,
 which to put it mildly, is not the sort of thing we normally would
 even consider.  I think a better approach is to support reading tuples
 from old pages, but to write all new tuples into new pages.  A
 full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be
 used to propel everything to the new version, with the usual tricks
 for people who need to rewrite the table a piece at a time.  But, this
 is not religion for me.  I'm fine with some other design; I just can't
 presently see how to make it work.

Well, perhaps the text I wrote above will clarify that the upgrade
script is only for page expansion --- it is not to rewrite every page
into the new format.

 I think the present discussion of CRC checks is an excellent test-case
 for any and all ideas about how to solve this problem.  If someone can
 get a patch committed that can convert the 8.4 page format to an 8.5
 format with the hint bits shuffled around and a (hopefully optional) CRC
 added, I think that'll become the de facto standard for how to handle
 page format upgrades.

Well, yea, the idea would be that the 8.5 server would either convert
the page to the new format on read (assuming there is enough free space,
perhaps requiring a pre-upgrade script), or have the server write the
page in the old 8.4 format and not do CRC checks on the page.  My guess
is the former.




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Greg Smith

Robert Haas wrote:

If the pre-upgrade utility is something
that has to be run while the database is off-line, then it defeats the
point of an in-place upgrade.  If it can be run while the database is
up, I fear it will need to be deeply integrated into the server.  And
since we can't know the requirements for how much space to reserve
(and it needn't be a constant) until we design the new feature, this
will likely mean backpatching a rather large chunk of complex code,
which to put it mildly, is not the sort of thing we normally would
even consider.
You're wandering into the sort of overdesign that isn't really needed 
yet.  For now, presume it's a constant amount of overhead, and that the 
release notes for the new version will say "configure the pre-upgrade 
utility and tell it you need x bytes of space reserved."  That's 
sufficient for the CRC case, right?  Needs a few more bytes per page, 
8.5 release notes could say exactly how much.  Solve that before making 
things more complicated by presuming you need to solve the variable-size 
increase problem, too.  We'll be lucky to get the absolute simplest 
approach committed, you really need to have a big smoking gun to justify 
feature creep in this area.
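A constant-reservation pass might look roughly like this toy sketch. The page model (free-space counters, tuples as plain sizes) is invented for illustration; a real tool would operate on 8192-byte PostgreSQL heap pages and WAL-log its changes.

```python
# Toy sketch of a pre-upgrade pass that reserves a fixed number of
# bytes on every page, touching only the pages that lack the room.
# Tuples are modeled as plain integer sizes; nothing here is real
# PostgreSQL code.

PAGE_SIZE = 8192
RESERVE = 4  # assumed per-page overhead, e.g. a 4-byte CRC

def relocate(tup, pages):
    """Move a displaced tuple to any page with room left over the
    reserve, allocating a fresh page if none has space."""
    for page in pages:
        if page["free"] >= tup + RESERVE:
            page["tuples"].append(tup)
            page["free"] -= tup
            return
    pages.append({"free": PAGE_SIZE - tup, "tuples": [tup]})

def reserve_space(pages):
    """Return how many pages had to be modified."""
    touched = 0
    for page in list(pages):          # snapshot: relocate may append
        if page["free"] >= RESERVE:
            continue                  # the common case: nothing to do
        touched += 1
        while page["free"] < RESERVE and page["tuples"]:
            tup = page["tuples"].pop()
            page["free"] += tup
            relocate(tup, pages)
    return touched

pages = [{"free": 0, "tuples": [100, 50]},    # needs work
         {"free": 500, "tuples": [200]}]      # already fine
touched = reserve_space(pages)
assert touched == 1
assert all(p["free"] >= RESERVE for p in pages)
```

The "dial up its intensity" property falls out naturally: the scan can be throttled or resumed at any page boundary, since each page is fixed independently.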


(If I had to shoot from the hip and design for the variable case, why 
not just make the thing that determines how much space a given page 
needs reserved a function the user can re-install with a smarter version?)

I think a better approach is to support reading tuples
from old pages, but to write all new tuples into new pages.  A
full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be
used to propel everything to the new version, with the usual tricks
for people who need to rewrite the table a piece at a time.
I think you're oversimplifying the operational difficulty of the usual 
tricks.  This is a painful approach for the exact people who need this 
the most:  people with a live multi-TB installation they can't really 
afford to add too much load to.  The beauty of the in-place upgrade tool 
just converting pages as it scans through looking for them is that you 
can dial up its intensity to exactly how much overhead you can stand, 
and let it loose until it's done.


--
Greg Smith  2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 10:34 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  Well, there were quite a number of open issues relating to page
  conversion:
 
         o  Do we write the old version or just convert on read?
         o  How do we write pages that get larger on conversion to the
            new format?
 
  As I remember, the patch allowed read/write of old versions, which greatly
  increased its code impact.

 Oh, for sure there were plenty of issues with the patch, starting with
 the fact that the way it was set up led to unacceptable performance
 and code complexity trade-offs.  Some of my comments from the time:

 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00149.php
 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00152.php

 But the point is that the concept, I think, is basically the right
 one: you have to be able to read and make sense of the contents of old
 page versions.  There is room, at least in my book, for debate about
 which operations we should support on old pages.  Totally read only?
 Set hint bits?  Kill old tuples?  Add new tuples?

 I think part of the problem is there was no agreement before the patch
 was coded and submitted, and there didn't seem to be much desire from
 the patch author to adjust it, nor demand from the community because we
 didn't need it yet.

Could be.  It's water under the bridge at this point.

 The key issue, as I think Heikki identified at the time, is to figure
 out how you're eventually going to get rid of the old pages.  He
 proposed running a pre-upgrade utility on each page to reserve the
 right amount of free space.

 http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php

 Right.  There were two basic approaches to handling a page that would
 expand when upgraded to the new version --- either allow the system to
 write the old format, or have a pre-upgrade script that moved tuples so
 there was guaranteed enough free space in every page for the new format.
 I think we agreed that the latter was better than the former, and it was
 an easy decision because we don't have any need for that at this time.
 Plus the script would not rewrite every page, just certain pages that
 required it.

While I'm always willing to be proven wrong, I think it's a complete
dead-end to believe that it's going to be easier to reserve space for
page expansion using the upgrade-from version rather than the
upgrade-to version.  I am firmly of the belief that the NEW pg version
must be able to operate on an unmodified heap migrated from the OLD pg
version.  After this set of patches was rejected, Zdenek actually
proposed an alternate patch that would have allowed space reservation,
and it was rejected precisely because there was no clear certainty
that it would solve any hypothetical future problem.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 10:45 PM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 If the pre-upgrade utility is something
 that has to be run while the database is off-line, then it defeats the
 point of an in-place upgrade.  If it can be run while the database is
 up, I fear it will need to be deeply integrated into the server.  And
 since we can't know the requirements for how much space to reserve
 (and it needn't be a constant) until we design the new feature, this
 will likely mean backpatching a rather large chunk of complex code,
 which to put it mildly, is not the sort of thing we normally would
 even consider.

 You're wandering into the sort of overdesign that isn't really needed yet.
  For now, presume it's a constant amount of overhead, and that the release
 notes for the new version will say "configure the pre-upgrade utility and
 tell it you need x bytes of space reserved."  That's sufficient for the
 CRC case, right?  Needs a few more bytes per page, 8.5 release notes could
 say exactly how much.  Solve that before making things more complicated by
 presuming you need to solve the variable-size increase problem, too.  We'll
 be lucky to get the absolute simplest approach committed, you really need to
 have a big smoking gun to justify feature creep in this area.

Well, I think the best way to solve the problem is to design the
system in a way that makes it unnecessary to have a pre-upgrade tool
at all, by making the new PG version capable of handling page
expansion where needed.  I don't understand how putting that
functionality into the OLD PG version can be better.  But I may be
misunderstanding something.

 (If I had to shoot from the hip and design for the variable case, why not
 just make the thing that determines how much space a given page needs
 reserved a function the user can re-install with a smarter version?)

That's a pretty good idea.   I have no love of this pre-upgrade
concept, but if we're going to do it that way, then allowing someone
to load in a function to compute the required amount of free space to
reserve is a good thought.

 I think a better approach is to support reading tuples
 from old pages, but to write all new tuples into new pages.  A
 full-table rewrite (like UPDATE foo SET x = x, CLUSTER, etc.) can be
 used to propel everything to the new version, with the usual tricks
 for people who need to rewrite the table a piece at a time.

 I think you're oversimplifying the operational difficulty of the usual
 tricks.  This is a painful approach for the exact people who need this the
 most:  people with a live multi-TB installation they can't really afford to
 add too much load to.  The beauty of the in-place upgrade tool just
 converting pages as it scans through looking for them is that you can dial
 up its intensity to exactly how much overhead you can stand, and let it
 loose until it's done.

Fair enough.

...Robert



Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread Bruce Momjian
Robert Haas wrote:
  The key issue, as I think Heikki identified at the time, is to figure
  out how you're eventually going to get rid of the old pages.  He
  proposed running a pre-upgrade utility on each page to reserve the
  right amount of free space.
 
  http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php
 
  Right.  There were two basic approaches to handling a page that would
  expand when upgraded to the new version --- either allow the system to
  write the old format, or have a pre-upgrade script that moved tuples so
  there was guaranteed enough free space in every page for the new format.
  I think we agreed that the latter was better than the former, and it was
  an easy decision because we don't have any need for that at this time.
  Plus the script would not rewrite every page, just certain pages that
  required it.
 
 While I'm always willing to be proven wrong, I think it's a complete
 dead-end to believe that it's going to be easier to reserve space for
 page expansion using the upgrade-from version rather than the
 upgrade-to version.  I am firmly of the belief that the NEW pg version
 must be able to operate on an unmodified heap migrated from the OLD pg
 version.  After this set of patches was rejected, Zdenek actually

Does it need to write the old version, and if it does, does it have to
carry around the old format structures all over the backend?  That was
the unclear part.

 proposed an alternate patch that would have allowed space reservation,
 and it was rejected precisely because there was no clear certainty
 that it would solve any hypothetical future problem.

True.  It was solving a problem we didn't have, yet.




Re: [HACKERS] Page-level version upgrade (was: Block-level CRC checks)

2009-12-01 Thread David Fetter
On Tue, Dec 01, 2009 at 10:34:11PM -0500, Bruce Momjian wrote:
 Robert Haas wrote:
  The key issue, as I think Heikki identified at the time, is to
  figure out how you're eventually going to get rid of the old
  pages.  He proposed running a pre-upgrade utility on each page to
  reserve the right amount of free space.
  
  http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php
 
 Right.  There were two basic approaches to handling a page that
 would expand when upgraded to the new version --- either allow the
 system to write the old format, or have a pre-upgrade script that
 moved tuples so there was guaranteed enough free space in every page
 for the new format.  I think we agreed that the latter was better
 than the former, and it was an easy decision because we don't have
 any need for that at this time.  Plus the script would not rewrite
 every page, just certain pages that required it.

Please forgive me for barging in here, but that approach simply is
untenable if it requires that the database be down while those pages
are being found, marked, moved around, etc.

The data volumes that really concern people who need an in-place
upgrade are such that even 

dd if=$PGDATA of=/dev/null bs=8192  # (or whatever the optimal block size would be)

would require *much* more time than such people would accept as a down
time window, and while that's a lower bound, it's not a reasonable
lower bound on the time.

If this re-jiggering could kick off in the background at start and
work on a running PostgreSQL, the whole objection goes away.

A problem that arises for any in-place upgrade system we do is that if
someone's at 99% storage capacity, we can pretty well guarantee some
kind of catastrophic failure.  Could we create some way to get an
estimate of space needed, given that the system needs to stay up while
that's happening?
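For the constant-reservation scheme discussed upthread, at least a lower-bound estimate is simple arithmetic. The 4-byte per-page figure below is an assumption (e.g. a per-page CRC), not a decided on-disk format.

```python
# Back-of-the-envelope space estimate, assuming a constant per-page
# reservation.  The RESERVE figure is an assumed example value.

PAGE_SIZE = 8192
RESERVE = 4                      # assumed bytes needed free per page

def reserved_bytes(total_bytes):
    """Total free space that must exist across all pages before the
    upgrade can proceed.  This is only a lower bound on extra storage:
    pages that are already full also need whole new pages for the
    tuples displaced off them."""
    return (total_bytes // PAGE_SIZE) * RESERVE

# A 1 TiB cluster needs half a gibibyte of page-local free space:
print(reserved_bytes(1 << 40) // (1 << 20), "MiB")  # prints: 512 MiB
```

An estimator like this could run against pg_class relation sizes on a live system, since it needs only totals, not a page-by-page scan.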

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
