On Fri, Dec 4, 2009 at 10:47 PM, Chuck McDevitt cmcdev...@greenplum.com wrote:
A curiosity question regarding torn pages: How does this work on file
systems that don't write in-place, but instead always do copy-on-write?
My example would be Sun's ZFS file system (In Solaris BSD). Because
It can save space because the line pointers have less alignment
requirements. But I don't see any point in the current state.
--
Greg
On 2009-12-04, at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark gsst...@mit.edu writes:
I'm not sure why I said including ctid. We would have to
On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote:
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake
j...@commandprompt.com wrote:
On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
Does $COMPETITOR offer this feature?
My understanding is that MSSQL does. I am not sure about Oracle. Those
On Fri, 2009-12-04 at 03:32 -0600, decibel wrote:
So... now that the upgrade discussion seems to have died down... was
any consensus reached on how to do said checksumming?
Possibly. Please can you go through the discussion and pull out a
balanced summary of how to proceed? I lost track a
Kevin,
md5sum of each tuple? As an optional system column (a la oid)?
I am mainly an application programmer working with PostgreSQL. And I
want to point out an additional usefulness of an md5sum of each
tuple: it makes comparing table-contents in replicated / related
databases MUCH more
On Fri, Dec 4, 2009 at 9:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
Possibly. Please can you go through the discussion and pull out a
balanced summary of how to proceed? I lost track a while back and I'm
sure many others did also.
I summarized the three feasible plans I think I saw;
decibel wrote:
On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote:
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake
j...@commandprompt.com wrote:
On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
Does $COMPETITOR offer this feature?
My understanding is that MSSQL does. I am
On Fri, 2009-12-04 at 07:12 -0500, Bruce Momjian wrote:
I think the hint bit has to be added to the item pointer, by using the
offset bits that are already zero, according to Greg Stark. That
solution leads to easy programming, no expanding hint bit array, and it
is backward compatible so
BTW with VACUUM FULL removed I assume we're going to get rid of
HEAP_MOVED_IN and HEAP_MOVED_OFF too, right?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
--
Sent via pgsql-hackers mailing
Simon Riggs wrote:
On Fri, 2009-12-04 at 07:12 -0500, Bruce Momjian wrote:
I think the hint bit has to be added to the item pointer, by using the
offset bits that are already zero, according to Greg Stark. That
solution leads to easy programming, no expanding hint bit array, and it
is
On Fri, 2009-12-04 at 07:54 -0500, Bruce Momjian wrote:
I should also point out that removing 4 bits from the tuple header would
allow us to get rid of t_infomask2, reducing tuple length by a further 2
bytes.
Wow, that is a nice win. Does alignment allow us to actually use that
space?
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW with VACUUM FULL removed I assume we're going to get rid of
HEAP_MOVED_IN and HEAP_MOVED_OFF too, right?
Much as I would like to see those go, no. VF code should remain for some
time yet, IMHO. We could remove it, but doing so is not
Simon Riggs wrote:
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW with VACUUM FULL removed I assume we're going to get rid of
HEAP_MOVED_IN and HEAP_MOVED_OFF too, right?
Much as I would like to see those go, no. VF code should remain for some
time yet, IMHO.
I don't think
On Fri, Dec 4, 2009 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
On Fri, 2009-12-04 at 07:54 -0500, Bruce Momjian wrote:
I should also point out that removing 4 bits from the tuple header would
allow us to get rid of t_infomask2, reducing tuple length by a further 2
bytes.
Wow,
On Fri, Dec 4, 2009 at 1:35 PM, Greg Stark gsst...@mit.edu wrote:
If we lose vacuum full then the table's open for reducing the width of
command id too if we need more bits. If we do that and we moved
everything we could to the line pointers including ctid we might just
be able to squeeze the
Greg Stark wrote:
On Fri, Dec 4, 2009 at 1:35 PM, Greg Stark gsst...@mit.edu wrote:
If we lose vacuum full then the table's open for reducing the width of
command id too if we need more bits. If we do that and we moved
everything we could to the line pointers including ctid we might
Heikki Linnakangas wrote:
Simon Riggs wrote:
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW with VACUUM FULL removed I assume we're going to get rid of
HEAP_MOVED_IN and HEAP_MOVED_OFF too, right?
Much as I would like to see those go, no. VF code should remain for
On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
Heikki Linnakangas wrote:
Simon Riggs wrote:
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW with VACUUM FULL removed I assume we're going to get rid of
HEAP_MOVED_IN and HEAP_MOVED_OFF too,
Simon Riggs si...@2ndquadrant.com writes:
As I pointed out here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00056.php
we only need to use 3 bits not 4, but it does limit tuple length to 4096
for all block sizes. (Two different options there for doing that).
Limiting the tuple
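The arithmetic behind the 4096 limit can be checked with a short sketch. It assumes the ItemIdData layout of a 15-bit lp_off, a 2-bit lp_flags and a 15-bit lp_len; the constant and function names are illustrative only and do not come from the thread.

```python
# Bit budget of a 32-bit line pointer, assuming the layout
# lp_off:15, lp_flags:2, lp_len:15. All names here are illustrative.
LP_OFF_BITS = 15
LP_FLAGS_BITS = 2
LP_LEN_BITS = 15


def lengths_representable(bits_borrowed: int) -> int:
    """Distinct tuple lengths left after borrowing bits from lp_len for hints."""
    if not 0 <= bits_borrowed <= LP_LEN_BITS:
        raise ValueError("cannot borrow more bits than lp_len has")
    return 1 << (LP_LEN_BITS - bits_borrowed)


if __name__ == "__main__":
    for borrowed in (0, 3, 4):
        print(borrowed, "bits borrowed ->",
              lengths_representable(borrowed), "length values")
```

Borrowing 3 bits leaves 12 bits for the length, which is where the 4096 figure comes from; borrowing 4 would halve that again.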
Greg Stark gsst...@mit.edu writes:
I'm not sure why I said including ctid. We would have to move
everything transactional to the line pointer, including xmin, xmax,
ctid, all the hint bits, the updated flags, hot flags, etc. The only
things left in the tuple header would be things that have to
On Fri, 2009-12-04 at 10:43 -0500, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
As I pointed out here
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00056.php
we only need to use 3 bits not 4, but it does limit tuple length to 4096
for all block sizes. (Two different
Robert Haas robertmh...@gmail.com writes:
Have we thought about what other things have changed between 8.4 and
8.5 that might cause problems for in-place upgrade?
So far, nothing. We even made Andrew Gierth jump through hoops to
keep hstore's on-disk representation upwards compatible.
On Fri, 2009-12-04 at 13:35 +0000, Greg Stark wrote:
I don't think getting rid of infomask2 wins us 2 bytes so fast. The
rest of those two bytes is natts which of course we still need.
err, yes, OK.
--
Simon Riggs www.2ndQuadrant.com
Robert Haas wrote:
On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
Heikki Linnakangas wrote:
Simon Riggs wrote:
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW with VACUUM FULL removed I assume we're going to get rid of
On Fri, Dec 4, 2009 at 2:04 PM, Bruce Momjian br...@momjian.us wrote:
Robert Haas wrote:
On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
Heikki Linnakangas wrote:
Simon Riggs wrote:
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote:
BTW
Massa, Harald Armin wrote:
I am in the process of adding a user-space myhash column to all my
applications tables, filled by a trigger on insert / update. It really
speeds up table comparison across databases; and it is very helpful
in debugging replications.
Have you seen pg_comparator?
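The row-hash comparison described above can be sketched as follows. This is a minimal illustration, not the poster's trigger or pg_comparator's algorithm: the row serialization, the `row_hash` and `differing_keys` helpers, and the use of in-memory dicts keyed by primary key are all assumptions made for the example.

```python
import hashlib


def row_hash(row: tuple) -> str:
    """MD5 over a simple text serialization of the row (hypothetical format)."""
    # Use a unit separator between fields and \N for NULL so that
    # ("a", None) and ("a", "") hash differently.
    payload = "\x1f".join("\\N" if v is None else str(v) for v in row)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def differing_keys(table_a: dict, table_b: dict) -> set:
    """Keys that are missing on one side or whose row hashes differ."""
    keys = set(table_a) | set(table_b)
    return {
        k for k in keys
        if k not in table_a
        or k not in table_b
        or row_hash(table_a[k]) != row_hash(table_b[k])
    }


if __name__ == "__main__":
    primary = {1: ("x", 2), 2: ("y", None)}
    replica = {1: ("x", 2), 2: ("y", 3), 3: ("z", 0)}
    print(sorted(differing_keys(primary, replica)))
```

With a stored hash column, only (key, hash) pairs would need to cross the network, which is presumably where the speed-up comes from.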
Robert Haas wrote:
Well, I am not sure how we would turn the _space_ used for CRC on and
off because you would have to rewrite the entire table/database to turn
it on, which seems unfortunate.
Well, presumably you're going to have to do some of that work anyway,
because even if the space
A curiosity question regarding torn pages: How does this work on file systems
that don't write in-place, but instead always do copy-on-write?
My example would be Sun's ZFS file system (In Solaris BSD). Because of its
snapshot rollback functionality, it never writes a page in-place, but
On Fri, 2009-12-04 at 14:47 -0800, Chuck McDevitt wrote:
A curiosity question regarding torn pages: How does this work on file
systems that don't write in-place, but instead always do
copy-on-write?
My example would be Sun's ZFS file system (In Solaris BSD). Because
of its snapshot
I am in the process of adding a user-space myhash column to all my
applications tables, filled by a trigger on insert / update. It really
speeds up table comparison across databases; and it is very helpful
in debugging replications.
Have you seen pg_comparator?
yes, saw the lightning talk
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.com wrote:
On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
Does $COMPETITOR offer this feature?
My understanding is that MSSQL does. I am not sure about Oracle. Those
are the only two I run into (I don't run into
On Tue, 2009-12-01 at 19:41 +0000, Greg Stark wrote:
Also, it would
require reading back each page as it's written to disk, which is OK
for
a bunch of single-row writes, but for bulk data loads a significant
problem.
Not sure what that really means for Postgres. It would just mean
On Tue, 2009-12-01 at 17:47 -0500, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
I also like the idea that we don't need to CRC check the line pointers
because any corruption there is going to appear immediately. However,
the bad news is that we wouldn't find the corruption until
Simon Riggs wrote:
There is no creation of corruption events. This scheme detects
corruption events that *have* occurred. Now I understand that we
previously would have recovered seamlessly from such events, but they
were corruption events nonetheless and I think they need to be reported.
On Tue, 2009-12-01 at 10:04 +0200, Heikki Linnakangas wrote:
Simon Riggs wrote:
There is no creation of corruption events. This scheme detects
corruption events that *have* occurred. Now I understand that we
previously would have recovered seamlessly from such events, but they
were
Simon Riggs wrote:
The way we handle torn page corruptions *hides* actual corruptions from
us. The frequency of true positives and false positives is important
here. If the false positive ratio is very small, then reporting them is
not a problem because of the benefit we get from having
On Tue, 2009-12-01 at 06:35 -0500, Bruce Momjian wrote:
Simon Riggs wrote:
The way we handle torn page corruptions *hides* actual corruptions from
us. The frequency of true positives and false positives is important
here. If the false positive ratio is very small, then reporting them is
Simon Riggs wrote:
I think
the problem is that the existing proposal can't distinguish between
these two cases so the user has no idea how to respond to the report.
If 99.5% of cases are real corruption then there is little need to
distinguish between the cases, nor much value in doing
On Mon, Nov 30, 2009 at 3:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Proposal
* We reserve enough space on a disk block for a CRC check. When a dirty
block is written to disk we calculate and annotate the CRC value, though
this is *not* WAL
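The first step of the proposal (reserve space in the block, compute the CRC when the dirty block is written, check it on read) can be sketched roughly like this. It is only an illustration: the 4-byte field at offset 0, the use of zlib's CRC-32, and the function names are assumptions, not the layout or polynomial discussed in the thread.

```python
import struct
import zlib

PAGE_SIZE = 8192   # default PostgreSQL block size
CRC_OFFSET = 0     # hypothetical position of the reserved 4-byte CRC field
CRC_FMT = "<I"     # little-endian unsigned 32-bit


def _crc_of(page: bytes) -> int:
    # Zero the CRC field first so the stored value never feeds into itself.
    blanked = page[:CRC_OFFSET] + b"\x00" * 4 + page[CRC_OFFSET + 4:]
    return zlib.crc32(blanked) & 0xFFFFFFFF


def stamp_crc(page: bytes) -> bytes:
    """Return the page with its CRC field filled in, as done just before a write."""
    if len(page) != PAGE_SIZE:
        raise ValueError("unexpected page size")
    return (page[:CRC_OFFSET]
            + struct.pack(CRC_FMT, _crc_of(page))
            + page[CRC_OFFSET + 4:])


def verify_crc(page: bytes) -> bool:
    """Check the stored CRC against the page contents, as done after a read."""
    (stored,) = struct.unpack_from(CRC_FMT, page, CRC_OFFSET)
    return stored == _crc_of(page)
```

Note that this sketch covers every byte of the page, hint bits included, which is exactly the difficulty the rest of the thread wrestles with.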
Robert Haas wrote:
On Mon, Nov 30, 2009 at 3:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Proposal
* We reserve enough space on a disk block for a CRC check. When a dirty
block is written to disk we calculate and annotate the CRC value, though
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote:
I assume torn pages are 99% of the reported problem, which are
expected and are fixed, and bad hardware 1%, quite the opposite of your
numbers above.
On what basis do you make that assumption?
--
Simon Riggs
Simon Riggs wrote:
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote:
I assume torn pages are 99% of the reported problem, which are
expected and are fixed, and bad hardware 1%, quite the opposite of your
numbers above.
On what basis do you make that assumption?
Because we added
bruce wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
Sorry, stupid idea on my part. We don't WAL log hint bit changes so
there is no guarantee the page is in WAL on recovery.
On Tue, 2009-12-01 at 07:58 -0500, Bruce Momjian wrote:
bruce wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
Sorry, stupid idea on my part. We don't WAL log hint bit
On Tue, 2009-12-01 at 07:42 -0500, Bruce Momjian wrote:
Simon Riggs wrote:
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote:
I assume torn pages are 99% of the reported problem, which are
expected and are fixed, and bad hardware 1%, quite the opposite of your
numbers above.
Bruce Momjian wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
Unfortunately that's not true. You can crash, leading to a torn page,
and then start up the database and shut it
On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Proposal
* We reserve enough space on a disk block for a CRC check. When a dirty
block is written to disk we calculate and annotate the CRC value, though
this is *not* WAL
On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Proposal
* We reserve enough space on a disk block for a CRC check. When a dirty
block is written to disk we
* Andres Freund and...@anarazel.de [091201 08:42]:
On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Proposal
* We reserve enough space on a disk block for a
On Tue, Dec 1, 2009 at 8:30 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Bruce Momjian wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
Unfortunately
On Tuesday 01 December 2009 15:26:21 Aidan Van Dyk wrote:
* Andres Freund and...@anarazel.de [091201 08:42]:
On Tuesday 01 December 2009 14:38:26 marcin mank wrote:
On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Simon Riggs wrote:
Robert Haas wrote:
On Tue, Dec 1, 2009 at 8:30 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Bruce Momjian wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
On Tue, 2009-12-01 at 15:30 +0200, Heikki Linnakangas wrote:
Bruce Momjian wrote:
What might be interesting is to report CRC mismatches if the database
was shut down cleanly previously; I think in those cases we shouldn't
have torn pages.
Unfortunately that's not true. You can crash,
On Tue, Dec 1, 2009 at 9:40 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Even if rescanning every page in the cluster was feasible from a
performance point-of-view, it would make the CRC checking a lot less
useful. It's not hard to imagine that when a hardware glitch
* Simon Riggs:
* Put all hint bits in the block header to allow them to be excluded
more easily from CRC checking. If we used 3 more bits from
ItemIdData.lp_len (limiting tuple length to 4096) then we could store
some hints in the item pointer. HEAP_XMIN_INVALID can be stored as
LP_DEAD,
Florian Weimer fwei...@bfk.de writes:
What about putting the whole visibility information out-of-line, into
its own B-tree, indexed by page number?
Hint bits need to be *cheap* to examine. Otherwise there's little
point in having them at all.
regards, tom lane
--
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
It's not hard to imagine that when a hardware glitch happens
causing corruption, it also causes the system to crash. Recalculating
the CRCs after crash would mask the corruption.
They are already masked from us, so continuing to
Simon Riggs si...@2ndquadrant.com writes:
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
It's not hard to imagine that when a hardware glitch happens
causing corruption, it also causes the system to crash. Recalculating
the CRCs after crash would mask the corruption.
They are
On Tue, Dec 1, 2009 at 10:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
It's not hard to imagine that when a hardware glitch happens
causing corruption, it also causes the system to crash. Recalculating
the CRCs after crash would
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote:
All useful detection mechanisms have non-zero false positives because we
would rather sometimes ring the bell for no reason than to let bad
things through silently, as we do now.
OK, but what happens if someone gets the failure
On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
It's not hard to imagine that when a hardware glitch happens
causing corruption, it also causes the system to crash. Recalculating
the
On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
It's not hard to imagine that when a hardware glitch happens
causing corruption, it also causes the system to crash. Recalculating
the
Simon Riggs wrote:
Also, we might
* Put all hint bits in the block header to allow them to be excluded
more easily from CRC checking. If we used 3 more bits from
ItemIdData.lp_len (limiting tuple length to 4096) then we could store
some hints in the item pointer. HEAP_XMIN_INVALID can be
On Tue, Dec 1, 2009 at 1:02 PM, Joshua D. Drake j...@commandprompt.com wrote:
The hard core reality is this. *IF* it is one of the goals of this
project to ensure that the software can be safely, effectively, and
responsibly operated in a manner that is acceptable to C* level people
in a
On Tue, 2009-12-01 at 13:05 -0500, Bruce Momjian wrote:
Simon Riggs wrote:
Also, we might
* Put all hint bits in the block header to allow them to be excluded
more easily from CRC checking. If we used 3 more bits from
ItemIdData.lp_len (limiting tuple length to 4096) then we could
On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
On Tue, Dec 1, 2009 at 1:02 PM, Joshua D. Drake j...@commandprompt.com
wrote:
The hard core reality is this. *IF* it is one of the goals of this
project to ensure that the software can be safely, effectively, and
responsibly operated
Bruce Momjian br...@momjian.us writes:
OK, here is another idea, maybe crazy:
When we read in a page that has an invalid CRC, we check the page to see
which hint bits are _not_ set, and we try setting them to see if we can get
a matching CRC. If there are no missing hint bits and the CRC doesn't
Simon Riggs si...@2ndquadrant.com writes:
On Tue, 2009-12-01 at 13:05 -0500, Bruce Momjian wrote:
When we read in a page that has an invalid CRC, we check the page to see
which hint bits are _not_ set, and we try setting them to see if we can get
a matching CRC.
Perhaps we could store a
* Tom Lane t...@sss.pgh.pa.us [091201 13:58]:
Actually, the killer problem with *any* scheme involving guessing
is that each bit you guess translates directly to removing one bit
of confidence from the CRC value. If you try to guess at as many
as 32 bits, it is practically guaranteed that
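The loss of confidence from guessing can be put into numbers with a small model. The sketch below assumes that each candidate hint-bit combination on a corrupted page matches a 32-bit CRC independently with probability 2^-32; that independence is an idealization made for the example, and the function name is hypothetical.

```python
import math

CRC_BITS = 32


def false_match_probability(guessed_bits: int) -> float:
    """Chance that at least one of 2**guessed_bits candidates matches by accident."""
    candidates = 2 ** guessed_bits
    per_candidate = 2.0 ** -CRC_BITS
    # 1 - (1 - p)**n, evaluated with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(candidates * math.log1p(-per_candidate))


if __name__ == "__main__":
    for bits in (0, 8, 16, 24, 32):
        print(f"{bits:2d} guessed bits -> {false_match_probability(bits):.3e}")
```

Under this model each additional guessed bit roughly doubles the chance of accepting a corrupted page, until the check carries almost no information once the number of guessed bits approaches the CRC width.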
On Tue, Dec 1, 2009 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Bruce Momjian br...@momjian.us writes:
OK, here is another idea, maybe crazy:
When we read in a page that has an invalid CRC, we check the page to see
which hint bits are _not_ set, and we try setting them to see if we can get
a
Greg Stark gsst...@mit.edu writes:
Another thought is that we could use the MSSQL-style torn page
detection of including a counter (or even a bit?) in every 512-byte
chunk which gets incremented every time the page is written.
I think we can dismiss that idea, or any idea involving a
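For readers unfamiliar with the counter-per-chunk idea, here is a rough sketch of how such a torn-page check could work. It is a simplified model and not SQL Server's actual on-disk format: the choice of the last byte of each 512-byte chunk and the function names are assumptions, and a real implementation would have to save the data it displaces somewhere else on the page.

```python
SECTOR = 512
PAGE_SIZE = 8192


def stamp_sectors(page: bytes, write_count: int) -> bytes:
    """Store the low byte of the page's write counter at the end of every sector."""
    out = bytearray(page)
    for end in range(SECTOR, PAGE_SIZE + 1, SECTOR):
        out[end - 1] = write_count & 0xFF
    return bytes(out)


def looks_torn(page: bytes) -> bool:
    """A page is suspect if its sectors do not all carry the same counter value."""
    marks = {page[end - 1] for end in range(SECTOR, PAGE_SIZE + 1, SECTOR)}
    return len(marks) > 1
```

A write that lands only some of its sectors leaves a mix of old and new counter values, which `looks_torn` flags. The check says nothing about corruption inside a sector.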
All,
I feel strongly that we should be verifying pages on write, or at least
providing the option to do so, because hardware is simply not reliable.
And a lot of our biggest users are having issues; it seems pretty much
guaranteed that if you have more than 20 postgres servers, at least one
of
Josh Berkus j...@agliodbs.com wrote:
And a lot of our biggest users are having issues; it seems pretty
much guaranteed that if you have more than 20 postgres servers, at
least one of them will have bad memory, bad RAID and/or a bad
driver.
Huh?!? We have about 200 clusters running on
On Tue, Dec 1, 2009 at 7:19 PM, Josh Berkus j...@agliodbs.com wrote:
However, that solution would not detect subtle corruption, like
single-bit-flipping issues caused by quantum errors.
Well there is a solution for this, ECC RAM. There's *no* software
solution for it. The corruption can just as
On Tue, Dec 1, 2009 at 2:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
Well, *I* think if we're ever going to have really reliable in-place
upgrades that we can expect to function release after release, we're
going to need to be able to read in old version pages, and convert
them to current
On Tue, Dec 1, 2009 at 7:51 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Dec 1, 2009 at 2:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
Well, *I* think if we're ever going to have really reliable in-place
upgrades that we can expect to function release after release, we're
going to
On Tue, Dec 1, 2009 at 8:04 PM, Greg Stark gsst...@mit.edu wrote:
And there was no point writing it for previous releases because there
was **no** pg_migrator anyways.
oops
--
greg
On Tue, Dec 1, 2009 at 3:04 PM, Greg Stark gsst...@mit.edu wrote:
I find that hard to understand. I believe the consensus is that an
on-demand page-level migration strategy like Aidan described is
precisely the plan when it's necessary to handle page format changes.
There were no page format
Robert Haas robertmh...@gmail.com writes:
OK, fair enough. My implication that only page formats were at issue
was off-base. My underlying point was that I think we have to be
prepared to write code that can understand old binary formats (on the
tuple, page, or relation level) if we want
Tom Lane wrote:
Robert Haas robertmh...@gmail.com writes:
OK, fair enough. My implication that only page formats were at issue
was off-base. My underlying point was that I think we have to be
prepared to write code that can understand old binary formats (on the
tuple, page, or relation
Andrew Dunstan wrote:
Tom Lane wrote:
Robert Haas robertmh...@gmail.com writes:
OK, fair enough. My implication that only page formats were at issue
was off-base. My underlying point was that I think we have to be
prepared to write code that can understand old binary formats
Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
OK, here is another idea, maybe crazy:
When we read in a page that has an invalid CRC, we check the page to see
which hint bits are _not_ set, and we try setting them to see if we can get
a matching CRC. If there are no missing hint bits
Bruce Momjian br...@momjian.us writes:
OK, crazy idea #3. What if we had a per-page counter of the number of
hint bits set --- that way, we would only consider a CRC check failure
to be corruption if the count matched the hint bit count on the page.
Seems like rather a large hole in the
Bruce Momjian wrote:
Tom Lane wrote:
The suggestions that were made upthread about moving the hint bits
could resolve the second objection, but once you do that you might
as well just exclude them from the CRC and eliminate the guessing.
OK, crazy idea #3. What if we had a per-page
On Dec 1, 2009, at 12:58 PM, Tom Lane wrote:
The bottom line here seems to be that the only practical way to do
anything like this is to move the hint bits into their own area of
the page, and then exclude them from the CRC. Are we prepared to
once again blow off any hope of in-place update for
Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
OK, crazy idea #3. What if we had a per-page counter of the number of
hint bits set --- that way, we would only consider a CRC check failure
to be corruption if the count matched the hint bit count on the page.
Seems like rather a
On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote:
Why are we writing out the hint bits to disk anyway? Is it really so
slow to calculate them on read + cache them that it's worth all this
trouble? Are they not also to blame for the "write my import data twice"
feature?
It
On Dec 1, 2009, at 1:39 PM, Kevin Grittner wrote:
Josh Berkus j...@agliodbs.com wrote:
And a lot of our biggest users are having issues; it seems pretty
much guaranteed that if you have more than 20 postgres servers, at
least one of them will have bad memory, bad RAID and/or a bad
driver.
Greg Stark wrote:
On Tue, Dec 1, 2009 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Bruce Momjian br...@momjian.us writes:
OK, here is another idea, maybe crazy:
When we read in a page that has an invalid CRC, we check the page to see
which hint bits are _not_ set, and we try setting
Greg Stark wrote:
On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote:
Why are we writing out the hint bits to disk anyway? Is it really so
slow to calculate them on read + cache them that it's worth all this
trouble? Are they not also to blame for the write my import data
Bruce Momjian br...@momjian.us writes:
Greg Stark wrote:
It should be relatively cheap to skip the hint bits in the line
pointers since they'll be the same bits of every 16-bit value for a
whole range. Alternatively we could just CRC the tuples and assume a
corrupted line pointer will show
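Skipping hint bits that sit at the same position in every 16-bit value could look roughly like the sketch below. The mask value, the byte range, and the function name are hypothetical and chosen only to illustrate the masking step; zlib's CRC-32 stands in for whatever checksum would actually be used.

```python
import zlib

HINT_MASK = 0x8000   # hypothetical: top bit of each 16-bit word is a hint bit


def crc_ignoring_hints(page: bytes, start: int, end: int) -> int:
    """CRC-32 of the page with the hint bit cleared in every 16-bit word of [start, end)."""
    if (end - start) % 2 != 0:
        raise ValueError("range must cover whole 16-bit words")
    buf = bytearray(page)
    for off in range(start, end, 2):
        word = int.from_bytes(buf[off:off + 2], "little")
        word &= ~HINT_MASK & 0xFFFF
        buf[off:off + 2] = word.to_bytes(2, "little")
    return zlib.crc32(bytes(buf)) & 0xFFFFFFFF
```

Setting or clearing a masked bit leaves the result unchanged, while any other change in the page still alters it. The cost is that corruption confined to the masked bits goes undetected.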
Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
Greg Stark wrote:
It should be relatively cheap to skip the hint bits in the line
pointers since they'll be the same bits of every 16-bit value for a
whole range. Alternatively we could just CRC the tuples and assume a
corrupted
Richard Huxton d...@archonet.com writes:
So what is the cost of calculating the hint-bits for a whole block of
tuples in one go vs reading that block from actual spinning disk?
Potentially a couple of hundred times worse, if you're unlucky and each
XID on the page requires visiting a different
Bruce Momjian br...@momjian.us writes:
Tom Lane wrote:
I don't think "relatively cheap" is the right criterion here --- the
question to me is how many assumptions are you making in order to
compute the page's CRC. Each assumption degrades the reliability
of the check, not to mention creating
On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Bruce Momjian br...@momjian.us writes:
Greg Stark wrote:
It should be relatively cheap to skip the hint bits in the line
pointers since they'll be the same bits of every 16-bit value for a
whole range. Alternatively we could
On Dec 1, 2009, at 4:13 PM, Greg Stark wrote:
On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com
wrote:
Why are we writing out the hint bits to disk anyway? Is it really so
slow to calculate them on read + cache them that it's worth all this
trouble? Are they not also to blame
Greg Stark gsst...@mit.edu writes:
On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I don't think "relatively cheap" is the right criterion here --- the
question to me is how many assumptions are you making in order to
compute the page's CRC. Each assumption degrades the
On Wed, Dec 2, 2009 at 12:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark gsst...@mit.edu writes:
On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I don't think "relatively cheap" is the right criterion here --- the
question to me is how many assumptions are you making in
* Greg Stark gsst...@mit.edu [091201 20:14]:
I'm not sure we're on the same page. As I understand it there are
three proposals on the table now:
1) set aside a section of the page to contain only non-checksummed
hint bits. That section has to be relocatable so the crc check would
have to
On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:
So this discussion died with no solution arising to the
hint-bit-setting-invalidates-the-CRC problem.
Apparently the only solution in sight is to WAL-log hint bits. Simon
opines it would be horrible from a performance standpoint to