On Mon, Nov 8, 2010 at 5:59 PM, Aidan Van Dyk ai...@highrise.ca wrote:
The problem that putting checksums in a different place solves is the
page layout (binary upgrade) problem. You're still going to need to
buffer the page as you calculate the checksum and write it out.
buffering that page
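Here is a minimal sketch of the buffering Aidan describes. It assumes an 8 kB page and uses 32-bit FNV-1a only as a stand-in checksum, because the thread never settles on an algorithm; the function names are hypothetical. Copying the shared buffer first means the checksum covers exactly the bytes that will be handed to write(), even if another backend sets a hint bit in the shared copy in the meantime.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192 /* assumed block size */

/* Stand-in checksum (32-bit FNV-1a) over the whole page; illustrative only. */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: snapshot the shared buffer into private memory and
 * checksum the snapshot. Later changes to `shared` cannot make the
 * checksum disagree with the bytes in `private_copy`, which is what
 * would actually be written out. */
static uint32_t snapshot_for_write(const unsigned char *shared,
                                   unsigned char *private_copy)
{
    memcpy(private_copy, shared, PAGE_SIZE);
    return page_checksum(private_copy);
}
```

The alternative mentioned later in the thread, locking the buffer for the whole duration of the write, avoids the copy but blocks concurrent hint-bit setters.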
On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark gsst...@mit.edu wrote:
But buffering the page only means you've got some consistent view of
the page. It doesn't mean the checksum will actually match the data in
the page that gets written out. So when you read it back in the
checksum may be
On Tue, Nov 9, 2010 at 2:28 PM, Aidan Van Dyk ai...@highrise.ca wrote:
On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark gsst...@mit.edu wrote:
But buffering the page only means you've got some consistent view of
the page. It doesn't mean the checksum will actually match the data in
the page that
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark gsst...@mit.edu wrote:
Oh, I'm mistaken. The problem was that buffering the writes was
insufficient to deal with torn pages. Even if you buffer the writes, if
the machine crashes while only having written half the buffer out then
the checksum won't
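To illustrate the torn-page problem, here is a small simulation. It assumes 512-byte sectors are the disk's atomic write unit and a hypothetical page layout in which the first 4 bytes hold a checksum of the rest of the page. The new page is checksummed correctly in a private buffer, yet a crash after only half of its sectors reach disk leaves a page whose stored checksum does not match its contents.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   8192
#define SECTOR_SIZE 512   /* assumed atomic write unit of the disk */

/* Stand-in checksum (32-bit FNV-1a); illustrative only. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical layout: bytes 0..3 hold the checksum of bytes 4..end. */
static void stamp_checksum(unsigned char *page)
{
    uint32_t c = fnv1a(page + 4, PAGE_SIZE - 4);
    memcpy(page, &c, sizeof c);
}

static int checksum_ok(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, sizeof stored);
    return stored == fnv1a(page + 4, PAGE_SIZE - 4);
}

/* Simulate a crash after only `sectors_written` sectors of `src` reached
 * `disk`; the remaining sectors keep their old contents. */
static void torn_write(unsigned char *disk, const unsigned char *src,
                       int sectors_written)
{
    memcpy(disk, src, (size_t)sectors_written * SECTOR_SIZE);
}
```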
On Nov 9, 2010, at 9:27 AM, Greg Stark wrote:
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark gsst...@mit.edu wrote:
Oh, I'm mistaken. The problem was that buffering the writes was
insufficient to deal with torn pages. Even if you buffer the writes, if
the machine crashes while only having written
On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
There are also crosschecks that you can apply: if it's a heap page, are
there any index pages with pointers to it? If it's an index page, are
there downlink or sibling links to it from elsewhere in the index?
A page that
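Tom's heap-page crosscheck can be sketched as follows. The index entry type is a hypothetical simplification that keeps only the heap block number an entry points at; a real check would have to walk the actual index pages. The reasoning is that an all-zeroes heap page that index entries still reference is unlikely to be a legitimately new, empty page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified index entry: only the heap block it points at. */
typedef struct {
    uint32_t heap_block;
} IndexEntry;

/* Count index entries that point into `heap_block`. */
static size_t count_index_refs(const IndexEntry *entries, size_t n,
                               uint32_t heap_block)
{
    size_t refs = 0;
    for (size_t i = 0; i < n; i++)
        if (entries[i].heap_block == heap_block)
            refs++;
    return refs;
}

/* A zeroed heap page that is still referenced suggests lost data rather
 * than a freshly extended page. */
static bool zero_page_is_suspicious(const IndexEntry *entries, size_t n,
                                    uint32_t heap_block)
{
    return count_index_refs(entries, n, heap_block) > 0;
}
```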
On Tue, Nov 9, 2010 at 4:26 PM, Jim Nasby j...@nasby.net wrote:
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark gsst...@mit.edu wrote:
Oh, I'm mistaken. The problem was that buffering the writes was
insufficient to deal with torn pages. Even if you buffer the writes, if
the machine crashes while
On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby j...@nasby.net wrote:
Huh, this implies that if we did go through all the work of
segregating the hint bits and could arrange that they all appear on
the same 512-byte sector and if we buffered them so that we were
writing the same bits we
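Whether a group of segregated hint bits lands in a single sector comes down to simple arithmetic. The sketch below assumes 512-byte sectors and a byte range given as an offset and length within the page; the helper name is hypothetical. A range that stays inside one sector cannot be split by a sector-granular torn write.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512  /* assumed atomic write unit */

/* True if the byte range [offset, offset + len) lies within a single
 * sector. Requires len > 0. */
static bool fits_in_one_sector(size_t offset, size_t len)
{
    return offset / SECTOR_SIZE == (offset + len - 1) / SECTOR_SIZE;
}
```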
Gurjeet Singh singh.gurj...@gmail.com writes:
On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
IMO there are a lot of methods that can separate filesystem misfeasance
from Postgres errors, probably with greater reliability than this hack.
Doing this postmortem on a regular
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
So, for getting checksums, we have to offer up a few things:
1) zero-copy writes, we need to buffer the write to get a consistent
checksum (or lock the buffer tight)
2) saving hint-bits on an otherwise unchanged page. We
On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark gsst...@mit.edu wrote:
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
So, for getting checksums, we have to offer up a few things:
1) zero-copy writes, we need to buffer the write to get a consistent
checksum (or lock the
On Tue, Nov 09, 2010 at 02:05:57PM -0500, Robert Haas wrote:
On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark gsst...@mit.edu wrote:
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
So, for getting checksums, we have to offer up a few things:
1) zero-copy writes, we need to
Excerpts from Robert Haas's message of Tue Nov 09 16:05:57 -0300 2010:
And it still allows silent data corruption, because bogusly clearing a
hint bit is, at the moment, harmless, but bogusly setting one is not.
I really have to wonder how other products handle this. PostgreSQL
isn't the only database product that uses MVCC - not by a long shot -
and the problem of detecting whether an XID is visible to the current
snapshot can't be ours alone. So what do other people do about this?
They either don't cache the information about whether the XID is
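Robert's point that bogusly clearing a hint bit is harmless while bogusly setting one is not can be shown with a toy visibility check. The flag value, the Tuple struct, and the array standing in for the commit log are hypothetical simplifications, not PostgreSQL's actual structures. A missing hint bit only sends the check down the slow path; a wrongly set one is trusted without question.

```c
#include <stdbool.h>
#include <stdint.h>

#define HINT_XMIN_COMMITTED 0x01  /* illustrative flag, not PostgreSQL's value */

typedef struct {
    uint32_t xmin;   /* inserting transaction */
    uint8_t  hints;  /* cached commit-status bits */
} Tuple;

/* `clog[xid]` stands in for the commit log: true if xid committed. */
static bool xmin_committed(const Tuple *t, const bool *clog)
{
    if (t->hints & HINT_XMIN_COMMITTED)
        return true;          /* fast path: trust the cached answer */
    return clog[t->xmin];     /* slow path: consult the commit log */
}
```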
On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus j...@agliodbs.com wrote:
Well, most of the other MVCC-in-table DBMSes simply don't deal with
large, on-disk databases. In fact, I can't think of one which does,
currently; while MVCC has been popular for the New Databases, they're
all focused on
The whole point of the hint bits is that they're in the same place as the data.
Yes, but the hint bits are currently causing us trouble on several
features or potential features:
* page-level CRC checks
* eliminating vacuum freeze for cold data
* index-only access
* replication
* this patch
On Tue, Nov 9, 2010 at 8:12 PM, Josh Berkus j...@agliodbs.com wrote:
The whole point of the hint bits is that they're in the same place as the data.
Yes, but the hint bits are currently causing us trouble on several
features or potential features:
Then we might have to get rid of hint bits. But
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark gsst...@mit.edu wrote:
Then we might have to get rid of hint bits. But they're hint bits for
a metadata file that already exists, creating another metadata file
doesn't solve anything.
Is there any way to instrument the writes of dirty buffers from
Though incidentally all of the other items you mentioned are generic
problems caused by MVCC, not hint bits.
Yes, but the hint bits prevent us from implementing workarounds.
--
Josh Berkus
PostgreSQL Experts Inc.
On Tue, Nov 9, 2010 at 2:05 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark gsst...@mit.edu wrote:
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk ai...@highrise.ca wrote:
So, for getting checksums, we have to offer up a few things:
1) zero-copy writes,
On 11/9/10 1:50 PM, Robert Haas wrote:
5. It would be pretty much impossible to run with autovacuum turned
off, and in fact you would likely need to make it a good deal more
aggressive in the specific case of aborted transactions, to mitigate
problems #1, #3, and #4.
6. This would require us
Josh Berkus j...@agliodbs.com wrote:
6. This would require us to be more aggressive about VACUUMing
old-cold relations/pages, e.g. VACUUM FREEZE. This would make
one of our worst issues for data warehousing even worse.
I continue to feel that it is insane that when a table is populated
On Tue, Nov 9, 2010 at 5:03 PM, Josh Berkus j...@agliodbs.com wrote:
On 11/9/10 1:50 PM, Robert Haas wrote:
5. It would be pretty much impossible to run with autovacuum turned
off, and in fact you would likely need to make it a good deal more
aggressive in the specific case of aborted
On Tue, Nov 9, 2010 at 5:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
Josh Berkus j...@agliodbs.com wrote:
6. This would require us to be more aggressive about VACUUMing
old-cold relations/pages, e.g. VACUUM FREEZE. This would make
one of our worst issues for data warehousing
On Tue, Nov 9, 2010 at 3:05 PM, Greg Stark gsst...@mit.edu wrote:
On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus j...@agliodbs.com wrote:
Well, most of the other MVCC-in-table DBMSes simply don't deal with
large, on-disk databases. In fact, I can't think of one which does,
currently; while MVCC
Robert,
Uh, no it doesn't. It only requires you to be more aggressive about
vacuuming the transactions that are in the aborted-XIDs array. It
doesn't affect transaction wraparound vacuuming at all, either
positively or negatively. You still have to freeze xmins before they
flip from being
Josh Berkus j...@agliodbs.com writes:
Though incidentally all of the other items you mentioned are generic
problems caused by MVCC, not hint bits.
Yes, but the hint bits prevent us from implementing workarounds.
If we got rid of hint bits, we'd need workarounds for the ensuing
massive
Robert Haas robertmh...@gmail.com writes:
dons asbestos underpants
4. There would presumably be some finite limit on the size of the
shared memory structure for aborted transactions. I don't think
there'd be any reason to make it particularly small, but if you sat
there and aborted
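Point 4 concerns a finite shared-memory structure for aborted transactions. The following is a minimal sketch that assumes a fixed-size array of aborted XIDs (deliberately tiny here) and uses hypothetical function names. What the system should do once the array fills up is exactly what the thread is debating; here record_abort simply reports failure until vacuum has removed an aborted transaction's tuples and its entry is forgotten.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ABORTED 4   /* deliberately tiny for illustration */

typedef struct {
    uint32_t xids[MAX_ABORTED];
    size_t   count;
} AbortedXids;

/* Record an abort; returns false when the structure is full. */
static bool record_abort(AbortedXids *a, uint32_t xid)
{
    if (a->count == MAX_ABORTED)
        return false;
    a->xids[a->count++] = xid;
    return true;
}

static bool is_aborted(const AbortedXids *a, uint32_t xid)
{
    for (size_t i = 0; i < a->count; i++)
        if (a->xids[i] == xid)
            return true;
    return false;
}

/* Once vacuum has removed every tuple written by `xid`, drop the entry
 * (swap-with-last removal; order does not matter). */
static void forget_abort(AbortedXids *a, uint32_t xid)
{
    for (size_t i = 0; i < a->count; i++)
        if (a->xids[i] == xid) {
            a->xids[i] = a->xids[--a->count];
            return;
        }
}
```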
On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus j...@agliodbs.com wrote:
Robert,
Uh, no it doesn't. It only requires you to be more aggressive about
vacuuming the transactions that are in the aborted-XIDs array. It
doesn't affect transaction wraparound vacuuming at all, either
positively or
On Wed, Nov 10, 2010 at 1:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Once you know that there is, or isn't,
a filesystem-level error involved, what are you going to do next?
You're going to go try to debug the component you know is at fault,
that's what. And that problem is still AI-complete.
On Tue, Nov 9, 2010 at 6:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
dons asbestos underpants
4. There would presumably be some finite limit on the size of the
shared memory structure for aborted transactions. I don't think
there'd be any reason to
On Tue, Nov 9, 2010 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus j...@agliodbs.com wrote:
Robert,
Uh, no it doesn't. It only requires you to be more aggressive about
vacuuming the transactions that are in the aborted-XIDs array. It
On Sun, Nov 7, 2010 at 1:04 AM, Greg Stark gsst...@mit.edu wrote:
It does seem like this is kind of part and parcel of adding checksums
to blocks. It's arguably kind of silly to add checksums to blocks but
have a commonly produced bit pattern in corruption cases go
undetected.
Getting back to
Aidan Van Dyk ai...@highrise.ca writes:
Getting back to the checksum debate (and this seems like a
semi-version of the checksum debate), now that we have forks, could we
easily add block checksumming to a fork? It would mean writing to 2
files but that shouldn't be a problem, because until
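If block checksums lived in a separate fork, finding a data block's checksum would be a matter of division and remainder. This sketch assumes 8 kB blocks in both forks and a 4-byte checksum per data block, neither of which the thread specifies; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE    8192  /* assumed block size of both forks */
#define CHECKSUM_SIZE 4     /* assumed 32-bit checksum per data block */
#define SUMS_PER_FORK_BLOCK (BLOCK_SIZE / CHECKSUM_SIZE)  /* 2048 */

typedef struct {
    uint32_t fork_block;   /* which block of the checksum fork */
    uint32_t byte_offset;  /* where the checksum sits in that block */
} ChecksumLocation;

static ChecksumLocation checksum_location(uint32_t data_block)
{
    ChecksumLocation loc;
    loc.fork_block  = data_block / SUMS_PER_FORK_BLOCK;
    loc.byte_offset = (data_block % SUMS_PER_FORK_BLOCK) * CHECKSUM_SIZE;
    return loc;
}
```

Under these assumptions one fork block holds 2048 checksums, covering 16 MB of data, so the fork would add about 0.05% to the relation's size.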
Gurjeet Singh singh.gurj...@gmail.com writes:
On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Um ... and exactly how does that differ from the existing behavior?
Right now a zero-filled page is considered valid, and is treated as a new page;
PageHeaderIsValid() -> /* Check
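Here is a simplified version of the kind of check being referred to. It assumes a header reduced to the three offsets pd_lower, pd_upper and pd_special; the real page header has more fields and PageHeaderIsValid() checks more conditions, so the struct and function here are hypothetical. A page that fails the header sanity checks is still accepted if every byte is zero, which is why a zeroed page cannot be told apart from a freshly extended one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Hypothetical reduced header: only the offsets this sketch checks. */
typedef struct {
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
} SketchPageHeader;

static bool sketch_page_header_is_valid(const unsigned char *page)
{
    SketchPageHeader h;
    memcpy(&h, page, sizeof h);

    /* Ordinary sanity checks on the header offsets. */
    if ((size_t)h.pd_lower >= sizeof h &&
        h.pd_lower <= h.pd_upper &&
        h.pd_upper <= h.pd_special &&
        h.pd_special <= PAGE_SIZE)
        return true;

    /* Otherwise accept only an all-zeroes page, treated as new. */
    for (size_t i = 0; i < PAGE_SIZE; i++)
        if (page[i] != 0)
            return false;
    return true;
}
```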
I wrote:
Aidan Van Dyk ai...@highrise.ca writes:
Getting back to the checksum debate (and this seems like a
semi-version of the checksum debate), now that we have forks, could we
easily add block checksumming to a fork?
More generally, this re-opens the question of whether data in secondary
On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
So maybe Aidan's got a good idea here. It would sure be a lot easier
to shoehorn checksum checking in as an optional feature if the checksums
were kept someplace else.
Would it? I thought the only problem was the hint bits
On Mon, Nov 8, 2010 at 12:53 PM, Greg Stark gsst...@mit.edu wrote:
On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
So maybe Aidan's got a good idea here. It would sure be a lot easier
to shoehorn checksum checking in as an optional feature if the checksums
were kept
A customer of ours is quite bothered about finding zero pages in an index
after a system crash. The task now is to improve the diagnosability of such an
issue and be able to definitively point to the source of zero pages.
The proposed solution below has been vetted in-house at EnterpriseDB and am
Gurjeet Singh singh.gurj...@gmail.com writes:
.) The basic idea is to have a magic number in every PageHeader before it is
written to disk, and check for this magic number when performing page
validity checks.
Um ... and exactly how does that differ from the existing behavior?
.) To avoid
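The magic-number proposal can be sketched like this. It assumes the magic occupies the first 4 bytes of the page; both the value and the placement are hypothetical, since the proposal's details are cut off here. An all-zeroes page can never carry a nonzero magic, so it would be flagged instead of being silently accepted as new.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  8192
#define PAGE_MAGIC 0x5047504Du  /* hypothetical value for this sketch */

/* Hypothetical: stamp the magic into the first 4 bytes before writing. */
static void stamp_magic_before_write(unsigned char *page)
{
    uint32_t m = PAGE_MAGIC;
    memcpy(page, &m, sizeof m);
}

/* Validity check on read: the magic must be present. */
static bool page_has_magic(const unsigned char *page)
{
    uint32_t m;
    memcpy(&m, page, sizeof m);
    return m == PAGE_MAGIC;
}
```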
On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Gurjeet Singh singh.gurj...@gmail.com writes:
.) The basic idea is to have a magic number in every PageHeader before it is
written to disk, and check for this magic number when performing page
validity checks.
Um ...
On Sun, Nov 7, 2010 at 4:23 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:
I understand that it is a pretty low-level change, but IMHO the change is
minimal and is being applied in well understood places. All the assumptions
listed have been effective for quite a while, and I don't see these