On 2013-03-06 13:34:21 +0200, Heikki Linnakangas wrote:
On 06.03.2013 10:41, Simon Riggs wrote:
On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote:
Fletcher is probably significantly faster than CRC-16, because I'm just
doing int32 addition in a tight loop.
Simon originally chose
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote:
On 06.03.2013 10:41, Simon Riggs wrote:
On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote:
Fletcher is probably significantly faster than CRC-16, because I'm just
doing int32 addition in a tight loop.
Simon
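For readers following the algorithm discussion: the "int32 addition in a tight loop" approach is Fletcher-style. Below is a textbook Fletcher-16 for illustration only; the patch's actual variant uses plain int32 sums folded to 16 bits at the end, so this is not its exact code.

```python
# Textbook Fletcher-16: two running sums updated in a tight loop.
# Illustrative only -- the checksums patch uses a wider-sum variant.
def fletcher16(data: bytes) -> int:
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1
```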
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
If picking a CRC why not a short optimal one rather than truncate CRC32C?
CRC32C is available in hardware since SSE4.2.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7
On Mon, Mar 4, 2013 at 3:13 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 04.03.2013 20:58, Greg Smith wrote:
There
is no such thing as a stable release of btrfs, and no timetable for when
there will be one. I could do some benchmarks of that but I didn't think
they were very
There may be good reasons to reject this patch. Or there may not.
But I completely disagree with the idea that asking them to solve the
problem at the filesystem level is sensible.
Yes, can we get back to the main issues with the patch?
1) argument over whether the checksum is sufficient to
Robert,
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not have that effect.
So I think
On Wed, Mar 6, 2013 at 2:14 PM, Josh Berkus j...@agliodbs.com wrote:
Based on Smith's report, I consider (2) to be a deal-killer right now.
I was pretty depressed by those numbers, too.
The level of overhead reported by him would prevent the users I work
with from ever employing checksums on
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote:
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away,
On 03/06/2013 03:06 PM, Robert Haas wrote:
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote:
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is
Andres Freund and...@2ndquadrant.com writes:
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
If picking a CRC why not a short optimal one rather than truncate CRC32C?
CRC32C is available in hardware since SSE4.2.
I think that should be at most a fourth-order consideration, since we
are not
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
It'd be difficult to change the algorithm in a future release without
breaking on-disk compatibility,
On-disk compatibility is broken with major releases anyway, so I don't
see this as a huge barrier.
--
Craig Ringer
On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
It'd be difficult to change the algorithm in a future release without
breaking on-disk compatibility,
On-disk compatibility is broken with major releases anyway, so I don't
see this as a huge
On 3/6/13 1:34 PM, Robert Haas wrote:
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not
On 03/07/2013 08:41 AM, Andres Freund wrote:
On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
It'd be difficult to change the algorithm in a future release without
breaking on-disk compatibility,
On-disk compatibility is broken with major
On 3/4/13 7:04 PM, Daniel Farina wrote:
Corruption has easily occupied more than one person-month of time last
year for us.
Just FYI for anyone that's experienced corruption... we've looked into doing
row-level checksums at work. The only challenge we ran into was how to check
them when
On 3/6/13 1:14 PM, Josh Berkus wrote:
There may be good reasons to reject this patch. Or there may not.
But I completely disagree with the idea that asking them to solve the
problem at the filesystem level is sensible.
Yes, can we get back to the main issues with the patch?
1) argument
On 3/6/13 6:34 AM, Heikki Linnakangas wrote:
Another thought is that perhaps something like CRC32C would be faster to
calculate on modern hardware, and could be safely truncated to 16-bits
using the same technique you're using to truncate the Fletcher's
Checksum. Greg's tests showed that the
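For reference, CRC-32C (the Castagnoli polynomial, the one the SSE4.2 crc32 instruction computes) and one possible 16-bit truncation can be sketched as below. The XOR-fold step is an illustrative assumption, not necessarily the truncation the patch would adopt.

```python
# Bit-at-a-time CRC-32C (reflected polynomial 0x82F63B78); hardware
# implementations via SSE4.2 compute the same function, just faster.
def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# One possible truncation to 16 bits: XOR the two halves. Illustrative.
def crc16_from_crc32c(data: bytes) -> int:
    full = crc32c(data)
    return (full ^ (full >> 16)) & 0xFFFF
```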
On 3/6/13 1:24 PM, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
If picking a CRC why not a short optimal one rather than truncate CRC32C?
CRC32C is available in hardware since SSE4.2.
I think that should be at most a
On Wed, Mar 6, 2013 at 11:04 PM, Robert Haas robertmh...@gmail.com wrote:
When we first talked about this feature for
9.2, we were going to exclude hint bits from checksums, in order to
avoid this issue; what happened to that?
I don't think anyone ever thought that was a particularly
TL;DR summary: on a system I thought was a fair middle of the road
server, pgbench tests are averaging about a 2% increase in WAL writes
and a 2% slowdown when I turn on checksums. There are a small number of
troublesome cases where that overhead rises to closer to 20%, an upper
limit that's
On Wed, Mar 6, 2013 at 8:17 PM, Greg Smith g...@2ndquadrant.com wrote:
TL;DR summary: on a system I thought was a fair middle of the road server,
pgbench tests are averaging about a 2% increase in WAL writes and a 2%
slowdown when I turn on checksums. There are a small number of troublesome
On 3/7/13 12:15 AM, Daniel Farina wrote:
I have only done some cursory research, but cpu-time of 20% seem to
expected for InnoDB's CRC computation[0]. Although a galling number,
this comparison with other systems may be a way to see how much of
that overhead is avoidable or just the price of
On 5 March 2013 01:04, Daniel Farina dan...@heroku.com wrote:
Corruption has easily occupied more than one person-month of time last
year for us. This year to date I've burned two weeks, although
admittedly this was probably the result of statistical clustering.
Other colleagues of mine have
On 04.03.2013 09:11, Simon Riggs wrote:
On 3 March 2013 18:24, Greg Smithg...@2ndquadrant.com wrote:
The 16-bit checksum feature seems functional, with two sources of overhead.
There's some CPU time burned to compute checksums when pages enter the
system. And there's extra overhead for WAL
Thank you for the review.
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote:
If you enable checksums, the free space map never gets updated in a
standby. It will slowly drift to be completely out of sync with reality,
which could lead to significant slowdown and bloat after
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL. If you go ahead with this anyway, at the very least I'd like
to see some sort of a comparison with e.g btrfs. How do performance,
error-detection
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL.
Doing checksums in the filesystem has some downsides. One is that you
need to use a
On 3/4/13 2:11 AM, Simon Riggs wrote:
It's crunch time. Do you and Jeff believe this patch should be
committed to Postgres core?
I want to see a GUC to allow turning this off, to avoid the problem I
saw where a non-critical header corruption problem can cause an entire
page to be unreadable.
On Mon, 2013-02-25 at 01:30 -0500, Greg Smith wrote:
Attached is some bit rot updates to the checksums patches. The
replace-tli one still works fine. I fixed a number of conflicts in the
larger patch. The one I've attached here isn't 100% to project
standards--I don't have all the
On 04.03.2013 20:58, Greg Smith wrote:
There
is no such thing as a stable release of btrfs, and no timetable for when
there will be one. I could do some benchmarks of that but I didn't think
they were very relevant. Who cares how fast something might run when it
may not work correctly? btrfs
On Mon, 2013-03-04 at 11:52 +0800, Craig Ringer wrote:
I also suspect that at least in the first release it might be desirable
to have an option that essentially says something's gone horribly wrong
and we no longer want to check or write checksums, we want a
non-checksummed DB that can still
On Mon, 2013-03-04 at 22:13 +0200, Heikki Linnakangas wrote:
On 04.03.2013 20:58, Greg Smith wrote:
There
is no such thing as a stable release of btrfs, and no timetable for when
there will be one. I could do some benchmarks of that but I didn't think
they were very relevant. Who cares
On 04.03.2013 18:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL.
Doing checksums in the filesystem has some
On Sun, 2013-03-03 at 22:18 -0500, Greg Smith wrote:
As for a design of a GUC that might be useful here, the option itself
strikes me as being like archive_mode in its general use. There is an
element of parameters like wal_sync_method or enable_cassert though,
where the options available
On Mon, 2013-03-04 at 13:58 -0500, Greg Smith wrote:
On 3/4/13 2:11 AM, Simon Riggs wrote:
It's crunch time. Do you and Jeff believe this patch should be
committed to Postgres core?
I want to see a GUC to allow turning this off, to avoid the problem I
saw where a non-critical header
On 3/4/13 10:00 AM, Jeff Davis wrote:
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL.
Doing checksums in the filesystem has some
On 3/4/13 2:48 PM, Jeff Davis wrote:
On Mon, 2013-03-04 at 13:58 -0500, Greg Smith wrote:
On 3/4/13 2:11 AM, Simon Riggs wrote:
It's crunch time. Do you and Jeff believe this patch should be
committed to Postgres core?
I want to see a GUC to allow turning this off, to avoid the problem I
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that compare with the work that
PostgreSQL has to do to maintain the checksums? I'd like to see some
data on those things.
I
On 04.03.2013 22:51, Jim Nasby wrote:
The time to
object to the concept of a checksumming feature was a long time ago,
before a ton of development effort went into this... :(
I did. Development went ahead anyway.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
On 04.03.2013 22:40, Jeff Davis wrote:
Is there any reason why we can't have both postgres and filesystem
checksums?
Of course not. But if we can get away without checksums in Postgres,
that's better, because then we don't need to maintain that feature in
Postgres. If the patch gets
On Mon, Mar 04, 2013 at 01:00:09PM -0800, Jeff Davis wrote:
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
If you're serious enough about your data that you want checksums, you
should be able to choose your filesystem.
I simply disagree. I am targeting my feature at casual
On 04.03.2013 23:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that compare with the work that
PostgreSQL has to do to maintain the checksums? I'd like
On 3/4/13 3:00 PM, Heikki Linnakangas wrote:
On 04.03.2013 22:51, Jim Nasby wrote:
The time to
object to the concept of a checksumming feature was a long time ago,
before a ton of development effort went into this... :(
I did. Development went ahead anyway.
Right, because the community felt
On 04.03.2013 22:51, Jim Nasby wrote:
Additionally, no filesystem I'm aware of checksums the data in the
filesystem cache. A PG checksum would.
The patch says:
+ * IMPORTANT NOTE -
+ * The checksum is not valid at all times on a data page. We set it before we
+ * flush page/buffer, and
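The note quoted from the patch can be modeled simply: the page's checksum field is refreshed only immediately before the buffer is written out, so an in-memory page may carry a stale value until flush time. The checksum function below is a stand-in, not PostgreSQL's.

```python
def page_checksum(data: bytes) -> int:
    # Stand-in 16-bit checksum; PostgreSQL's real function differs.
    return sum(data) & 0xFFFF

def flush_page(data: bytes) -> bytes:
    # Per the patch comment: set the checksum only at flush time,
    # just before the page image goes to disk.
    csum = page_checksum(data)
    return csum.to_bytes(2, "little") + data
```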
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
Perhaps we should just wait a few years? If we suspect that this
becomes obsolete in a few years, it's probably better to just wait,
than add a feature we'll have to keep maintaining. Assuming it gets
committed today, it's going to take a
On 03/05/2013 04:48 AM, Jeff Davis wrote:
We would still calculate the checksum and print the warning; and then
pass it through the rest of the header checks. If the header checks
pass, then it proceeds. If the header checks fail, and if
zero_damaged_pages is off, then it would still generate
On 3/4/13 3:13 PM, Heikki Linnakangas wrote:
This PostgreSQL patch hasn't seen any production use, either. In fact,
I'd consider btrfs to be more mature than this patch. Unless you think
that there will be some major changes to the worse in performance in
btrfs, it's perfectly valid and useful
On 3/4/13 5:20 PM, Craig Ringer wrote:
On 03/05/2013 04:48 AM, Jeff Davis wrote:
We would still calculate the checksum and print the warning; and then
pass it through the rest of the header checks. If the header checks
pass, then it proceeds. If the header checks fail, and if
zero_damaged_pages
On 03/05/2013 08:15 AM, Jim Nasby wrote:
Would it be better to do checksum_logging_level = <valid elog levels>?
That way someone could set the notification to anything from DEBUG
up to PANIC. ISTM the default should be ERROR.
That seems nice at first brush, but I don't think it holds up.
All
On 3/4/13 6:22 PM, Craig Ringer wrote:
On 03/05/2013 08:15 AM, Jim Nasby wrote:
Would it be better to do checksum_logging_level = <valid elog levels>?
That way someone could set the notification to anything from DEBUG
up to PANIC. ISTM the default should be ERROR.
That seems nice at first
Heikki,
Perhaps we should just wait a few years? If we suspect that this becomes
obsolete in a few years, it's probably better to just wait, than add a
feature we'll have to keep maintaining. Assuming it gets committed
today, it's going to take a year or two for 9.3 to get released and all
On Mon, Mar 4, 2013 at 1:22 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 04.03.2013 23:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that
On Mon, 2013-03-04 at 14:57 -0600, Jim Nasby wrote:
I suggest we paint that GUC along the lines of
checksum_failure_log_level, defaulting to ERROR. That way if someone
wanted to completely bury the elogs down to, say, DEBUG, they could.
The reason I didn't want to do that is because it's essentially a
On Mon, 2013-03-04 at 23:22 +0200, Heikki Linnakangas wrote:
On 04.03.2013 23:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that compare with the
On Mon, 2013-03-04 at 23:11 +0200, Heikki Linnakangas wrote:
Of course not. But if we can get away without checksums in Postgres,
that's better, because then we don't need to maintain that feature in
Postgres. If the patch gets committed, it's not mission accomplished.
There will be
On Sun, 2013-03-03 at 18:05 -0500, Greg Smith wrote:
= Test 1 - find worst-case overhead for the checksum calculation on write =
This can hit 25% of runtime when you isolate it out. I'm not sure yet
whether the way I'm running this multiple times makes sense. This one is so
much slower on my Mac
On 03/02/2013 12:48 AM, Daniel Farina wrote:
On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote:
Attached is some bit rot updates to the checksums patches. The replace-tli
one still works fine
I rather badly want this feature, and if the open issues with the
patch
And here's an updated version of the checksum corruption testing wrapper
script already. This includes an additional safety check that you've
set PGDATA to a location that can be erased. Presumably no one else
would like to accidentally do this:
rm -rf /*
Like I just did.
--
Greg Smith
On 12/19/12 6:30 PM, Jeff Davis wrote:
I ran a few tests.
Test 1 - find worst-case overhead for the checksum calculation on write:
Test 2 - worst-case overhead for calculating checksum while reading data
Test 3 - worst-case WAL overhead
What I've done is wrap all of these tests into a shell
On 3/3/13 9:22 AM, Craig Ringer wrote:
Did you get a chance to see whether you can run it in
checksum-validation-and-update-off backward compatible mode? This seems
like an important thing to have working (and tested for) in case of
bugs, performance issues or other unforeseen circumstances.
On 03/04/2013 11:18 AM, Greg Smith wrote:
On 3/3/13 9:22 AM, Craig Ringer wrote:
Did you get a chance to see whether you can run it in
checksum-validation-and-update-off backward compatible mode? This seems
like an important thing to have working (and tested for) in case of
bugs, performance
On 3/3/13 10:52 PM, Craig Ringer wrote:
I also suspect that at least in the first release it might be desirable
to have an option that essentially says something's gone horribly wrong
and we no longer want to check or write checksums, we want a
non-checksummed DB that can still read our data
On 03/04/2013 12:19 PM, Greg Smith wrote:
On 3/3/13 10:52 PM, Craig Ringer wrote:
I also suspect that at least in the first release it might be desirable
to have an option that essentially says something's gone horribly wrong
and we no longer want to check or write checksums, we want a
On 3 March 2013 18:24, Greg Smith g...@2ndquadrant.com wrote:
The 16-bit checksum feature seems functional, with two sources of overhead.
There's some CPU time burned to compute checksums when pages enter the
system. And there's extra overhead for WAL logging hint bits. I'll
quantify both
On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote:
Attached is some bit rot updates to the checksums patches. The replace-tli
one still works fine
I rather badly want this feature, and if the open issues with the
patch has hit zero, I'm thinking about applying it,
On Sun, Jan 27, 2013 at 5:28 PM, Jeff Davis pg...@j-davis.com wrote:
There's a maximum of one FPI per page per cycle, and we need the FPI for
any modified page in this design regardless.
So, deferring the XLOG_HINT WAL record doesn't change the total number
of FPIs emitted. The only savings
On 25 January 2013 20:29, Robert Haas robertmh...@gmail.com wrote:
The checksums patch also introduces another behavior into
SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
if checksums are enabled (to avoid torn page hazards). That's only
necessary for changes where
On Sun, Jan 27, 2013 at 3:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
If we attempted to defer the FPI last thing before write, we'd need to
cope with the case that writes at checkpoint occur after the logical
start of the checkpoint, and also with the overhead of additional
writes at
On Sat, 2013-01-26 at 23:23 -0500, Robert Haas wrote:
If we were to try to defer writing the WAL until the page was being
written, the most it would possibly save is the small XLOG_HINT WAL
record; it would not save any FPIs.
How is the XLOG_HINT_WAL record kept small and why does it not
On Fri, Jan 25, 2013 at 9:35 PM, Jeff Davis pg...@j-davis.com wrote:
On Fri, 2013-01-25 at 15:29 -0500, Robert Haas wrote:
I thought Simon had the idea, at some stage, of writing a WAL record
to cover hint-bit changes only at the time we *write* the buffer and
only if no FPI had already been
On Thu, Jan 10, 2013 at 1:06 AM, Jeff Davis pg...@j-davis.com wrote:
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
For now, I rebased the patches against master, and did some very minor
cleanup.
I think there is a problem here when setting PD_ALL_VISIBLE. I thought I
had analyzed that
On Fri, 2013-01-25 at 15:29 -0500, Robert Haas wrote:
I thought Simon had the idea, at some stage, of writing a WAL record
to cover hint-bit changes only at the time we *write* the buffer and
only if no FPI had already been emitted that checkpoint cycle. I'm
not sure whether that approach was
On Wed, 2013-01-16 at 17:38 -0800, Jeff Davis wrote:
New version of checksums patch.
And another new version of both patches.
Changes:
* Rebased.
* Rename SetBufferCommitInfoNeedsSave to MarkBufferDirtyHint. Now that
it's being used more places, it makes sense to give it a more generic
name.
New version of checksums patch.
Changes:
* rebased
* removed two duplicate lines; apparently the result of a bad merge
* Added heap page to WAL chain when logging an XLOG_HEAP2_VISIBLE to
avoid torn page issues updating PD_ALL_VISIBLE. This is the most
significant change.
* minor comment
On Tue, 2013-01-15 at 19:36 -0500, Greg Smith wrote:
First rev of a simple corruption program is attached, in very C-ish
Python.
Great. Did you verify that my patch works as you expect at least in the
simple case?
The parameters I settled on are to accept a relation name, byte
offset,
First rev of a simple corruption program is attached, in very C-ish
Python. The parameters I settled on are to accept a relation name, byte
offset, byte value, and what sort of operation to do: overwrite, AND,
OR, XOR. I like XOR here because you can fix it just by running the
program
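The four byte operations described there are easy to sketch; this is a minimal illustration of overwrite/AND/OR/XOR against one byte of a file (the real script also resolves a relation name to its data file, which is omitted here).

```python
def corrupt_byte(path: str, offset: int, value: int, op: str) -> None:
    """Apply overwrite/AND/OR/XOR to one byte of a file in place.
    XOR is self-inverse: running it twice restores the original byte,
    which is why it is handy for repeatable corruption testing."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)[0]
        new = {"overwrite": value,
               "and": old & value,
               "or": old | value,
               "xor": old ^ value}[op]
        f.seek(offset)
        f.write(bytes([new]))
```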
On 12/19/12 6:30 PM, Jeff Davis wrote:
The idea is to prevent interference from the bgwriter or autovacuum.
Also, I turn off fsync so that it's measuring the calculation overhead,
not the effort of actually writing to disk.
With my test server issues sorted, what I did was setup a single
On 10 January 2013 06:06, Jeff Davis pg...@j-davis.com wrote:
The checksums patch also introduces another behavior into
SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
if checksums are enabled (to avoid torn page hazards). That's only
necessary for changes where the
The checksums patch also introduces another behavior into
SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
if checksums are enabled (to avoid torn page hazards). That's only
necessary for changes where the caller does not write WAL itself and
doesn't bump the LSN
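The rule being described reduces to a small predicate: with checksums on, a hint-bit-only dirtying needs an XLOG_HINT record (and its full-page image) only when the page has not already had WAL emitted this checkpoint cycle, which is visible from its LSN versus the redo pointer. A hedged sketch of that decision, with illustrative names:

```python
def hint_needs_wal(checksums_enabled: bool,
                   page_lsn: int,
                   redo_ptr: int) -> bool:
    # A page whose LSN is past the checkpoint redo pointer was already
    # WAL-logged (with an FPI) this cycle, so a hint-only change is
    # torn-page safe and needs no extra record.
    if not checksums_enabled:
        return False
    return page_lsn <= redo_ptr
```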
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
For now, I rebased the patches against master, and did some very minor
cleanup.
I think there is a problem here when setting PD_ALL_VISIBLE. I thought I
had analyzed that before, but upon review, it doesn't look right.
Setting PD_ALL_VISIBLE
On Tue, Dec 18, 2012 at 04:06:02AM -0500, Greg Smith wrote:
On 12/18/12 3:17 AM, Simon Riggs wrote:
Clearly part of the response could involve pg_dump on the damaged
structure, at some point.
This is the main thing I wanted to try out more, once I have a
decent corruption generation tool.
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
4. We need some general performance testing to show whether this is
insane or not.
I ran a few tests.
Test 1 - find worst-case overhead for the checksum calculation on write:
fsync = off
bgwriter_lru_maxpages = 0
shared_buffers =
On 18 December 2012 02:21, Jeff Davis pg...@j-davis.com wrote:
On Mon, 2012-12-17 at 19:14 +, Simon Riggs wrote:
We'll need a way of expressing some form of corruption tolerance.
zero_damaged_pages is just insane,
The main problem I see with zero_damaged_pages is that it could
On 12/18/12 3:17 AM, Simon Riggs wrote:
Clearly part of the response could involve pg_dump on the damaged
structure, at some point.
This is the main thing I wanted to try out more, once I have a decent
corruption generation tool. If you've corrupted a single record but can
still pg_dump the
Greg Smith wrote:
In general, what I hope people will be able to do is switch over to
their standby server, and then investigate further. I think it's
unlikely that people willing to pay for block checksums will only have
one server. Having some way to nail down if the same block is bad on
There is no good way to make the poor soul who has no standby server
happy here. You're just choosing between bad alternatives. The first
block error is often just that--the first one, to be joined by others
soon afterward. My experience at how drives fail says the second error
is a lot more
On Tue, 2012-12-18 at 08:17 +, Simon Riggs wrote:
I think we should discuss whether we accept my premise? Checksums will
actually detect more errors than we see now, and people will want to
do something about that. Returning to backup is one way of handling
it, but on a busy production
On Tue, 2012-12-18 at 04:06 -0500, Greg Smith wrote:
Having some way to nail down if the same block is bad on a
given standby seems like a useful interface we should offer, and it
shouldn't take too much work. Ideally you won't find the same
corruption there. I'd like a way to check the
Jeff Davis pg...@j-davis.com writes:
- A relation name
- Corruption type (an entry from this list)
- How many blocks to touch
I'll just loop based on the count, randomly selecting a block each time
and messing with it in that way.
For the messing with it part, did you consider zzuf?
On 14 December 2012 20:15, Greg Smith g...@2ndquadrant.com wrote:
On 12/14/12 3:00 PM, Jeff Davis wrote:
After some thought, I don't see much value in introducing multiple
instances of corruption at a time. I would think that the smallest unit
of corruption would be the hardest to detect, so
Simon Riggs si...@2ndquadrant.com writes:
Discussing this makes me realise that we need a more useful response
than just "your data is corrupt", so the user can respond "yes, I know,
I'm trying to save what's left."
We'll need a way of expressing some form of corruption tolerance.
zero_damaged_pages
On 17 December 2012 19:29, Tom Lane t...@sss.pgh.pa.us wrote:
Simon Riggs si...@2ndquadrant.com writes:
Discussing this makes me realise that we need a more useful response
than just "your data is corrupt", so the user can respond "yes, I know,
I'm trying to save what's left."
We'll need a way of
On Mon, 2012-12-17 at 19:14 +, Simon Riggs wrote:
We'll need a way of expressing some form of corruption tolerance.
zero_damaged_pages is just insane,
The main problem I see with zero_damaged_pages is that it could
potentially write out the zero page, thereby really losing your data if
it
On Wed, 2012-12-12 at 17:52 -0500, Greg Smith wrote:
I can take this on, as part of the QA around checksums working as
expected. The result would be a Python program; I don't have quite
enough time to write this in C or re-learn Perl to do it right now. But
this won't be a lot of code.
On 12/14/12 3:00 PM, Jeff Davis wrote:
After some thought, I don't see much value in introducing multiple
instances of corruption at a time. I would think that the smallest unit
of corruption would be the hardest to detect, so introducing many of
them in one pass makes it easier to detect.
On 12/5/12 6:49 PM, Simon Riggs wrote:
* Zeroing pages, making pages all 1s
* Transposing pages
* Moving chunks of data sideways in a block
* Flipping bits randomly
* Flipping data endianness
* Destroying particular catalog tables or structures
I can take this on, as part of the QA around
Robert Haas wrote:
Jeff Davis pg...@j-davis.com wrote:
Or, I could write up a test framework in ruby or python, using
the appropriate pg driver, and some not-so-portable shell
commands to start and stop the server. Then, I can publish that
on this list, and that would at least make it easier
On Tue, Dec 4, 2012 at 6:17 PM, Jeff Davis pg...@j-davis.com wrote:
Or, I could write up a test framework in ruby or python, using the
appropriate pg driver, and some not-so-portable shell commands to start
and stop the server. Then, I can publish that on this list, and that
would at least
On 5 December 2012 23:40, Robert Haas robertmh...@gmail.com wrote:
On Tue, Dec 4, 2012 at 6:17 PM, Jeff Davis pg...@j-davis.com wrote:
Or, I could write up a test framework in ruby or python, using the
appropriate pg driver, and some not-so-portable shell commands to start
and stop the server.
On Mon, 2012-12-03 at 13:16 +, Simon Riggs wrote:
On 3 December 2012 09:56, Simon Riggs si...@2ndquadrant.com wrote:
I think the way forwards for this is...
1. Break out the changes around inCommit flag, since that is just
uncontroversial refactoring. I can do that. That reduces the