On Wed, Mar 6, 2013 at 11:04 PM, Robert Haas wrote:
>> When we first talked about this feature for
>> 9.2, we were going to exclude hint bits from checksums, in order to
>> avoid this issue; what happened to that?
>
> I don't think anyone ever thought that was a particularly practical
> design. I
On 3/6/13 1:24 PM, Tom Lane wrote:
Andres Freund writes:
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
If picking a CRC why not a short optimal one rather than truncate CRC32C?
CRC32C is available in hardware since SSE4.2.
I think that should be at most a fourth-order consideration,
On 3/6/13 6:34 AM, Heikki Linnakangas wrote:
Another thought is that perhaps something like CRC32C would be faster to
calculate on modern hardware, and could be safely truncated to 16-bits
using the same technique you're using to truncate the Fletcher's
Checksum. Greg's tests showed that the over
On 3/6/13 1:14 PM, Josh Berkus wrote:
There may be good reasons to reject this patch. Or there may not.
But I completely disagree with the idea that asking them to solve the
problem at the filesystem level is sensible.
Yes, can we get back to the main issues with the patch?
1) argument over
On 3/4/13 7:04 PM, Daniel Farina wrote:
Corruption has easily occupied more than one person-month of time last
year for us.
Just FYI for anyone that's experienced corruption... we've looked into doing
row-level checksums at work. The only challenge we ran into was how to check
them when readi
On 03/07/2013 08:41 AM, Andres Freund wrote:
> On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
>> On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
>>> It'd be difficult to change the algorithm in a future release without
>>> breaking on-disk compatibility,
>> On-disk compatibility is broken with
On 3/6/13 1:34 PM, Robert Haas wrote:
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not have that effect.
On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
> On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
> > It'd be difficult to change the algorithm in a future release without
> > breaking on-disk compatibility,
> On-disk compatibility is broken with major releases anyway, so I don't
> see this as
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
> It'd be difficult to change the algorithm in a future release without
> breaking on-disk compatibility,
On-disk compatibility is broken with major releases anyway, so I don't
see this as a huge barrier.
--
Craig Ringer http://
Andres Freund writes:
> On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
>> If picking a CRC why not a short optimal one rather than truncate CRC32C?
> CRC32C is available in hardware since SSE4.2.
I think that should be at most a fourth-order consideration, since we
are not interested solely
On 03/06/2013 03:06 PM, Robert Haas wrote:
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus wrote:
We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus wrote:
>> We've had a few EnterpriseDB customers who have had fantastically
>> painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
>> ZFS block size to the PostgreSQL block size is supposed to make these
>> problems go away, but in my experience it does not have that effect.
On Wed, Mar 6, 2013 at 2:14 PM, Josh Berkus wrote:
> Based on Smith's report, I consider (2) to be a deal-killer right now.
I was pretty depressed by those numbers, too.
> The level of overhead reported by him would prevent the users I work
> with from ever employing checksums on production syst
Robert,
> We've had a few EnterpriseDB customers who have had fantastically
> painful experiences with PostgreSQL + ZFS. Supposedly, aligning the
> ZFS block size to the PostgreSQL block size is supposed to make these
> problems go away, but in my experience it does not have that effect.
> So I t
> There may be good reasons to reject this patch. Or there may not.
> But I completely disagree with the idea that asking them to solve the
> problem at the filesystem level is sensible.
Yes, can we get back to the main issues with the patch?
1) argument over whether the checksum is sufficient
On Mon, Mar 4, 2013 at 3:13 PM, Heikki Linnakangas
wrote:
> On 04.03.2013 20:58, Greg Smith wrote:
>>
>> There
>> is no such thing as a stable release of btrfs, and no timetable for when
>> there will be one. I could do some benchmarks of that but I didn't think
>> they were very relevant. Who car
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
> If picking a CRC why not a short optimal one rather than truncate CRC32C?
CRC32C is available in hardware since SSE4.2.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7
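To make the CRC32C option concrete: below is a pure-software sketch of CRC-32C (the Castagnoli polynomial, the same CRC the SSE4.2 CRC32 instruction computes in hardware), plus one possible way to fold the 32-bit result down to a 16-bit page checksum. The XOR-fold truncation is an illustrative assumption, not something any version of the patch is known to do; a production implementation would be table-driven or use the hardware intrinsic rather than this bitwise loop.

```python
# Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
# Hardware (SSE4.2 CRC32) computes this same CRC far faster; the
# bit-at-a-time loop here is only to show the algorithm's shape.
def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# One hypothetical way to truncate to 16 bits: XOR-fold the halves.
def crc16_folded(data: bytes) -> int:
    crc = crc32c(data)
    return (crc & 0xFFFF) ^ (crc >> 16)
```

The standard check value for CRC-32C is `crc32c(b"123456789") == 0xE3069283`, which is a quick way to validate any implementation.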
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote:
> On 06.03.2013 10:41, Simon Riggs wrote:
>> On 5 March 2013 18:02, Jeff Davis wrote:
>>
>>> Fletcher is probably significantly faster than CRC-16, because I'm just
>>> doing int32 addition in a tight loop.
>>>
>>> Simon originall
On 2013-03-06 13:34:21 +0200, Heikki Linnakangas wrote:
> On 06.03.2013 10:41, Simon Riggs wrote:
> >On 5 March 2013 18:02, Jeff Davis wrote:
> >
> >>Fletcher is probably significantly faster than CRC-16, because I'm just
> >>doing int32 addition in a tight loop.
> >>
> >>Simon originally chose Fl
On 06.03.2013 10:41, Simon Riggs wrote:
On 5 March 2013 18:02, Jeff Davis wrote:
Fletcher is probably significantly faster than CRC-16, because I'm just
doing int32 addition in a tight loop.
Simon originally chose Fletcher, so perhaps he has more to say.
IIRC the research showed Fletcher wa
On 5 March 2013 18:02, Jeff Davis wrote:
> Fletcher is probably significantly faster than CRC-16, because I'm just
> doing int32 addition in a tight loop.
>
> Simon originally chose Fletcher, so perhaps he has more to say.
IIRC the research showed Fletcher was significantly faster for only a
sma
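To make the Fletcher discussion concrete, here is the textbook byte-wise Fletcher-16: two running sums updated in a tight loop, which is why it beats a bitwise CRC in pure software. This is a sketch of the general technique only; the patch's variant sums wider words with plain int32 additions and differs in detail.

```python
# Textbook Fletcher-16: two running sums over the input, modulo 255.
# The cheap inner loop (add, add, reduce) is the whole point of
# choosing a Fletcher-style sum over a software CRC.
def fletcher16(data: bytes) -> int:
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1
```

As a sanity check, `fletcher16(b"abcde")` yields the well-known value 0xC8F0.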
On 5 March 2013 09:35, Heikki Linnakangas wrote:
>> Are there objectors?
>
>
> In addition to my hostility towards this patch in general, there are some
> specifics in the patch I'd like to raise (read out in a grumpy voice):
;-) We all want to make the right choice here, so all viewpoints
grat
Thank you for the review.
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote:
> If you enable checksums, the free space map never gets updated in a
> standby. It will slowly drift to be completely out of sync with reality,
> which could lead to significant slowdown and bloat after fail
On 04.03.2013 09:11, Simon Riggs wrote:
On 3 March 2013 18:24, Greg Smith wrote:
The 16-bit checksum feature seems functional, with two sources of overhead.
There's some CPU time burned to compute checksums when pages enter the
system. And there's extra overhead for WAL logging hint bits. I'
On 5 March 2013 01:04, Daniel Farina wrote:
> Corruption has easily occupied more than one person-month of time last
> year for us. This year to date I've burned two weeks, although
> admittedly this was probably the result of statistical clustering.
> Other colleagues of mine have probably put
On Sun, 2013-03-03 at 18:05 -0500, Greg Smith wrote:
> = Test 1 - find worst-case overhead for the checksum calculation on write =
>
> This can hit 25% of runtime when you isolate it out. I'm not sure if
> how I'm running this multiple times makes sense yet. This one is so
> much slower on my
On Mon, 2013-03-04 at 23:11 +0200, Heikki Linnakangas wrote:
> Of course not. But if we can get away without checksums in Postgres,
> that's better, because then we don't need to maintain that feature in
> Postgres. If the patch gets committed, it's not mission accomplished.
> There will be disc
On Mon, 2013-03-04 at 23:22 +0200, Heikki Linnakangas wrote:
> On 04.03.2013 23:00, Jeff Davis wrote:
> > On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
> >> Yeah, fragmentation will certainly hurt some workloads. But how badly,
> >> and which workloads, and how does that compare with
On Mon, 2013-03-04 at 14:57 -0600, Jim Nasby wrote:
> I suggest we paint that GUC along the lines of
> "checksum_failure_log_level", defaulting to ERROR. That way if someone
> wanted to completely bury the elogs to something like DEBUG they could.
The reason I didn't want to do that is because it's essentially
On Mon, Mar 4, 2013 at 1:22 PM, Heikki Linnakangas
wrote:
> On 04.03.2013 23:00, Jeff Davis wrote:
>>
>> On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
>>>
>>> Yeah, fragmentation will certainly hurt some workloads. But how badly,
>>> and which workloads, and how does that compare wi
Heikki,
> Perhaps we should just wait a few years? If we suspect that this becomes
> obsolete in a few years, it's probably better to just wait, than add a
> feature we'll have to keep maintaining. Assuming it gets committed
> today, it's going to take a year or two for 9.3 to get released and all
On 3/4/13 6:22 PM, Craig Ringer wrote:
On 03/05/2013 08:15 AM, Jim Nasby wrote:
Would it be better to do checksum_logging_level =
? That way someone could set the notification to anything from DEBUG
up to PANIC. ISTM the default should be ERROR.
That seems nice at first brush, but I don't thi
On 03/05/2013 08:15 AM, Jim Nasby wrote:
>
> Would it be better to do checksum_logging_level =
> ? That way someone could set the notification to anything from DEBUG
> up to PANIC. ISTM the default should be ERROR.
That seems nice at first brush, but I don't think it holds up.
All our other log_
On 3/4/13 5:20 PM, Craig Ringer wrote:
On 03/05/2013 04:48 AM, Jeff Davis wrote:
We would still calculate the checksum and print the warning; and then
pass it through the rest of the header checks. If the header checks
pass, then it proceeds. If the header checks fail, and if
zero_damaged_pages
On 3/4/13 3:13 PM, Heikki Linnakangas wrote:
This PostgreSQL patch hasn't seen any production use, either. In fact,
I'd consider btrfs to be more mature than this patch. Unless you think
that there will be some major changes to the worse in performance in
btrfs, it's perfectly valid and useful to
On 03/05/2013 04:48 AM, Jeff Davis wrote:
> We would still calculate the checksum and print the warning; and then
> pass it through the rest of the header checks. If the header checks
> pass, then it proceeds. If the header checks fail, and if
> zero_damaged_pages is off, then it would still genera
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> Perhaps we should just wait a few years? If we suspect that this
> becomes obsolete in a few years, it's probably better to just wait,
> than add a feature we'll have to keep maintaining. Assuming it gets
> committed today, it's going to take
On 04.03.2013 22:51, Jim Nasby wrote:
Additionally, no filesystem I'm aware of checksums the data in the
filesystem cache. A PG checksum would.
The patch says:
+ * IMPORTANT NOTE -
+ * The checksum is not valid at all times on a data page. We set it before we
+ * flush page/buffer, and implic
On 3/4/13 3:00 PM, Heikki Linnakangas wrote:
On 04.03.2013 22:51, Jim Nasby wrote:
The time to
object to the concept of a checksumming feature was a long time ago,
before a ton of development effort went into this... :(
I did. Development went ahead anyway.
Right, because the community felt t
On 04.03.2013 23:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that compare with the work that
PostgreSQL has to do to maintain the checksums? I'd like to
On Mon, Mar 04, 2013 at 01:00:09PM -0800, Jeff Davis wrote:
> On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
> > If you're serious enough about your data that you want checksums, you
> > should be able to choose your filesystem.
>
> I simply disagree. I am targeting my feature at ca
On 04.03.2013 22:40, Jeff Davis wrote:
Is there any reason why we can't have both postgres and filesystem
checksums?
Of course not. But if we can get away without checksums in Postgres,
that's better, because then we don't need to maintain that feature in
Postgres. If the patch gets committed
On 04.03.2013 22:51, Jim Nasby wrote:
The time to
object to the concept of a checksumming feature was a long time ago,
before a ton of development effort went into this... :(
I did. Development went ahead anyway.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote:
> Yeah, fragmentation will certainly hurt some workloads. But how badly,
> and which workloads, and how does that compare with the work that
> PostgreSQL has to do to maintain the checksums? I'd like to see some
> data on those things.
On 3/4/13 2:48 PM, Jeff Davis wrote:
On Mon, 2013-03-04 at 13:58 -0500, Greg Smith wrote:
>On 3/4/13 2:11 AM, Simon Riggs wrote:
> >It's crunch time. Do you and Jeff believe this patch should be
> >committed to Postgres core?
>
>I want to see a GUC to allow turning this off, to avoid the probl
On 3/4/13 10:00 AM, Jeff Davis wrote:
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
>On 04.03.2013 09:11, Simon Riggs wrote:
> >Are there objectors?
>
>FWIW, I still think that checksumming belongs in the filesystem, not
>PostgreSQL.
Doing checksums in the filesystem has some d
On Mon, 2013-03-04 at 13:58 -0500, Greg Smith wrote:
> On 3/4/13 2:11 AM, Simon Riggs wrote:
> > It's crunch time. Do you and Jeff believe this patch should be
> > committed to Postgres core?
>
> I want to see a GUC to allow turning this off, to avoid the problem I
> saw where a non-critical header corruption problem can cause an entire
> page to be unreadable.
On Sun, 2013-03-03 at 22:18 -0500, Greg Smith wrote:
> As for a design of a GUC that might be useful here, the option itself
> strikes me as being like archive_mode in its general use. There is an
> element of parameters like wal_sync_method or enable_cassert though,
> where the options availab
On 04.03.2013 18:00, Jeff Davis wrote:
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL.
Doing checksums in the filesystem has some downsi
On Mon, 2013-03-04 at 22:13 +0200, Heikki Linnakangas wrote:
> On 04.03.2013 20:58, Greg Smith wrote:
> > There
> > is no such thing as a stable release of btrfs, and no timetable for when
> > there will be one. I could do some benchmarks of that but I didn't think
> > they were very relevant. Who
On Mon, 2013-03-04 at 11:52 +0800, Craig Ringer wrote:
> I also suspect that at least in the first release it might be desirable
> to have an option that essentially says "something's gone horribly wrong
> and we no longer want to check or write checksums, we want a
> non-checksummed DB that can st
On 04.03.2013 20:58, Greg Smith wrote:
There
is no such thing as a stable release of btrfs, and no timetable for when
there will be one. I could do some benchmarks of that but I didn't think
they were very relevant. Who cares how fast something might run when it
may not work correctly? btrfs migh
On Mon, 2013-02-25 at 01:30 -0500, Greg Smith wrote:
> Attached is some bit rot updates to the checksums patches. The
> replace-tli one still works fine. I fixed a number of conflicts in the
> larger patch. The one I've attached here isn't 100% to project
> standards--I don't have all the con
On 3/4/13 2:11 AM, Simon Riggs wrote:
It's crunch time. Do you and Jeff believe this patch should be
committed to Postgres core?
I want to see a GUC to allow turning this off, to avoid the problem I
saw where a non-critical header corruption problem can cause an entire
page to be unreadable.
On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
> On 04.03.2013 09:11, Simon Riggs wrote:
> > Are there objectors?
>
> FWIW, I still think that checksumming belongs in the filesystem, not
> PostgreSQL.
Doing checksums in the filesystem has some downsides. One is that you
need to use
On 04.03.2013 09:11, Simon Riggs wrote:
Are there objectors?
FWIW, I still think that checksumming belongs in the filesystem, not
PostgreSQL. If you go ahead with this anyway, at the very least I'd like
to see some sort of a comparison with e.g. btrfs. How do performance,
error-detection rate
On 3 March 2013 18:24, Greg Smith wrote:
> The 16-bit checksum feature seems functional, with two sources of overhead.
> There's some CPU time burned to compute checksums when pages enter the
> system. And there's extra overhead for WAL logging hint bits. I'll
> quantify both of those better in
On 03/04/2013 12:19 PM, Greg Smith wrote:
> On 3/3/13 10:52 PM, Craig Ringer wrote:
>> I also suspect that at least in the first release it might be desirable
>> to have an option that essentially says "something's gone horribly wrong
>> and we no longer want to check or write checksums, we want a
On 3/3/13 10:52 PM, Craig Ringer wrote:
I also suspect that at least in the first release it might be desirable
to have an option that essentially says "something's gone horribly wrong
and we no longer want to check or write checksums, we want a
non-checksummed DB that can still read our data fro
On 03/04/2013 11:18 AM, Greg Smith wrote:
> On 3/3/13 9:22 AM, Craig Ringer wrote:
>> Did you get a chance to see whether you can run it in
>> checksum-validation-and-update-off backward compatible mode? This seems
>> like an important thing to have working (and tested for) in case of
>> bugs, perf
On 3/3/13 9:22 AM, Craig Ringer wrote:
Did you get a chance to see whether you can run it in
checksum-validation-and-update-off backward compatible mode? This seems
like an important thing to have working (and tested for) in case of
bugs, performance issues or other unforseen circumstances.
The
On 12/19/12 6:30 PM, Jeff Davis wrote:
I ran a few tests.
Test 1 - find worst-case overhead for the checksum calculation on write:
Test 2 - worst-case overhead for calculating checksum while reading data
Test 3 - worst-case WAL overhead
What I've done is wrap all of these tests into a shell scr
And here's an updated version of the checksum corruption testing wrapper
script already. This includes an additional safety check that you've
set PGDATA to a location that can be erased. Presumably no one else
would like to accidentally do this:
rm -rf /*
Like I just did.
--
Greg Smith 2
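The safety check described above can be as simple as refusing to proceed unless the target really looks like a disposable data directory. A hypothetical sketch follows: the `PG_VERSION` marker file is a real PostgreSQL data-directory artifact, but the rest of the policy (which paths to refuse) is invented for illustration.

```python
import os

def pgdata_is_safe_to_erase(pgdata: str) -> bool:
    # Refuse empty/unset paths and obviously catastrophic targets
    # before any rm -rf -style wipe of $PGDATA.
    if not pgdata:
        return False
    path = os.path.realpath(pgdata)
    if path in ("/", os.path.expanduser("~")):
        return False
    # Require the PostgreSQL data-directory marker file before wiping.
    return os.path.isfile(os.path.join(path, "PG_VERSION"))
```

A wrapper script would call this before erasing, so an unset `PGDATA` fails closed instead of expanding to `rm -rf /*`.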
On 03/02/2013 12:48 AM, Daniel Farina wrote:
> On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith wrote:
>> Attached is some bit rot updates to the checksums patches. The replace-tli
>> one still works fine
> I rather badly want this feature, and if the open issues with the
> patch have hit zero, I'
On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith wrote:
> Attached is some bit rot updates to the checksums patches. The replace-tli
> one still works fine
I rather badly want this feature, and if the open issues with the
patch have hit zero, I'm thinking about applying it, shipping it, and
turni
On Sun, Jan 27, 2013 at 5:28 PM, Jeff Davis wrote:
> There's a maximum of one FPI per page per cycle, and we need the FPI for
> any modified page in this design regardless.
>
> So, deferring the XLOG_HINT WAL record doesn't change the total number
> of FPIs emitted. The only savings would be on th
On Sat, 2013-01-26 at 23:23 -0500, Robert Haas wrote:
> > If we were to try to defer writing the WAL until the page was being
> > written, the most it would possibly save is the small XLOG_HINT WAL
> > record; it would not save any FPIs.
>
> How is the XLOG_HINT_WAL record kept small and why does
On Sun, Jan 27, 2013 at 3:50 AM, Simon Riggs wrote:
> If we attempted to defer the FPI last thing before write, we'd need to
> cope with the case that writes at checkpoint occur after the logical
> start of the checkpoint, and also with the overhead of additional
> writes at checkpoint time.
Oh,
On 25 January 2013 20:29, Robert Haas wrote:
>> The checksums patch also introduces another behavior into
>> SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
>> if checksums are enabled (to avoid torn page hazards). That's only
>> necessary for changes where the caller does
On Fri, Jan 25, 2013 at 9:35 PM, Jeff Davis wrote:
> On Fri, 2013-01-25 at 15:29 -0500, Robert Haas wrote:
>> I thought Simon had the idea, at some stage, of writing a WAL record
>> to cover hint-bit changes only at the time we *write* the buffer and
>> only if no FPI had already been emitted that
On Fri, 2013-01-25 at 15:29 -0500, Robert Haas wrote:
> I thought Simon had the idea, at some stage, of writing a WAL record
> to cover hint-bit changes only at the time we *write* the buffer and
> only if no FPI had already been emitted that checkpoint cycle. I'm
> not sure whether that approach
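The decision being discussed — emit a full-page image for a hint-bit-only change only when checksums are enabled and the page has not already been WAL-logged since the last checkpoint — can be sketched as a tiny predicate. This is an illustrative sketch of the idea only, not the patch's actual code; the names and the LSN-versus-redo-pointer comparison are assumptions about how such a check would look.

```python
def hint_change_needs_wal(page_lsn: int, redo_ptr: int,
                          checksums_enabled: bool) -> bool:
    # Without checksums, a torn hint-bit-only write is harmless, so
    # no WAL is needed. With checksums, a torn page would fail
    # verification, so we need an FPI unless some record since the
    # last checkpoint's redo pointer already carried one (in which
    # case the page LSN is newer than the redo pointer).
    if not checksums_enabled:
        return False
    return page_lsn <= redo_ptr
```

Deferring this check to buffer-write time, as proposed above, would skip the record whenever an intervening update already bumped the page LSN past the redo pointer.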
On Thu, Jan 10, 2013 at 1:06 AM, Jeff Davis wrote:
> On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
>> For now, I rebased the patches against master, and did some very minor
>> cleanup.
>
> I think there is a problem here when setting PD_ALL_VISIBLE. I thought I
> had analyzed that before, b
On Wed, 2013-01-16 at 17:38 -0800, Jeff Davis wrote:
> New version of checksums patch.
And another new version of both patches.
Changes:
* Rebased.
* Rename SetBufferCommitInfoNeedsSave to MarkBufferDirtyHint. Now that
it's being used more places, it makes sense to give it a more generic
name.
On Tue, 2013-01-15 at 19:36 -0500, Greg Smith wrote:
> First rev of a simple corruption program is attached, in very C-ish
> Python.
Great. Did you verify that my patch works as you expect at least in the
simple case?
> The parameters I settled on are to accept a relation name, byte
> offset,
New version of checksums patch.
Changes:
* rebased
* removed two duplicate lines; apparently the result of a bad merge
* Added heap page to WAL chain when logging an XLOG_HEAP2_VISIBLE to
avoid torn page issues updating PD_ALL_VISIBLE. This is the most
significant change.
* minor comment c
First rev of a simple corruption program is attached, in very C-ish
Python. The parameters I settled on are to accept a relation name, byte
offset, byte value, and what sort of operation to do: overwrite, AND,
OR, XOR. I like XOR here because you can fix it just by running the
program again.
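The self-inverting property that makes XOR attractive here is easy to see in a few lines. Below is a hypothetical sketch of just the core operation; the relation-name lookup, block addressing, and AND/OR/overwrite modes of the actual tool are omitted.

```python
def xor_byte(path: str, offset: int, value: int) -> None:
    # XOR one byte of a file in place. Because (x ^ v) ^ v == x,
    # running the exact same command a second time restores the
    # original contents -- handy for corruption testing.
    with open(path, "r+b") as f:
        f.seek(offset)
        original = f.read(1)[0]
        f.seek(offset)
        f.write(bytes([original ^ value]))
```

Corrupt a page, confirm the checksum failure is reported, then repeat the command to undo the damage.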
On 12/19/12 6:30 PM, Jeff Davis wrote:
The idea is to prevent interference from the bgwriter or autovacuum.
Also, I turn off fsync so that it's measuring the calculation overhead,
not the effort of actually writing to disk.
With my test server issues sorted, what I did was setup a single 7200RPM
> > The checksums patch also introduces another behavior into
> > SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
> > if checksums are enabled (to avoid torn page hazards). That's only
> > necessary for changes where the caller does not write WAL itself and
> > doesn't bump
On 10 January 2013 06:06, Jeff Davis wrote:
> The checksums patch also introduces another behavior into
> SetBufferCommitInfoNeedsSave, which is to write an XLOG_HINT WAL record
> if checksums are enabled (to avoid torn page hazards). That's only
> necessary for changes where the caller does not
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
> For now, I rebased the patches against master, and did some very minor
> cleanup.
I think there is a problem here when setting PD_ALL_VISIBLE. I thought I
had analyzed that before, but upon review, it doesn't look right.
Setting PD_ALL_VISIBLE
On Tue, Dec 18, 2012 at 04:06:02AM -0500, Greg Smith wrote:
> On 12/18/12 3:17 AM, Simon Riggs wrote:
> >Clearly part of the response could involve pg_dump on the damaged
> >structure, at some point.
>
> This is the main thing I wanted to try out more, once I have a
> decent corruption generation
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
> > 4. We need some general performance testing to show whether this is
> > insane or not.
I ran a few tests.
Test 1 - find worst-case overhead for the checksum calculation on write:
fsync = off
bgwriter_lru_maxpages = 0
shared_buffer
On Tue, 2012-12-18 at 04:06 -0500, Greg Smith wrote:
> Having some way to nail down if the same block is bad on a
> given standby seems like a useful interface we should offer, and it
> shouldn't take too much work. Ideally you won't find the same
> corruption there. I'd like a way to check th
On Tue, 2012-12-18 at 08:17 +, Simon Riggs wrote:
> I think we should discuss whether we accept my premise? Checksums will
> actually detect more errors than we see now, and people will want to
> do something about that. Returning to backup is one way of handling
> it, but on a busy production
>> There is no good way to make the poor soul who has no standby server
>> happy here. You're just choosing between bad alternatives. The first
>> block error is often just that--the first one, to be joined by others
>> soon afterward. My experience at how drives fail says the second error
>> is a
Greg Smith wrote:
> In general, what I hope people will be able to do is switch over to
> their standby server, and then investigate further. I think it's
> unlikely that people willing to pay for block checksums will only have
> one server. Having some way to nail down if the same block is bad
On 12/18/12 3:17 AM, Simon Riggs wrote:
Clearly part of the response could involve pg_dump on the damaged
structure, at some point.
This is the main thing I wanted to try out more, once I have a decent
corruption generation tool. If you've corrupted a single record but can
still pg_dump the
On 18 December 2012 02:21, Jeff Davis wrote:
> On Mon, 2012-12-17 at 19:14 +, Simon Riggs wrote:
>> We'll need a way of expressing some form of corruption tolerance.
>> zero_damaged_pages is just insane,
>
> The main problem I see with zero_damaged_pages is that it could
> potentially write ou
On Mon, 2012-12-17 at 19:14 +, Simon Riggs wrote:
> We'll need a way of expressing some form of corruption tolerance.
> zero_damaged_pages is just insane,
The main problem I see with zero_damaged_pages is that it could
potentially write out the zero page, thereby really losing your data if
it
On 17 December 2012 19:29, Tom Lane wrote:
> Simon Riggs writes:
>> Discussing this makes me realise that we need a more useful response
>> than just "your data is corrupt", so user can respond "yes, I know,
> I'm trying to save what's left".
>
>> We'll need a way of expressing some form of corru
Simon Riggs writes:
> Discussing this makes me realise that we need a more useful response
> than just "your data is corrupt", so user can respond "yes, I know,
> I'm trying to save what's left".
> We'll need a way of expressing some form of corruption tolerance.
> zero_damaged_pages is just insan
On 14 December 2012 20:15, Greg Smith wrote:
> On 12/14/12 3:00 PM, Jeff Davis wrote:
>>
>> After some thought, I don't see much value in introducing multiple
>> instances of corruption at a time. I would think that the smallest unit
>> of corruption would be the hardest to detect, so by introduci
Jeff Davis writes:
>> -A relation name
>> -Corruption type (an entry from this list)
>> -How many blocks to touch
>>
>> I'll just loop based on the count, randomly selecting a block each time
>> and messing with it in that way.
For the messing with it part, did you consider zzuf?
http://caca
On 12/14/12 3:00 PM, Jeff Davis wrote:
After some thought, I don't see much value in introducing multiple
instances of corruption at a time. I would think that the smallest unit
of corruption would be the hardest to detect, so introducing many of
them in one pass makes it easier to detect.
T
On Wed, 2012-12-12 at 17:52 -0500, Greg Smith wrote:
> I can take this on, as part of the QA around checksums working as
> expected. The result would be a Python program; I don't have quite
> enough time to write this in C or re-learn Perl to do it right now. But
> this won't be a lot of code.
On 12/5/12 6:49 PM, Simon Riggs wrote:
* Zeroing pages, making pages all 1s
* Transposing pages
* Moving chunks of data sideways in a block
* Flipping bits randomly
* Flipping data endianness
* Destroying particular catalog tables or structures
I can take this on, as part of the QA around check
Robert Haas wrote:
> Jeff Davis wrote:
>> Or, I could write up a test framework in ruby or python, using
>> the appropriate pg driver, and some not-so-portable shell
>> commands to start and stop the server. Then, I can publish that
>> on this list, and that would at least make it easier to test
>
On 5 December 2012 23:40, Robert Haas wrote:
> On Tue, Dec 4, 2012 at 6:17 PM, Jeff Davis wrote:
>> Or, I could write up a test framework in ruby or python, using the
>> appropriate pg driver, and some not-so-portable shell commands to start
>> and stop the server. Then, I can publish that on thi
On Tue, Dec 4, 2012 at 6:17 PM, Jeff Davis wrote:
> Or, I could write up a test framework in ruby or python, using the
> appropriate pg driver, and some not-so-portable shell commands to start
> and stop the server. Then, I can publish that on this list, and that
> would at least make it easier to
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
> > 3. I think we need an explicit test of this feature (as you describe
> > above), rather than manual testing. corruptiontester?
>
> I agree, but I'm not 100% sure how to proceed. I'll look at Kevin's
> tests for SSI and see if I can do someth