Re: [HACKERS] Enabling Checksums

2012-12-04 Thread Jeff Davis
On Mon, 2012-12-03 at 09:56 +, Simon Riggs wrote: 1. Break out the changes around inCommit flag, since that is just uncontroversial refactoring. I can do that. That reduces the noise level in the patch and makes it easier to understand the meaningful changes. Done by you. 2. Produce an

Re: [HACKERS] Enabling Checksums

2012-12-04 Thread Jeff Davis
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote: 3. I think we need an explicit test of this feature (as you describe above), rather than manual testing. corruptiontester? I agree, but I'm not 100% sure how to proceed. I'll look at Kevin's tests for SSI and see if I can do something

Re: [HACKERS] Enabling Checksums

2012-12-03 Thread Simon Riggs
On 26 November 2012 02:32, Jeff Davis pg...@j-davis.com wrote: Updated both patches. Changes: * Moved the changes to pageinspect into the TLI patch, because it makes more sense to be a part of that patch and it also reduces the size of the main checksums patch. * Fix off-by-one bug in

Re: [HACKERS] Enabling Checksums

2012-12-03 Thread Simon Riggs
On 3 December 2012 09:56, Simon Riggs si...@2ndquadrant.com wrote: I think the way forwards for this is... 1. Break out the changes around inCommit flag, since that is just uncontroversial refactoring. I can do that. That reduces the noise level in the patch and makes it easier to understand

Re: [HACKERS] Enabling Checksums

2012-12-03 Thread Simon Riggs
On 26 November 2012 02:32, Jeff Davis pg...@j-davis.com wrote: * Make the checksum algorithm process 4 bytes at a time and sum into a signed 64-bit int, which is faster than byte-at-a-time. Also, forbid zero in either byte of the checksum, because that seems like a good idea. Like that,

Re: [HACKERS] Enabling Checksums

2012-11-25 Thread Jeff Davis
Updated both patches. Changes: * Moved the changes to pageinspect into the TLI patch, because it makes more sense to be a part of that patch and it also reduces the size of the main checksums patch. * Fix off-by-one bug in checksum calculation * Replace VerificationInfo in the function

Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Robert Haas
On Thu, Nov 15, 2012 at 2:44 PM, Jeff Davis pg...@j-davis.com wrote: The issue about external utilities is a bigger problem than I realized at first. Originally, I thought that it was just a matter of code to associate the checksum with the data. However, an external utility will never see a

Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote: I agree that the hazards are not equivalent, but I'm not sure I agree that an external utility will never see a torn page while the system is on-line. We have a bunch of code that essentially forces full_page_writes=on during a base backup

Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Andres Freund
On 2012-11-19 09:22:45 -0800, Jeff Davis wrote: On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote: I agree that the hazards are not equivalent, but I'm not sure I agree that an external utility will never see a torn page while the system is on-line. We have a bunch of code that

Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote: Yes, definitely. OK. I suppose that makes sense for large writes. If that is not true, then I'm concerned about replicating corruption, or backing up corrupt blocks over good ones. How do we prevent that? It seems like a pretty major

Re: [HACKERS] Enabling Checksums

2012-11-19 Thread Jeff Davis
On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote: Yes, the blocks written *after* the checkpoint might have a bad checksum that will be fixed during recovery. But the blocks written *before* the checkpoint should have a valid checksum, but if they don't, then recovery doesn't know about

Re: [HACKERS] Enabling Checksums

2012-11-18 Thread Jeff Davis
On Wed, 2012-11-14 at 17:40 -0800, Jeff Davis wrote: I'll do another pass to make sure I update all of the comments, and try to self review it. Updated patches attached (the TLI patch wasn't changed though, only the main checksums patch). Changes: * A lot of cleanup * More testing *

Re: [HACKERS] Enabling Checksums

2012-11-15 Thread Jeff Davis
On Wed, 2012-11-14 at 21:22 -0500, Robert Haas wrote: But there are some practical issues, as Tom points out. Another one is that it's harder for external utilities (like pg_basebackup) to verify checksums. Well, I think the invariant we'd need to maintain is as follows: every page for

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Robert Haas
On Tue, Nov 13, 2012 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: What happens when you get an I/O failure on the checksum fork? Assuming you're using 8K pages there, that would mean you can no longer verify the integrity of between one and four thousand pages of data. True... but you'll

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Bruce Momjian
On Mon, Nov 12, 2012 at 04:42:57PM -0800, Josh Berkus wrote: Jeff, OK, so here's my proposal for a first patch (changes from Simon's patch): * Add a flag to the postgres executable indicating that it should use checksums on everything. This would only be valid if bootstrap mode is

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Peter Eisentraut
On 11/11/12 6:59 PM, Andrew Dunstan wrote: I haven't followed this too closely, but I did wonder several days ago why this wasn't being made an initdb-time decision. One problem I see with this is that it would make regression testing much more cumbersome. Basically, to do a proper job, you'd

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Alvaro Herrera
Peter Eisentraut escribió: On 11/11/12 6:59 PM, Andrew Dunstan wrote: I haven't followed this too closely, but I did wonder several days ago why this wasn't being made an initdb-time decision. One problem I see with this is that it would make regression testing much more cumbersome.

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Andrew Dunstan
On 11/14/2012 02:01 PM, Alvaro Herrera wrote: Peter Eisentraut escribió: On 11/11/12 6:59 PM, Andrew Dunstan wrote: I haven't followed this too closely, but I did wonder several days ago why this wasn't being made an initdb-time decision. One problem I see with this is that it would make

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: Regarding checksums, I can add an option for the initdb that the buildfarm script runs. We already run different tests for different encodings. Of course, constant expanding like this won't scale, so we need to pick the options we want to exrecise

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Andrew Dunstan
On 11/14/2012 03:06 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Regarding checksums, I can add an option for the initdb that the buildfarm script runs. We already run different tests for different encodings. Of course, constant expanding like this won't scale, so we need to

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Jeff Davis
Hmm... what if we took this a step further and actually stored the checksums in a separate relation fork? That would make it pretty simple to support enabling/disabling checksums for particular relations. It would also allow us to have a wider checksum, like 32 or 64 bits rather than 16.

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Jeff Davis
On Tue, 2012-11-13 at 15:27 -0500, Robert Haas wrote: A small patch that gets committed is better than a big one that doesn't. Here's a small patch (two, actually, because the TLI one is uninteresting and noisy). It's based on Simon's patch, but with some significant changes: * I ripped out

Re: [HACKERS] Enabling Checksums

2012-11-14 Thread Robert Haas
On Wed, Nov 14, 2012 at 6:24 PM, Jeff Davis pg...@j-davis.com wrote: Hmm... what if we took this a step further and actually stored the checksums in a separate relation fork? That would make it pretty simple to support enabling/disabling checksums for particular relations. It would also

Re: [HACKERS] Enabling Checksums

2012-11-13 Thread Markus Wanner
On 11/13/2012 01:22 AM, Greg Smith wrote: Once you accept that eventually there need to be online conversion tools, there needs to be some easy way to distinguish which pages have been processed for several potential implementations. Agreed. What I'm saying is that this identification doesn't

Re: [HACKERS] Enabling Checksums

2012-11-13 Thread Robert Haas
On Sun, Nov 11, 2012 at 5:52 PM, Jeff Davis pg...@j-davis.com wrote: Per-database does sound easier than per-table. I'd have to think about how that would affect shared catalogs though. For now, I'm leaning toward an offline utility to turn checksums on or off, called pg_checksums. It could

Re: [HACKERS] Enabling Checksums

2012-11-13 Thread Robert Haas
On Mon, Nov 12, 2012 at 4:44 AM, Craig Ringer cr...@2ndquadrant.com wrote: That'll make it hard for VACUUM, hint-bit setting, etc to opportunistically checksum pages whenever they're doing a page write anyway. Is it absurd to suggest using another bitmap, like the FSM or visibility map, to

Re: [HACKERS] Enabling Checksums

2012-11-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Hmm... what if we took this a step further and actually stored the checksums in a separate relation fork? That would make it pretty simple to support enabling/disabling checksums for particular relations. It would also allow us to have a wider

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
Jeff, On 11/10/2012 12:08 AM, Jeff Davis wrote: The bit indicating that a checksum is present may be lost due to corruption. Hm.. I see. Sorry if that has been discussed before, but can't we do without that bit at all? It adds a checksum switch to each page, where we just agreed we don't

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
On 11/12/2012 05:55 AM, Greg Smith wrote: Adding an initdb option to start out with everything checksummed seems an uncontroversial good first thing to have available. +1 So the following discussion really is for a future patch extending on that initial checkpoint support. One of the really

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Craig Ringer
On 11/12/2012 04:44 PM, Markus Wanner wrote: Jeff, On 11/10/2012 12:08 AM, Jeff Davis wrote: The bit indicating that a checksum is present may be lost due to corruption. Hm.. I see. Sorry if that has been discussed before, but can't we do without that bit at all? It adds a checksum switch

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
On 11/12/2012 10:44 AM, Craig Ringer wrote: That'll make it hard for VACUUM, hint-bit setting, etc to opportunistically checksum pages whenever they're doing a page write anyway. It *is* a hard problem, yes. And the single bit doesn't really solve it. So I'm arguing against opportunistically

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Alvaro Herrera
Greg Smith wrote: On 11/11/12 2:56 PM, Jeff Davis wrote: We could have a separate utility, pg_checksums, that can alter the state and/or do an offline verification. And initdb would take an option that would start everything out fully protected with checksums. Adding an initdb option to

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Jeff Davis
On Mon, 2012-11-12 at 09:44 +0100, Markus Wanner wrote: Can we simply write a progress indicator to pg_control or someplace saying that all pages up to X of relation Y are supposed to have valid checksums? pg_control would not be the right place for that structure. It's intended to be

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Jeff Davis
On Sun, 2012-11-11 at 23:55 -0500, Greg Smith wrote: Adding an initdb option to start out with everything checksummed seems an uncontroversial good first thing to have available. OK, so here's my proposal for a first patch (changes from Simon's patch): * Add a flag to the postgres

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
Jeff, On 11/12/2012 06:52 PM, Jeff Davis wrote: OK, so here's my proposal for a first patch (changes from Simon's patch): * Add a flag to the postgres executable indicating that it should use checksums on everything. This would only be valid if bootstrap mode is also specified. * Add

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Greg Smith
On 11/12/12 4:44 AM, Craig Ringer wrote: Is it absurd to suggest using another bitmap, like the FSM or visibility map, to store information on page checksumming while checksumming is enabled but incomplete? I spent some time thinking about that last week. One problem with it is that the

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Jeff Davis
On Mon, 2012-11-12 at 20:44 +0100, Markus Wanner wrote: As described before in this thread, I think we might be able to do without the has checksum-bit, as yet another simplification. But I don't object to adding it, either. I see. For a first patch, I guess that's OK. Might as well make it as

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Greg Smith
On 11/12/12 3:44 AM, Markus Wanner wrote: Sorry if that has been discussed before, but can't we do without that bit at all? It adds a checksum switch to each page, where we just agreed we don't event want a per-database switch. Once you accept that eventually there need to be online conversion

Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Josh Berkus
Jeff, OK, so here's my proposal for a first patch (changes from Simon's patch): * Add a flag to the postgres executable indicating that it should use checksums on everything. This would only be valid if bootstrap mode is also specified. * Add a multi-state checksums flag in

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Jeff Davis
On Sat, 2012-11-10 at 14:46 +0100, Florian Pflug wrote: The bit indicating that a checksum is present may be lost due to corruption. Though that concern mostly goes away if instead of a separate bit we use a special checksum value, say 0xDEAD, to indicate that the page isn't checksummed,

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Jeff Davis
On Fri, 2012-11-09 at 09:57 -0800, Josh Berkus wrote: Huh? Why would a GUC not make sense? How else would you make sure that checksums where on when you started the system? If we stored the information in pg_control, you could check with pg_controldata. We could have a separate utility,

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Pavel Stehule
Hello Does it make sense to store this information in pg_control? That doesn't require adding any new file, and it has the benefit that it's already checksummed. It's available during recovery and can be made available pretty easily in the places where we write data. And the next

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Jeff Davis
On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote: I don't think so GUC are good for this purpouse, but I don't like single purpouse statements too. what do you think about enhancing ALTER DATABASE statement some like ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Andrew Dunstan
On 11/11/2012 05:52 PM, Jeff Davis wrote: On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote: I don't think so GUC are good for this purpouse, but I don't like single purpouse statements too. what do you think about enhancing ALTER DATABASE statement some like ALTER DATABASE name ENABLE

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Greg Smith
On 11/11/12 2:56 PM, Jeff Davis wrote: We could have a separate utility, pg_checksums, that can alter the state and/or do an offline verification. And initdb would take an option that would start everything out fully protected with checksums. Adding an initdb option to start out with

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Jesper Krogh
On 12/11/12 05:55, Greg Smith wrote: The only guarantee I see that we can give for online upgrades is that after a VACUUM CHECKSUM sweep is done, and every page is known to both have a valid checksum on it and have its checksum bits set, *then* any page that doesn't have both set bits and a

Re: [HACKERS] Enabling Checksums

2012-11-11 Thread Greg Smith
On 11/12/12 12:55 AM, Jesper Krogh wrote: I'd just like some rough guard against hardware/OS related data corruption. and that is more likely to hit data-blocks constantly flying in and out of the system. I get that. I think that some of the design ideas floating around since this feature

Re: [HACKERS] Enabling Checksums

2012-11-10 Thread Florian Pflug
On Nov10, 2012, at 00:08 , Jeff Davis pg...@j-davis.com wrote: On Fri, 2012-11-09 at 20:48 +0100, Markus Wanner wrote: Given your description of option 2 I was under the impression that each page already has a bit indicating whether or not the page is protected by a checksum. Why do you need

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
Jeff, On 11/09/2012 02:01 AM, Jeff Davis wrote: For the sake of simplicity (implementation as well as usability), it seems like there is agreement that checksums should be enabled or disabled for the entire instance, not per-table. Agreed. I've quickly thought about making it a per-database

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
On 11/09/2012 06:18 AM, Jesper Krogh wrote: I would definately stuff our system in state = 2 in your description if it was available. Hm.. that's an interesting statement. What's probably worst when switching from OFF to ON is the VACUUM run that needs to touch every page (provided you haven't

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Josh Berkus
Jeff, I don't think a GUC entirely makes sense (in its current form, anyway). We basically care about 3 states: Huh? Why would a GUC not make sense? How else would you make sure that checksums where on when you started the system? 1. Off: checksums are not written, nor are they verified.

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Jeff Davis
On Fri, 2012-11-09 at 15:42 +0100, Markus Wanner wrote: On 11/09/2012 06:18 AM, Jesper Krogh wrote: I would definately stuff our system in state = 2 in your description if it was available. Hm.. that's an interesting statement. What's probably worst when switching from OFF to ON is the

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Jeff Davis
On Thu, 2012-11-08 at 23:33 -0300, Alvaro Herrera wrote: There's no such thing as a system-wide VACUUM. The most you can get is a database-wide VACUUM, which means you'd have to store the state per-database somewhere (presumably the pg_database catalog), and perhaps pg_control could have it

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
On 11/09/2012 07:53 PM, Jeff Davis wrote: One problem is telling which pages are protected and which aren't. We can have a couple bits in the header indicating that a checksum is present, but it's a little disappointing to have only a few bits protecting a 16-bit checksum. Given your

Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Jeff Davis
On Fri, 2012-11-09 at 20:48 +0100, Markus Wanner wrote: Given your description of option 2 I was under the impression that each page already has a bit indicating whether or not the page is protected by a checksum. Why do you need more bits than that? The bit indicating that a checksum is

[HACKERS] Enabling Checksums

2012-11-08 Thread Jeff Davis
As I understand it, the main part of the remaining work to be done for the checksums patch (at least the first commit) is to have a better way to enable/disable them. For the sake of simplicity (implementation as well as usability), it seems like there is agreement that checksums should be

Re: [HACKERS] Enabling Checksums

2012-11-08 Thread Alvaro Herrera
Jeff Davis wrote: And the next question is what commands to add to change state. Ideas: CHECKSUMS ENABLE; -- set state to Enabling CHECKSUMS DISABLE; -- set state to Off And then to get to the On state, you have to run a system-wide VACUUM while in the Enabling state. Or, if the

Re: [HACKERS] Enabling Checksums

2012-11-08 Thread Amit Kapila
On Friday, November 09, 2012 6:32 AM Jeff Davis wrote: As I understand it, the main part of the remaining work to be done for the checksums patch (at least the first commit) is to have a better way to enable/disable them. For the sake of simplicity (implementation as well as usability), it

Re: [HACKERS] Enabling Checksums

2012-11-08 Thread Jesper Krogh
On 09/11/12 02:01, Jeff Davis wrote: As I understand it, the main part of the remaining work to be done for the checksums patch (at least the first commit) is to have a better way to enable/disable them. For the sake of simplicity (implementation as well as usability), it seems like there is

<    1   2   3   4