Re: [HACKERS] Block-level CRC checks

2009-12-07 Thread Greg Stark
On Fri, Dec 4, 2009 at 10:47 PM, Chuck McDevitt cmcdev...@greenplum.com wrote: A curiosity question regarding torn pages:  How does this work on file systems that don't write in-place, but instead always do copy-on-write? My example would be Sun's ZFS file system (In Solaris BSD).  Because

Re: [HACKERS] Block-level CRC checks

2009-12-05 Thread Greg Stark
It can save space because the line pointers have less alignment requirements. But I don't see any point in the current state. -- Greg On 2009-12-04, at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: I'm not sure why I said including ctid. We would have to

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread decibel
On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote: On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.com wrote: On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote: Does $COMPETITOR offer this feature? My understanding is that MSSQL does. I am not sure about Oracle. Those

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 03:32 -0600, decibel wrote: So... now that the upgrade discussion seems to have died down... was any consensus reached on how to do said checksumming? Possibly. Please can you go through the discussion and pull out a balanced summary of how to proceed? I lost track a

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Massa, Harald Armin
Kevin, md5sum of each tuple?  As an optional system column (a la oid)? I am mainly an application programmer working with PostgreSQL. And I want to point out an additional usefulness of an md5sum of each tuple: it makes comparing table-contents in replicated / related databases MUCH more

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Greg Stark
On Fri, Dec 4, 2009 at 9:34 AM, Simon Riggs si...@2ndquadrant.com wrote: Possibly. Please can you go through the discussion and pull out a balanced summary of how to proceed? I lost track a while back and I'm sure many others did also. I summarized the three feasible plans I think I saw;

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Bruce Momjian
decibel wrote: On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote: On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.com wrote: On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote: Does $COMPETITOR offer this feature? My understanding is that MSSQL does. I am

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 07:12 -0500, Bruce Momjian wrote: I think the hint bit has to be added to the item pointer, by using the offset bits that are already zero, according to Greg Stark. That solution leads to easy programming, no expanding hint bit array, and it is backward compatible so

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Alvaro Herrera
BTW with VACUUM FULL removed I assume we're going to get rid of HEAP_MOVED_IN and HEAP_MOVED_OFF too, right? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Bruce Momjian
Simon Riggs wrote: On Fri, 2009-12-04 at 07:12 -0500, Bruce Momjian wrote: I think the hint bit has to be added to the item pointer, by using the offset bits that are already zero, according to Greg Stark. That solution leads to easy programming, no expanding hint bit array, and it is

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 07:54 -0500, Bruce Momjian wrote: I should also point out that removing 4 bits from the tuple header would allow us to get rid of t_infomask2, reducing tuple length by a further 2 bytes. Wow, that is a nice win. Does alignment allow us to actually use that space?

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW with VACUUM FULL removed I assume we're going to get rid of HEAP_MOVED_IN and HEAP_MOVED_OFF too, right? Much as I would like to see those go, no. VF code should remain for some time yet, IMHO. We could remove it, but doing so is not

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Heikki Linnakangas
Simon Riggs wrote: On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW with VACUUM FULL removed I assume we're going to get rid of HEAP_MOVED_IN and HEAP_MOVED_OFF too, right? Much as I would like to see those go, no. VF code should remain for some time yet, IMHO. I don't think

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Greg Stark
On Fri, Dec 4, 2009 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2009-12-04 at 07:54 -0500, Bruce Momjian wrote: I should also point out that removing 4 bits from the tuple header would allow us to get rid of t_infomask2, reducing tuple length by a further 2 bytes. Wow,

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Greg Stark
On Fri, Dec 4, 2009 at 1:35 PM, Greg Stark gsst...@mit.edu wrote: If we lose vacuum full then the table's open for reducing the width of command id too if we need more bits.  If we do that and we moved everything we could to the line pointers including ctid we might just be able to squeeze the

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Alvaro Herrera
Greg Stark escribió: On Fri, Dec 4, 2009 at 1:35 PM, Greg Stark gsst...@mit.edu wrote: If we lose vacuum full then the table's open for reducing the width of command id too if we need more bits.  If we do that and we moved everything we could to the line pointers including ctid we might

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Alvaro Herrera
Heikki Linnakangas escribió: Simon Riggs wrote: On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW with VACUUM FULL removed I assume we're going to get rid of HEAP_MOVED_IN and HEAP_MOVED_OFF too, right? Much as I would like to see those go, no. VF code should remain for

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Robert Haas
On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Heikki Linnakangas escribió: Simon Riggs wrote: On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW with VACUUM FULL removed I assume we're going to get rid of HEAP_MOVED_IN and HEAP_MOVED_OFF too,

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: As I pointed out here http://archives.postgresql.org/pgsql-hackers/2009-12/msg00056.php we only need to use 3 bits not 4, but it does limit tuple length to 4096 for all block sizes. (Two different options there for doing that). Limiting the tuple

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: I'm not sure why I said including ctid. We would have to move everything transactional to the line pointer, including xmin, xmax, ctid, all the hint bits, the updated flags, hot flags, etc. The only things left in the tuple header would be things that have to

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 10:43 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: As I pointed out here http://archives.postgresql.org/pgsql-hackers/2009-12/msg00056.php we only need to use 3 bits not 4, but it does limit tuple length to 4096 for all block sizes. (Two different

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Have we thought about what other things have changed between 8.4 and 8.5 that might cause problems for in-place upgrade? So far, nothing. We even made Andrew Gierth jump through hoops to keep hstore's on-disk representation upwards compatible.

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 13:35 +, Greg Stark wrote: I don't think getting rid of infomask2 wins us 2 bytes so fast. The rest of those two bytes is natts which of course we still need. err, yes, OK. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Bruce Momjian
Robert Haas wrote: On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Heikki Linnakangas escribió: Simon Riggs wrote: On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW with VACUUM FULL removed I assume we're going to get rid of

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Robert Haas
On Fri, Dec 4, 2009 at 2:04 PM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Fri, Dec 4, 2009 at 9:48 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Heikki Linnakangas escribió: Simon Riggs wrote: On Fri, 2009-12-04 at 09:52 -0300, Alvaro Herrera wrote: BTW

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Alvaro Herrera
Massa, Harald Armin wrote: I am in the process of adding a user-space myhash column to all my applications tables, filled by a trigger on insert / update. It really speeds up table comparison across databases; and it is very helpful in debugging replications. Have you seen pg_comparator?

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Bruce Momjian
Robert Haas wrote: Well, I am not sure how we would turn the _space_ used for CRC on and off because you would have to rewrite the entire table/database to turn it on, which seems unfortunate. Well, presumably you're going to have to do some of that work anyway, because even if the space

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Chuck McDevitt
A curiosity question regarding torn pages: How does this work on file systems that don't write in-place, but instead always do copy-on-write? My example would be Sun's ZFS file system (In Solaris BSD). Because of its snapshot rollback functionality, it never writes a page in-place, but

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 14:47 -0800, Chuck McDevitt wrote: A curiosity question regarding torn pages: How does this work on file systems that don't write in-place, but instead always do copy-on-write? My example would be Sun's ZFS file system (In Solaris BSD). Because of its snapshot

Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread Massa, Harald Armin
I am in the process of adding a user-space myhash column to all my applications tables, filled by a trigger on insert / update. It really speeds up table comparison across databases; and it is very helpful in debugging replications. Have you seen pg_comparator? yes, saw the lightning talk

Re: [HACKERS] Block-level CRC checks

2009-12-03 Thread Jonah H. Harris
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.com wrote: On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote: Does $COMPETITOR offer this feature? My understanding is that MSSQL does. I am not sure about Oracle. Those are the only two I run into (I don't run into

Re: [HACKERS] Block-level CRC checks

2009-12-02 Thread Peter Eisentraut
On tis, 2009-12-01 at 19:41 +, Greg Stark wrote: Also, it would require reading back each page as it's written to disk, which is OK for a bunch of single-row writes, but for bulk data loads a significant problem. Not sure what that really means for Postgres. It would just mean

Re: [HACKERS] Block-level CRC checks

2009-12-02 Thread Peter Eisentraut
On tis, 2009-12-01 at 17:47 -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I also like the idea that we don't need to CRC check the line pointers because any corruption there is going to appear immediately. However, the bad news is that we wouldn't find the corruption until

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Heikki Linnakangas
Simon Riggs wrote: There is no creation of corruption events. This scheme detects corruption events that *have* occurred. Now I understand that we previously would have recovered seamlessly from such events, but they were corruption events nonetheless and I think they need to be reported.

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 10:04 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: There is no creation of corruption events. This scheme detects corruption events that *have* occurred. Now I understand that we previously would have recovered seamlessly from such events, but they were

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Simon Riggs wrote: The way we handle torn page corruptions *hides* actual corruptions from us. The frequency of true positives and false positives is important here. If the false positive ratio is very small, then reporting them is not a problem because of the benefit we get from having

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 06:35 -0500, Bruce Momjian wrote: Simon Riggs wrote: The way we handle torn page corruptions *hides* actual corruptions from us. The frequency of true positives and false positives is important here. If the false positive ratio is very small, then reporting them is

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Simon Riggs wrote: I think the problem is that the existing proposal can't distinguish between these two cases so the user has no idea how to respond to the report. If 99.5% of cases are real corruption then there is little need to distinguish between the cases, nor much value in doing

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Mon, Nov 30, 2009 at 3:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Proposal * We reserve enough space on a disk block for a CRC check. When a dirty block is written to disk we calculate and annotate the CRC value, though this is *not* WAL
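The core of the quoted proposal (reserve space in the block, compute the CRC when a dirty block is written, check it when the block is read back) can be sketched roughly as follows. This is only an illustration, not code from the thread: the position of the CRC field, the helper names `write_block` and `verify_block`, and the use of `zlib.crc32` are all assumptions made for the example, and the 8192-byte block is simply PostgreSQL's default block size.

```python
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default block size
CRC_SIZE = 4       # hypothetical: first 4 bytes of the block hold the CRC


def write_block(payload: bytes) -> bytes:
    """Build the on-disk image: reserved CRC field followed by the payload."""
    if len(payload) != BLOCK_SIZE - CRC_SIZE:
        raise ValueError("payload must fill the block minus the CRC field")
    crc = zlib.crc32(payload)
    return crc.to_bytes(CRC_SIZE, "little") + payload


def verify_block(block: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored one."""
    stored = int.from_bytes(block[:CRC_SIZE], "little")
    return zlib.crc32(block[CRC_SIZE:]) == stored


payload = bytes(BLOCK_SIZE - CRC_SIZE)
block = write_block(payload)
print(verify_block(block))  # True

# Flip a single bit in the stored image, as a hardware fault might.
damaged = bytearray(block)
damaged[100] ^= 0x01
print(verify_block(bytes(damaged)))  # False
```

The same check also fails when only part of a block with changed hint bits reaches disk, which is the torn-page false positive debated in the surrounding messages.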

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Robert Haas wrote: On Mon, Nov 30, 2009 at 3:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Proposal * We reserve enough space on a disk block for a CRC check. When a dirty block is written to disk we calculate and annotate the CRC value, though

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote: I assume torn pages are 99% of the reported problem, which are expected and are fixed, and bad hardware 1%, quite the opposite of your numbers above. On what basis do you make that assumption? -- Simon Riggs

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Simon Riggs wrote: On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote: I assume torn pages are 99% of the reported problem, which are expected and are fixed, and bad hardware 1%, quite the opposite of your numbers above. On what basis do you make that assumption? Because we added

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
bruce wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously; I think in those cases we shouldn't have torn pages. Sorry, stupid idea on my part. We don't WAL log hint bit changes so there is no guarantee the page is in WAL on recovery.

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 07:58 -0500, Bruce Momjian wrote: bruce wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously; I think in those cases we shouldn't have torn pages. Sorry, stupid idea on my part. We don't WAL log hint bit

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 07:42 -0500, Bruce Momjian wrote: Simon Riggs wrote: On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote: I assume torn pages are 99% of the reported problem, which are expected and are fixed, and bad hardware 1%, quite the opposite of your numbers above.

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Heikki Linnakangas
Bruce Momjian wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously; I think in those cases we shouldn't have torn pages. Unfortunately that's not true. You can crash, leading to a torn page, and then start up the database and shut it

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread marcin mank
On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Proposal * We reserve enough space on a disk block for a CRC check. When a dirty block is written to disk we calculate and annotate the CRC value, though this is *not* WAL

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Andres Freund
On Tuesday 01 December 2009 14:38:26 marcin mank wrote: On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Proposal * We reserve enough space on a disk block for a CRC check. When a dirty block is written to disk we

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Aidan Van Dyk
* Andres Freund and...@anarazel.de [091201 08:42]: On Tuesday 01 December 2009 14:38:26 marcin mank wrote: On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Proposal * We reserve enough space on a disk block for a

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 8:30 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Bruce Momjian wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously;  I think in those cases we shouldn't have torn pages. Unfortunately

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Andres Freund
On Tuesday 01 December 2009 15:26:21 Aidan Van Dyk wrote: * Andres Freund and...@anarazel.de [091201 08:42]: On Tuesday 01 December 2009 14:38:26 marcin mank wrote: On Mon, Nov 30, 2009 at 9:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote:

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Heikki Linnakangas
Robert Haas wrote: On Tue, Dec 1, 2009 at 8:30 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Bruce Momjian wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously; I think in those cases we shouldn't have torn pages.

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 15:30 +0200, Heikki Linnakangas wrote: Bruce Momjian wrote: What might be interesting is to report CRC mismatches if the database was shut down cleanly previously; I think in those cases we shouldn't have torn pages. Unfortunately that's not true. You can crash,

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 9:40 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Even if rescanning every page in the cluster was feasible from a performance point-of-view, it would make the CRC checking a lot less useful. It's not hard to imagine that when a hardware glitch

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Florian Weimer
* Simon Riggs: * Put all hint bits in the block header to allow them to be excluded more easily from CRC checking. If we used 3 more bits from ItemIdData.lp_len (limiting tuple length to 4096) then we could store some hints in the item pointer. HEAP_XMIN_INVALID can be stored as LP_DEAD,
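The arithmetic behind the "3 more bits from ItemIdData.lp_len" idea can be checked with a small sketch. In the existing line pointer, `lp_off` is 15 bits, `lp_flags` is 2 bits and `lp_len` is 15 bits; borrowing 3 bits from `lp_len` leaves 12, which can hold 4096 distinct values. The bit ordering and the helper names `pack_item_id` and `unpack_item_id` below are hypothetical and chosen only for illustration. They are not PostgreSQL's actual layout code.

```python
LP_OFF_BITS = 15    # offset of the tuple within the page
LP_FLAGS_BITS = 2   # line pointer state
LP_LEN_BITS = 15    # tuple length in the existing layout
HINT_BITS = 3       # bits the proposal would borrow from lp_len
NEW_LEN_BITS = LP_LEN_BITS - HINT_BITS  # 12 bits remain for the length

WIDTHS = (LP_OFF_BITS, LP_FLAGS_BITS, HINT_BITS, NEW_LEN_BITS)


def pack_item_id(lp_off: int, lp_flags: int, hints: int, lp_len: int) -> int:
    """Pack the four fields into one 32-bit word (hypothetical bit order)."""
    word = 0
    shift = 0
    for value, width in zip((lp_off, lp_flags, hints, lp_len), WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_item_id(word: int) -> tuple:
    """Split a packed word back into (lp_off, lp_flags, hints, lp_len)."""
    fields = []
    for width in WIDTHS:
        fields.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(fields)


print(sum(WIDTHS))        # 32
print(1 << NEW_LEN_BITS)  # 4096
```

The four widths still add up to 32 bits, so the line pointer would not grow; the cost is the reduced range of the length field.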

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Florian Weimer fwei...@bfk.de writes: What about putting the whole visibility information out-of-line, into its own B-tree, indexed by page number? Hint bits need to be *cheap* to examine. Otherwise there's little point in having them at all. regards, tom lane --

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote: It's not hard to imagine that when a hardware glitch happens causing corruption, it also causes the system to crash. Recalculating the CRCs after crash would mask the corruption. They are already masked from us, so continuing to

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote: It's not hard to imagine that when a hardware glitch happens causing corruption, it also causes the system to crash. Recalculating the CRCs after crash would mask the corruption. They are

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 10:35 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote: It's not hard to imagine that when a hardware glitch happens causing corruption, it also causes the system to crash. Recalculating the CRCs after crash would

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Joshua D. Drake
On Tue, 2009-12-01 at 07:05 -0500, Bruce Momjian wrote: All useful detection mechanisms have non-zero false positives because we would rather sometimes ring the bell for no reason than to let bad things through silently, as we do now. OK, but what happens if someone gets the failure

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote: It's not hard to imagine that when a hardware glitch happens causing corruption, it also causes the system to crash. Recalculating the

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Joshua D. Drake
On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote: It's not hard to imagine that when a hardware glitch happens causing corruption, it also causes the system to crash. Recalculating the

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Simon Riggs wrote: Also, we might * Put all hint bits in the block header to allow them to be excluded more easily from CRC checking. If we used 3 more bits from ItemIdData.lp_len (limiting tuple length to 4096) then we could store some hints in the item pointer. HEAP_XMIN_INVALID can be

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 1:02 PM, Joshua D. Drake j...@commandprompt.com wrote: The hard core reality is this. *IF* it is one of the goals of this project to insure that the software can be safely, effectively, and responsibly operated in a manner that is acceptable to C* level people in a

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Simon Riggs
On Tue, 2009-12-01 at 13:05 -0500, Bruce Momjian wrote: Simon Riggs wrote: Also, we might * Put all hint bits in the block header to allow them to be excluded more easily from CRC checking. If we used 3 more bits from ItemIdData.lp_len (limiting tuple length to 4096) then we could

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Joshua D. Drake
On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote: On Tue, Dec 1, 2009 at 1:02 PM, Joshua D. Drake j...@commandprompt.com wrote: The hard core reality is this. *IF* it is one of the goals of this project to insure that the software can be safely, effectively, and responsibly operated

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: OK, here is another idea, maybe crazy: When we read in a page that has an invalid CRC, we check the page to see which hint bits are _not_ set, and we try setting them to see if we can get a matching CRC. If there are no missing hint bits and the CRC doesn't

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-12-01 at 13:05 -0500, Bruce Momjian wrote: When we read in a page that has an invalid CRC, we check the page to see which hint bits are _not_ set, and we try setting them to see if we can get a matching CRC. Perhaps we could store a

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Aidan Van Dyk
* Tom Lane t...@sss.pgh.pa.us [091201 13:58]: Actually, the killer problem with *any* scheme involving guessing is that each bit you guess translates directly to removing one bit of confidence from the CRC value. If you try to guess at as many as 32 bits, it is practically guaranteed that
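The "one bit of confidence per guessed bit" argument can be made concrete with a simplified model. Assume, purely for illustration, that guessing k bits means trying 2^k candidate page images and that each wrong candidate matches a 32-bit CRC independently with probability 2^-32. The function name `false_match_probability` is hypothetical, and the independence assumption is a simplification rather than a property of any particular CRC.

```python
import math

CRC_BITS = 32


def false_match_probability(guessed_bits: int) -> float:
    """Chance that at least one of 2**guessed_bits wrong candidates matches.

    Computes 1 - (1 - 2**-32) ** (2**guessed_bits) in a numerically stable
    way, under the assumption that candidates match independently.
    """
    candidates = 2 ** guessed_bits
    return -math.expm1(candidates * math.log1p(-(2.0 ** -CRC_BITS)))


for k in (0, 8, 16, 24, 32):
    print(k, false_match_probability(k))
```

For small k the result is close to 2^(k-32), so each extra guessed bit roughly doubles the chance of accepting a corrupted page as valid.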

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: OK, here is another idea, maybe crazy: When we read in a page that has an invalid CRC, we check the page to see which hint bits are _not_ set, and we try setting them to see if we can get a

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: Another thought is that we could use the MSSQL-style torn page detection of including a counter (or even a bit?) in every 512-byte chunk which gets incremented every time the page is written. I think we can dismiss that idea, or any idea involving a
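For readers unfamiliar with the per-chunk counter scheme mentioned here, the following is a minimal sketch of the general idea only. It assumes a one-byte stamp in the last byte of every 512-byte chunk; the helper names `stamp` and `is_torn` are hypothetical, and this is not a description of how MSSQL actually lays out its pages.

```python
SECTOR = 512
BLOCK_SIZE = 8192


def stamp(block: bytearray, counter: int) -> None:
    """Write the same write-counter into the last byte of every 512-byte chunk."""
    for start in range(0, len(block), SECTOR):
        block[start + SECTOR - 1] = counter & 0xFF


def is_torn(block: bytes) -> bool:
    """A block is torn if its chunks do not all carry the same counter."""
    stamps = {block[start + SECTOR - 1] for start in range(0, len(block), SECTOR)}
    return len(stamps) > 1


old = bytearray(BLOCK_SIZE)
stamp(old, 7)   # image from the previous write
new = bytearray(BLOCK_SIZE)
stamp(new, 8)   # image from the current write

# Simulate a crash after only the first four chunks of the new image hit disk.
torn = bytes(new[:4 * SECTOR]) + bytes(old[4 * SECTOR:])

print(is_torn(bytes(new)))  # False
print(is_torn(torn))        # True
```

In a real page the bytes displaced by the stamps would have to be kept somewhere else, and the check says nothing about corruption inside a chunk, so it addresses torn writes only.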

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Josh Berkus
All, I feel strongly that we should be verifying pages on write, or at least providing the option to do so, because hardware is simply not reliable. And a lot of our biggest users are having issues; it seems pretty much guaranteed that if you have more than 20 postgres servers, at least one of

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote: And a lot of our biggest users are having issues; it seems pretty much guaranteed that if you have more than 20 postgres servers, at least one of them will have bad memory, bad RAID and/or a bad driver. Huh?!? We have about 200 clusters running on

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 7:19 PM, Josh Berkus j...@agliodbs.com wrote: However, that solution would not detect subtle corruption, like single-bit-flipping issues caused by quantum errors. Well there is a solution for this, ECC RAM. There's *no* software solution for it. The corruption can just as

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 2:06 PM, Aidan Van Dyk ai...@highrise.ca wrote: Well, *I* think if we're ever going to have really reliable in-place upgrades that we can expect to function release after release, we're going to need to be able to read in old version pages, and convert them to current

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 7:51 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 1, 2009 at 2:06 PM, Aidan Van Dyk ai...@highrise.ca wrote: Well, *I* think if we're ever going to have really reliable in-place upgrades that we can expect to function release after release, we're going to

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 8:04 PM, Greg Stark gsst...@mit.edu wrote: And there was no point writing it for previously releases because there was **no** pg_migrator anyways. oops -- greg

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Robert Haas
On Tue, Dec 1, 2009 at 3:04 PM, Greg Stark gsst...@mit.edu wrote: I find that hard to understand. I believe the consensus is that an on-demand page-level migration strategy like Aidan described is precisely the plan when it's necessary to handle page format changes. There were no page format

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: OK, fair enough. My implication that only page formats were at issue was off-base. My underlying point was that I think we have to be prepared to write code that can understand old binary formats (on the tuple, page, or relation level) if we want

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Andrew Dunstan
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: OK, fair enough. My implication that only page formats were at issue was off-base. My underlying point was that I think we have to be prepared to write code that can understand old binary formats (on the tuple, page, or relation

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Andrew Dunstan wrote: Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: OK, fair enough. My implication that only page formats were at issue was off-base. My underlying point was that I think we have to be prepared to write code that can understand old binary formats

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: OK, here is another idea, maybe crazy: When we read in a page that has an invalid CRC, we check the page to see which hint bits are _not_ set, and we try setting them to see if we can get a matching CRC. If there are no missing hint bits

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: OK, crazy idea #3. What if we had a per-page counter of the number of hint bits set --- that way, we would only consider a CRC check failure to be corruption if the count matched the hint bit count on the page. Seems like rather a large hole in the

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Richard Huxton
Bruce Momjian wrote: Tom Lane wrote: The suggestions that were made upthread about moving the hint bits could resolve the second objection, but once you do that you might as well just exclude them from the CRC and eliminate the guessing. OK, crazy idea #3. What if we had a per-page

Page-level version upgrade (was: [HACKERS] Block-level CRC checks)

2009-12-01 Thread decibel
On Dec 1, 2009, at 12:58 PM, Tom Lane wrote: The bottom line here seems to be that the only practical way to do anything like this is to move the hint bits into their own area of the page, and then exclude them from the CRC. Are we prepared to once again blow off any hope of in-place update for

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: OK, crazy idea #3. What if we had a per-page counter of the number of hint bits set --- that way, we would only consider a CRC check failure to be corruption if the count matched the hint bit count on the page. Seems like rather a

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote: Why are we writing out the hint bits to disk anyway? Is it really so slow to calculate them on read + cache them that it's worth all this trouble? Are they not also to blame for the write my import data twice feature? It

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread decibel
On Dec 1, 2009, at 1:39 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: And a lot of our biggest users are having issues; it seems pretty much guaranteed that if you have more than 20 postgres servers, at least one of them will have bad memory, bad RAID and/or a bad driver.

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Greg Stark wrote: On Tue, Dec 1, 2009 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: OK, here is another idea, maybe crazy: When we read in a page that has an invalid CRC, we check the page to see which hint bits are _not_ set, and we try setting

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Richard Huxton
Greg Stark wrote: On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote: Why are we writing out the hint bits to disk anyway? Is it really so slow to calculate them on read + cache them that it's worth all this trouble? Are they not also to blame for the write my import data

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Greg Stark wrote: It should be relatively cheap to skip the hint bits in the line pointers since they'll be the same bits of every 16-bit value for a whole range. Alternatively we could just CRC the tuples and assume a corrupted line pointer will show

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Greg Stark wrote: It should be relatively cheap to skip the hint bits in the line pointers since they'll be the same bits of every 16-bit value for a whole range. Alternatively we could just CRC the tuples and assume a corrupted

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Richard Huxton d...@archonet.com writes: So what is the cost of calculating the hint-bits for a whole block of tuples in one go vs reading that block from actual spinning disk? Potentially a couple of hundred times worse, if you're unlucky and each XID on the page requires visiting a different

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Tom Lane wrote: I don't think relatively cheap is the right criterion here --- the question to me is how many assumptions are you making in order to compute the page's CRC. Each assumption degrades the reliability of the check, not to mention creating

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Greg Stark wrote: It should be relatively cheap to skip the hint bits in the line pointers since they'll be the same bits of every 16-bit value for a whole range. Alternatively we could

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread decibel
On Dec 1, 2009, at 4:13 PM, Greg Stark wrote: On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote: Why are we writing out the hint bits to disk anyway? Is it really so slow to calculate them on read + cache them that it's worth all this trouble? Are they not also to blame

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think relatively cheap is the right criterion here --- the question to me is how many assumptions are you making in order to compute the page's CRC.  Each assumption degrades the

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Greg Stark
On Wed, Dec 2, 2009 at 12:03 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: On Tue, Dec 1, 2009 at 10:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think relatively cheap is the right criterion here --- the question to me is how many assumptions are you making in

Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread Aidan Van Dyk
* Greg Stark gsst...@mit.edu [091201 20:14]: I'm not sure we're on the same page. As I understand it there are three proposals on the table now: 1) set aside a section of the page to contain only non-checksummed hint bits. That section has to be relocatable so the crc check would have to

Re: [HACKERS] Block-level CRC checks

2009-11-30 Thread Simon Riggs
On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote: So this discussion died with no solution arising to the hint-bit-setting-invalidates-the-CRC problem. Apparently the only solution in sight is to WAL-log hint bits. Simon opines it would be horrible from a performance standpoint to
