Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote:
> Jim Nasby [EMAIL PROTECTED] writes:
> > On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
> > > Ok, so when you need CRC's on a replicate (but not on the master) you
> >
> > Which sounds to me like a good reason to allow the option in recovery.conf as well...
>
> Actually, I'm not seeing the use-case for a slave having a different setting from the master at all?
>
> "My backup server is less reliable than the primary."
>
> "My backup server is more reliable than the primary."
>
> Somehow, neither of these statements seem likely to be uttered by a sane DBA ...

If I take a backup of a server and bring it up on a new system, the blocks in the backup will not have been CRC checked before they go to disk. If I take the same server and now stream log records across to it, why *must* that data be CRC checked, when the original data has not been?

I'm proposing choice, with a safe default. That's all.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Simon Riggs wrote:
> > Somehow, neither of these statements seem likely to be uttered by a sane DBA ...
>
> If I take a backup of a server and bring it up on a new system, the blocks in the backup will not have been CRC checked before they go to disk. If I take the same server and now stream log records across to it, why *must* that data be CRC checked, when the original data has not been?
>
> I'm proposing choice, with a safe default. That's all.

Are there performance numbers to justify the option? We don't give people options unless there is real value to it.

--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
> > > > Recovery can occur with/without same setting of wal_checksum, to avoid complications from crashes immediately after turning GUC on.
> > >
> > > Surely not. Otherwise even the "on" setting is not really a defense.
> >
> > Only when the CRC is exactly zero, which happens very very rarely.
>
> "It works most of the time" doesn't exactly satisfy me.
>
> What's the use-case for changing the variable on the fly anyway? Seems a better solution is just to lock down the setting at postmaster start.

I guess that the use case is more for a WAL based replicate that has/wants a different setting. Maybe we want a WAL entry for the change, or force a log switch (so you can interrupt the replicate, change its setting and proceed with the next log)?

Maybe a 3rd mode for replicates that ignores 0 CRC's?

Andreas
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Fri, 2007-01-05 at 11:01 +0100, Zeugswetter Andreas ADI SD wrote:
> > What's the use-case for changing the variable on the fly anyway? Seems a better solution is just to lock down the setting at postmaster start.
>
> I guess that the use case is more for a WAL based replicate that has/wants a different setting. Maybe we want a WAL entry for the change, or force a log switch (so you can interrupt the replicate, change its setting and proceed with the next log)?
>
> Maybe a 3rd mode for replicates that ignores 0 CRC's?

Well, wal_checksum allows you to have this turned ON for the main server and OFF on a Warm Standby.

The recovery process doesn't check for postgresql.conf reloads, so setting it at server start is effectively the same thing in that case.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
> > > What's the use-case for changing the variable on the fly anyway? Seems a better solution is just to lock down the setting at postmaster start.
> >
> > I guess that the use case is more for a WAL based replicate that has/wants a different setting. Maybe we want a WAL entry for the change, or force a log switch (so you can interrupt the replicate, change its setting and proceed with the next log)?
> >
> > Maybe a 3rd mode for replicates that ignores 0 CRC's?
>
> Well, wal_checksum allows you to have this turned ON for the main server and OFF on a Warm Standby.

Ok, so when you need CRC's on a replicate (but not on the master) you turn it off during standby replay, but turn it on when you start the replicate for normal operation.

Andreas
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
> > Ok, so when you need CRC's on a replicate (but not on the master) you turn it off during standby replay, but turn it on when you start the replicate for normal operation.
>
> Thought: even when it's off, the CRC had better be computed for shutdown-checkpoint records. Else there's no way to turn it on even with a postmaster restart --- unless we accept the idea of poking a hole in the normal mode. (Which I still dislike, and even more so if the special value is zero. Almost any other value would be safer than zero.)
>
> On the whole, though, I still don't want to put this in. I don't think Simon has thought it through sufficiently,

Well, the part that we do not really want a special value (at least not 0) is new, and makes things a bit more complicated.

> and we haven't even seen any demonstration of a big speedup.

Yes, iirc the demonstration was with the 64-bit CRC instead of the sufficient 32-bit one (or a bad CRC compiler optimization?). But I do think it can be shown to provide a significant speedup (at least in peak burst performance). Especially on target hardware, WAL write I/O is extremely fast (since it is write cached), so the CPU should show.

Andreas
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
> Ok, so when you need CRC's on a replicate (but not on the master) you turn it off during standby replay, but turn it on when you start the replicate for normal operation.

Which sounds to me like a good reason to allow the option in recovery.conf as well...

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Jim Nasby [EMAIL PROTECTED] writes:
> On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
> > Ok, so when you need CRC's on a replicate (but not on the master) you
>
> Which sounds to me like a good reason to allow the option in recovery.conf as well...

Actually, I'm not seeing the use-case for a slave having a different setting from the master at all?

"My backup server is less reliable than the primary."

"My backup server is more reliable than the primary."

Somehow, neither of these statements seem likely to be uttered by a sane DBA ...

regards, tom lane
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
> Actually, I'm not seeing the use-case for a slave having a different setting from the master at all?
>
> "My backup server is less reliable than the primary."
>
> "My backup server is more reliable than the primary."
>
> Somehow, neither of these statements seem likely to be uttered by a sane DBA ...

"My backup server is actually my dev machine."

"My backup server is just a reporting machine."

"My backup machine is using SATA just because it is just an absolute emergency machine."

"My backup machine is also my web server."

Real world dictates differently. Let's not forget that not every company can spend 100k on two identical machines, yet many companies can spend 50k + 5k for a backup machine based on SATA or secondary services.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > On Thu, 2007-01-04 at 10:00 -0500, Tom Lane wrote:
> > > Simon Riggs [EMAIL PROTECTED] writes:
> > > > Recovery can occur with/without same setting of wal_checksum, to avoid complications from crashes immediately after turning GUC on.
> > >
> > > Surely not. Otherwise even the "on" setting is not really a defense.
> >
> > Only when the CRC is exactly zero, which happens very very rarely.
>
> "It works most of the time" doesn't exactly satisfy me.
>
> What's the use-case for changing the variable on the fly anyway? Seems a better solution is just to lock down the setting at postmaster start.

That would prevent us from using the secondary checkpoint location, in the case of a crash affecting the primary checkpoint when it is a shutdown checkpoint where we changed the setting of wal_checksum. It seemed safer to allow a very rare error through to the next level of error checking rather than to close the door so tight that recovery would not be possible in a very rare case.

If you're good with server start, so am I.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Thu, 2007-01-04 at 17:58 +0100, Florian Weimer wrote:
> * Simon Riggs:
> > > Surely not. Otherwise even the "on" setting is not really a defense.
> >
> > Only when the CRC is exactly zero, which happens very very rarely.
>
> Have you tried switching to Adler32 instead of CRC32?

No. Please explain further.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Florian Weimer [EMAIL PROTECTED] writes:
> Have you tried switching to Adler32 instead of CRC32?

Is anything known about the error detection capabilities of Adler32? There's a lot of math behind CRCs but AFAIR Adler's method is pretty much ad-hoc.

regards, tom lane
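As a partial illustration of the question above, here is a minimal sketch of one well-documented property of Adler-32: its low 16 bits are just 1 plus the byte sum (mod 65521), so on short inputs that half of the checksum cannot use its full range. The sketch uses Python's `zlib` module and a made-up 16-byte record; it is not PostgreSQL code and says nothing about how WAL records are actually laid out.

```python
import os
import zlib

def adler_low_half(data: bytes) -> int:
    """Return the low 16 bits of Adler-32, i.e. the 'A' running sum."""
    return zlib.adler32(data) & 0xFFFF

# Hypothetical short record: 16 random bytes.
record = os.urandom(16)

# For n bytes, A = 1 + sum(bytes) <= 1 + 255 * n, far below 65521 when n is small.
upper_bound = 1 + 255 * len(record)

print(adler_low_half(record), "<=", upper_bound)
print(adler_low_half(record) == 1 + sum(record))
```

For a 16-byte input the low half never exceeds 4081, so it carries at most about 12 bits of information. CRC-32 has no comparable restriction on short inputs.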
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:
> > "It works most of the time" doesn't exactly satisfy me.
>
> It seemed safer to allow a very rare error through to the next level of error checking rather than to close the door so tight that recovery would not be possible in a very rare case.

If a DBA is turning checksums off at all, he's already bought into the assumption that he's prepared to recover from backups.

What you don't seem to get here is that this feature is pretty darn questionable in the first place, and for it to have a side effect of poking a hole in the system's reliability even when it's off is more than enough to get it rejected outright. It's just a No Sale.

I don't believe that the hole is real small, either, as overwrite-with-zeroes is not exactly an unheard-of failure mode for filesystems.

regards, tom lane
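The overwrite-with-zeroes concern can be sketched as follows. This is a toy model, assuming a reader that treats a stored CRC of zero as "checksum was off, skip verification"; the function names and record contents are hypothetical and the code uses Python's `zlib.crc32`, not PostgreSQL's CRC routines.

```python
import zlib

def record_ok_with_zero_escape(payload: bytes, stored_crc: int) -> bool:
    """Hypothetical check: a stored CRC of 0 means 'not checksummed'."""
    if stored_crc == 0:
        return True  # the hole: nothing at all is verified
    return zlib.crc32(payload) == stored_crc

def record_ok_strict(payload: bytes, stored_crc: int) -> bool:
    """Check with no special value: the CRC must always match."""
    return zlib.crc32(payload) == stored_crc

good = b"some WAL record payload"
good_crc = zlib.crc32(good)

# A failure that overwrites the whole record, CRC field included, with zeroes.
zeroed_payload = bytes(len(good))
zeroed_crc = 0

print(record_ok_with_zero_escape(good, good_crc))              # intact record passes
print(record_ok_with_zero_escape(zeroed_payload, zeroed_crc))  # zeroed record also passes
print(record_ok_strict(zeroed_payload, zeroed_crc))            # strict check rejects it
```

With the zero escape, a zero-filled record is indistinguishable from one written with checksums off. That is why a non-zero special value, or no special value at all, was argued to be safer.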
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
* Tom Lane:
> Florian Weimer [EMAIL PROTECTED] writes:
> > Have you tried switching to Adler32 instead of CRC32?
>
> Is anything known about the error detection capabilities of Adler32? There's a lot of math behind CRCs but AFAIR Adler's method is pretty much ad-hoc.

Correct me if I'm wrong, but the main reason for the WAL CRC is to detect partial WAL writes (due to improper caching, for instance). This means that you're out of the realm of traditional CRC analysis anyway, because the things you are guarding against are neither burst errors nor n-bit errors (for small n).
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Florian Weimer [EMAIL PROTECTED] writes:
> * Tom Lane:
> > There's a lot of math behind CRCs but AFAIR Adler's method is pretty much ad-hoc.
>
> Correct me if I'm wrong, but the main reason for the WAL CRC is to detect partial WAL writes (due to improper caching, for instance).

Well, that's *a* reason, but not the only one, and IMHO not one that gives any particular guidance on what kind of checksum to use.

> This means that you're out of the realm of traditional CRC analysis anyway, because the things you are guarding against are neither burst errors nor n-bit errors (for small n).

I think short burst errors are fairly likely: the kind of scenario I'm worried about is a wild store corrupting a word of a WAL entry while it's waiting around to be written in the WAL buffers. So the CRC math does give me some comfort that that'll be detected.

regards, tom lane
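The wild-store scenario can be sketched like this. A 32-bit CRC is guaranteed to detect any error burst no longer than 32 bits, and a corrupted aligned 4-byte word is such a burst. The record contents, the `corrupt_word` helper and the masks are made up for illustration, and the code uses Python's `zlib.crc32` rather than PostgreSQL's own CRC code.

```python
import os
import struct
import zlib

def corrupt_word(record: bytes, word_index: int, xor_mask: int) -> bytes:
    """Simulate a wild store: XOR one aligned 32-bit word with a nonzero mask."""
    buf = bytearray(record)
    offset = word_index * 4
    (word,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, word ^ xor_mask)
    return bytes(buf)

# Hypothetical 64-byte record sitting in a buffer, with its CRC already computed.
record = os.urandom(64)
crc = zlib.crc32(record)

# Corrupt each of the 16 words in turn with a few different nonzero masks.
detected = all(
    zlib.crc32(corrupt_word(record, i, mask)) != crc
    for i in range(16)
    for mask in (0x00000001, 0xFFFFFFFF, 0xDEADBEEF)
)
print(detected)  # True
```

Every single-word corruption changes the CRC, whatever the record contents, which is the kind of guarantee the CRC math provides and an ad-hoc checksum may not.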
Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off
Florian Weimer [EMAIL PROTECTED] writes:
> Ah, does this mean that each WAL entry gets its own checksum?

Right.

> (I had assumed that PostgreSQL's WAL checksumming was justified by the partial write issue. The wild store could easily occur with a heap page, too, and AFAIK, tuples aren't checksummed. Which would be an interesting option, I guess.)

We've discussed it but there's never been a pressing reason to do it.

regards, tom lane
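A per-record checksum also covers the partial-write case, which can be sketched as below. This is a toy log format, assuming a hypothetical header of payload length plus CRC-32 in front of each record; it is not the actual WAL record layout, and the function names are invented for the example.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC-32 of the payload, little-endian.
HEADER = struct.Struct("<II")

def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record with its own checksum to the in-memory log."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)

def read_valid_records(log: bytes) -> list:
    """Return payloads up to the first record that is short or fails its CRC."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: treat as the end of the usable log
        records.append(bytes(payload))
        pos = start + length
    return records

log = bytearray()
for p in (b"first", b"second", b"third"):
    append_record(log, p)

# Simulate a partial write: the last two bytes never reached disk and read as zeroes.
torn = bytes(log[:-2]) + b"\x00\x00"

print(read_valid_records(bytes(log)))  # [b'first', b'second', b'third']
print(read_valid_records(torn))        # [b'first', b'second']
```

Replay stops cleanly at the torn record because its payload no longer matches its own checksum, while earlier records remain usable.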