Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Thu, Jan 11, 2007 at 11:10:38PM +, Simon Riggs wrote: > On Thu, 2007-01-11 at 17:06 +, Gregory Stark wrote: > > Having a CRC in WAL but not in the heap seems kind of pointless. > > Yes... > > > If your > > hardware is unreliable the corruption could anywhere. > > Agreed. I thought the point was that the WAL protects against unexpected power failure, that sort of thing. In that situation, the memory is the first to be corrupted, and an active DMA transfer will thus be corrupted also. We don't need to worry about the data, because the WAL is known to be accurate. The WAL does not protect against random data corruption, in normal operation it is never read. If we want to detect random corruption, we'd need checksum everywhere, yes. But that's not the goal here. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Thu, 2007-01-11 at 17:06 +, Gregory Stark wrote: > Having a CRC in WAL but not in the heap seems kind of pointless. Yes... > If your > hardware is unreliable the corruption could anywhere. Agreed. Other DBMS have one setting for the whole server; I've never seen separate settings for WAL and data. > Depending on it to solve > multiple problems means we can't offer the option to disable it > because it > would affect other things as well. > > What I would like to see is a CRC option that would put CRC checks in > every > disk page whether heap, index, WAL, control file, etc. I think we > would > default that to off to match our current setup most closely. > > Separately we would have a feature in WAL to detect torn pages so that > we can > reliably detect the end of valid WAL. That would have to always be on. > But > having it as a separate feature means the CRC could be optional. Your thoughts seem logical to me. It does seem a bigger project than I'd envisaged, but doable, one day. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
"Tom Lane" <[EMAIL PROTECTED]> writes: > Oh, sorry, had the wrong context in mind. I'm still not very impressed > with the idea --- a CRC check will catch many kinds of problems, whereas > this approach catches exactly one kind of problem. Well in fairness I tossed in a throwaway comment at the end of that email about heap pages. I'll do the same here since I can't resist. But the main thread here is about xlog really. It just seems to me like it's better to target each problem with a solution that addresses it directly than have one feature that we hope hope addresses them all more or less. Having a CRC in WAL but not in the heap seems kind of pointless. If your hardware is unreliable the corruption could anywhere. Depending on it to solve multiple problems means we can't offer the option to disable it because it would affect other things as well. What I would like to see is a CRC option that would put CRC checks in every disk page whether heap, index, WAL, control file, etc. I think we would default that to off to match our current setup most closely. Separately we would have a feature in WAL to detect torn pages so that we can reliably detect the end of valid WAL. That would have to always be on. But having it as a separate feature means the CRC could be optional. Also, incidentally like I mentioned in my previous email, we could do the torn page detection in heap pages too by handling it in the smgr using readv/writev. No copies, no corrupted datums. Essentially the tags would be inserted on the fly as the data was copied into kernel space. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> You understand wrong ... a tuple sitting on disk is normally read >> directly from the shared buffer, and I don't think we want to pay for >> copying it. > "xlog records" Oh, sorry, had the wrong context in mind. I'm still not very impressed with the idea --- a CRC check will catch many kinds of problems, whereas this approach catches exactly one kind of problem. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> "Tom Lane" <[EMAIL PROTECTED]> writes: >>> Pretty much not happening; or are you volunteering to fix every part of >>> the system to tolerate injections of inserted data anywhere in a stored >>> datum? > >> I was thinking to do it at a low level as the xlog records are prepared to be >> written to the filesystem and as the data is being read from disk. I haven't >> read that code yet to see where to inject it but I understand there's already >> a copy happening and it could be done there. > > You understand wrong ... a tuple sitting on disk is normally read > directly from the shared buffer, and I don't think we want to pay for > copying it. "xlog records" -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> Pretty much not happening; or are you volunteering to fix every part of >> the system to tolerate injections of inserted data anywhere in a stored >> datum? > I was thinking to do it at a low level as the xlog records are prepared to be > written to the filesystem and as the data is being read from disk. I haven't > read that code yet to see where to inject it but I understand there's already > a copy happening and it could be done there. You understand wrong ... a tuple sitting on disk is normally read directly from the shared buffer, and I don't think we want to pay for copying it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> What did you think about protecting against torn writes using id numbers >> every >> 512 bytes. > > Pretty much not happening; or are you volunteering to fix every part of > the system to tolerate injections of inserted data anywhere in a stored > datum? I was thinking to do it at a low level as the xlog records are prepared to be written to the filesystem and as the data is being read from disk. I haven't read that code yet to see where to inject it but I understand there's already a copy happening and it could be done there. Even if we optimize out all the copies we could do it in the actual i/o call using readv/writev. I wasn't thinking of doing it on actual disk buffers since it doesn't help us avoid full page writes, but it could be done there too using readv/writev in the smgr. That might be useful for a full_page_writes=off system so even if it can't guarantee no corruption at least it can guarantee no silent corruption. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Gregory Stark <[EMAIL PROTECTED]> writes: > What did you think about protecting against torn writes using id numbers every > 512 bytes. Pretty much not happening; or are you volunteering to fix every part of the system to tolerate injections of inserted data anywhere in a stored datum? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Simon Riggs" <[EMAIL PROTECTED]> writes: >> COPY XLogInsert() #1 on oprofile results at 17% CPU >> (full_page_writes = on) > > But what portion of that is actually CRC-related? XLogInsert does quite > a lot. > > Anyway, I can't see degrading the reliability of the system for a gain > in the range of a few percent, which is the most that we'd be likely > to get here ... for a factor of two or more, maybe people would be > willing to take a risk. Well the problem with that is when you have dozens of 1% improvements which are additive and the net negative effects aren't additive. Since we can't quantify the effect on reliability here it's hard to tell if that's the case. What did you think about protecting against torn writes using id numbers every 512 bytes. That "improves" the reliability of the system in that it detects torn writes 100% of the instead of just (2^32-1)/2^32 (or 99.9998%) of the time. (which poking a hole in the CRC mechanism would reduce to (2^31-1)/2^31 (or 99.9995%)). I don't think either of those percentages are significant. If you crash your system once a day and always get a torn WAL page you're still far more likely to still get corrupted data due to cosmic rays long before you get corrupted data due to mistakenly applying a torn page from WAL. But if we can save CPU on every WAL write while not harming reliability (in fact increasing it, albeit insignificantly) why not? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Thu, 2007-01-11 at 09:01 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > COPYXLogInsert() #1 on oprofile results at 17% CPU > > (full_page_writes = on) > > But what portion of that is actually CRC-related? XLogInsert does quite > a lot. > > Anyway, I can't see degrading the reliability of the system for a gain > in the range of a few percent, which is the most that we'd be likely > to get here ... for a factor of two or more, maybe people would be > willing to take a risk. All I would add is that the loss of reliability was not certain in all cases, otherwise I myself would have dropped the idea long ago. With the spectre of doubt surrounding this, I'm happy to drop the idea until we have proof/greater certainty either way. Patch revoked. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
"Simon Riggs" <[EMAIL PROTECTED]> writes: > COPY XLogInsert() #1 on oprofile results at 17% CPU > (full_page_writes = on) But what portion of that is actually CRC-related? XLogInsert does quite a lot. Anyway, I can't see degrading the reliability of the system for a gain in the range of a few percent, which is the most that we'd be likely to get here ... for a factor of two or more, maybe people would be willing to take a risk. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
On Wed, 2007-01-10 at 23:32 -0500, Bruce Momjian wrote: > Simon Riggs wrote: > > On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote: > > > Jim Nasby <[EMAIL PROTECTED]> writes: > > > > On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote: > > > >> Ok, so when you need CRC's on a replicate (but not on the master) you > > > > > > > Which sounds to me like a good reason to allow the option in > > > > recovery.conf as well... > > > > > > Actually, I'm not seeing the use-case for a slave having a different > > > setting from the master at all? > > > > > > "My backup server is less reliable than the primary." > > > > > > "My backup server is more reliable than the primary." > > > > > > Somehow, neither of these statements seem likely to be uttered by > > > a sane DBA ... > > > > If I take a backup of a server and bring it up on a new system, the > > blocks in the backup will not have been CRC checked before they go to > > disk. > > > > If I take the same server and now stream log records across to it, why > > *must* that data be CRC checked, when the original data has not been? > > > > I'm proposing choice, with a safe default. That's all. > > I am assuming this item is dead because no performance results have been > reported. It's been on my hold queue, as a result of its lack of clear acceptance. Results from earlier tests show the routines which are dominated by CRC checking overhead are prominent in a number of important workloads. Those workloads all have a substantial disk component, so test results will vary between no saving at all on a disk-bound system to some savings on a CPU bound system. Restore RecordIsValid() #1 on oprofile results at 50-70% CPU COPYXLogInsert() #1 on oprofile results at 17% CPU (full_page_writes = on) OLTPno test with full_page_writes = on (less relevant) OLTPXLogInsert() #5 on oprofile results at 1.2% CPU (full_page_writes = off) Removing the CRC checks on WAL would likely be the easiest to remove 1% CPU on the system as it stands. Other changes require algorithmic or architectural changes to improve matters, though gains can be much larger. 1% doesn't sound much, but PostgreSQL is a very sleek beast these days. As we improve things in other areas the importance of this patch as a tuning switch will grow. Clearly the current patch is not accepted, but can we imagine a patch that saved substantial CPU time in these areas that would be acceptable? *Always* as a non-default option, IMHO, with careful documentation as to its possible use. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [pgsql-patches] [HACKERS] [PATCHES] wal_checksum = on (default) | off
Simon Riggs wrote: > On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote: > > Jim Nasby <[EMAIL PROTECTED]> writes: > > > On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote: > > >> Ok, so when you need CRC's on a replicate (but not on the master) you > > > > > Which sounds to me like a good reason to allow the option in > > > recovery.conf as well... > > > > Actually, I'm not seeing the use-case for a slave having a different > > setting from the master at all? > > > > "My backup server is less reliable than the primary." > > > > "My backup server is more reliable than the primary." > > > > Somehow, neither of these statements seem likely to be uttered by > > a sane DBA ... > > If I take a backup of a server and bring it up on a new system, the > blocks in the backup will not have been CRC checked before they go to > disk. > > If I take the same server and now stream log records across to it, why > *must* that data be CRC checked, when the original data has not been? > > I'm proposing choice, with a safe default. That's all. I am assuming this item is dead because no performance results have been reported. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org