"Josh Berkus" <email@example.com> writes: > It's a question of whether your HW+OS can guarentee no torn page writes for > the xlog.

No, the data files. Torn pages in the xlog are also a problem, but we protect
ourselves with a CRC and stop replay if the CRC doesn't match. So the cost
there is a bit of CPU, not extra I/O.

> Running on Sun hardware combined with Solaris 10 with the xlog mounted
> forcedirectio, the Solaris folks are convinced we are torn-page-proof and so
> far we haven't been able to prove them wrong. And, on Solaris it's a
> substantial performance gain (like, 8-10% on OLTP benchmarks).

I would expect you to need a small non-volatile cache, either in the
controller or in the drive itself, to be torn-page-proof. Failing that, you
would need drives that operate on 8kB sectors and guarantee that whole sectors
get written using residual power. I don't think any drives operate on 8kB
sectors, though.

The scary thing about torn pages with full_page_writes off is that we don't
offer any way to detect them. If both halves of the 8kB page look reasonable,
you could conceivably end up continuing without ever knowing your data is
corrupt.

That could happen if, for example, the change that was being written isn't
very dramatic. Perhaps all that's missing is an update chain pointer. Then you
could have two versions of the same record, with the chain pointer missing
from the old one. That would eventually lead to two visible versions of the
same record, but no crashes or other red flags.

I suggested a while back implementing torn page detection by writing a
sequence number every 512 bytes in the blocks. (I was talking about WAL at the
time, but the same principle applies.) Do it at the smgr layer using
readv/writev, and the upper layers need never know their data wasn't
contiguous on disk. The only effect would be to shorten page sizes by 16
bytes, which would be annoying but much less so than full_page_writes.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
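
For illustration only, here is a minimal sketch of the sequence-number idea
described above. It is not PostgreSQL code. The names (stamp, unstamp, SECTOR,
PAGE) are hypothetical, and the one-byte counter per 512-byte sector is an
assumption inferred from the "16 bytes" figure for an 8kB page. The sketch
uses plain byte slicing where a real smgr-level version would presumably use
readv/writev.

```python
SECTOR = 512                 # assumed atomic write unit of the drive
PAGE = 8192                  # 8kB page, as discussed in the message
SECTORS = PAGE // SECTOR     # 16 sectors per page
PAYLOAD = PAGE - SECTORS     # 8176 usable bytes once each sector carries a tag


def stamp(payload, seq):
    """Build the on-disk page: a 1-byte sequence tag leads every sector."""
    if len(payload) != PAYLOAD:
        raise ValueError("payload must be exactly PAYLOAD bytes")
    tag = bytes([seq % 256])           # hypothetical 1-byte counter, wraps at 256
    chunk = SECTOR - 1                 # payload bytes carried by each sector
    return b"".join(
        tag + payload[i * chunk:(i + 1) * chunk] for i in range(SECTORS)
    )


def unstamp(page):
    """Return (payload, torn); torn is True when the sector tags disagree."""
    if len(page) != PAGE:
        raise ValueError("page must be exactly PAGE bytes")
    tags = {page[i * SECTOR] for i in range(SECTORS)}
    payload = b"".join(
        page[i * SECTOR + 1:(i + 1) * SECTOR] for i in range(SECTORS)
    )
    return payload, len(tags) != 1


# Simulate a crash after only the first half of the new page reached disk.
old = stamp(b"a" * PAYLOAD, 7)
new = stamp(b"b" * PAYLOAD, 8)
torn = new[:PAGE // 2] + old[PAGE // 2:]
```

With this layout, unstamp(new) reports an intact page and unstamp(torn)
reports a torn one, because the two halves carry different tags. A one-byte
counter cannot tell apart two versions whose sequence numbers differ by a
multiple of 256, so a real design would have to weigh tag width against the
space lost per page.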