Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Not at all; I just think that it's not clear that they are a net win
> for the average user, and so I'm unconvinced that turning them on by
> default is a good idea.  I could be convinced otherwise by suitable
> evidence.  What I'm objecting to is turning them on without making
> any effort to collect such evidence.

As it happens, rather unexpectedly, we had evidence of a bit-flip
happening on a 9.1.24 install show up on IRC today:

https://paste.fedoraproject.org/533186/85041907/

What that shows is the output from:

    select * from heap_page_items(get_raw_page('theirtable', 4585));

with a row whose t_ctid is (134222313,18). Looking at the base-2
format of 4585 and 134222313:

    0000 0000 0000 0000 0001 0001 1110 1001
    0000 1000 0000 0000 0001 0001 1110 1001

The two values differ in a single bit.

There appear to be other issues with the page as well, but this was
discovered through a pg_dump where the user was trying to get data out
to upgrade to something more recent. It's not clear whether the errors
on the page all happened at once or accumulated over time, of course,
but it's at least possible that this particular area of storage has
been degrading over time, and that identifying an error when it was
just the bit-flip in the t_ctid (thanks to a checksum) might have
allowed the user to pull out the data.

During the discussion on IRC, someone else mentioned a similar problem
which was due to not having ECC memory in their server. As discussed,
that might mean that we wouldn't have caught the corruption, since we
only calculate the checksum on the way out of shared_buffers, but it's
also entirely possible that we would have, because it could have
happened in kernel space too.

We're still working with the user to see if we can get their data out,
but that looks like pretty good evidence that maybe we should care
about enabling checksums to catch corruption before it causes undue
pain for our users.

The raw page is here:

https://paste.fedoraproject.org/533195/48504224/

if anyone is curious to look at it further (we're looking through it
too).

Thanks!

Stephen
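
For anyone who wants to check the bit arithmetic in the message, here is a minimal Python sketch. It assumes, as the comparison in the message implies, that the block number in t_ctid was expected to match the page number 4585. The helper names `to_nibbles` and `flipped_bits` are made up for this illustration and are not part of PostgreSQL or pageinspect.

```python
def to_nibbles(value: int, width: int = 32) -> str:
    """Render value as a zero-padded binary string in groups of four bits."""
    bits = format(value, f"0{width}b")
    return " ".join(bits[i:i + 4] for i in range(0, width, 4))


def flipped_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two inputs disagree
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


if __name__ == "__main__":
    expected_block = 4585        # page number passed to get_raw_page()
    observed_block = 134222313   # block number found in the row's t_ctid

    print(to_nibbles(expected_block))
    print(to_nibbles(observed_block))
    print("differing bit positions:", flipped_bits(expected_block, observed_block))
```

The XOR of the two values is a single set bit at position 27, which is consistent with one flipped bit rather than an unrelated block number.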