On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey <bridget.f...@redfin.com> wrote:
> Thanks for the info - that's very helpful. We had also noted that the alloc
> seems to be -3 bytes. We have run pg_check and it found no instances of
> corruption. We've also replayed queries that have failed, and have never
> been able to get the same query to fail twice. In the case you
> investigated, was there permanent page corruption - e.g. you could run the
> same query over and over and get the same result?

Yes. I observed that the infomask bits on several tuples had somehow
been overwritten by nonsense. I am not sure whether there were other
kinds of corruption as well - I suspect probably so - but that's the
only one I saw with my own eyes, courtesy of pg_filedump.

> It really does seem like this is an issue either in Hot Standby or very
> closely related to that feature, where there is temporary corruption of a
> btree index that then disappears. Our master is not experiencing any malloc
> issues, while the 3 slaves get about a dozen per day, despite similar
> workloads. We haven't had a slave segfault since we set it up to produce a
> core dump, but we're expecting to have that within the next few days
> (assuming we'll continue to get a segfault every 3-4 days). We're also
> planning to set up one slave that will panic when it gets a malloc issue, as
> you (and other posters on 6400) had suggested.
>
> Thanks again for the help, and we'll keep you posted as we learn more...

The case I investigated involved corruption on the master, and I think
it predated Hot Standby. However, the symptom is generic enough that
it seems quite possible that there's more than one way for it to
happen. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company