> >> Um, Vadim? Still of the opinion that elog(STOP) is a good
> >> idea here? That's two people now for whom that decision has
> >> turned localized corruption into complete database failure.
> >> I don't think it's a good tradeoff.
> >
> > One is able to use pg_resetxlog so I don't see point in
> > removing elog(STOP) there. What do you think?
>
> Well, pg_resetxlog would get around the symptom, but at the cost of
> possibly losing updates that are further along in the xlog than the
> update for the corrupted page. (I'm assuming that the problem here
> is a page with a corrupt LSN.) I think it's better to treat flush
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On restart, the entire content of every page modified after the last
checkpoint should be restored from WAL. In Denis's case it looks like the
page newly allocated for the update was somehow corrupted before
heapam.c:2235 (7.1.2 src), so there was no XLOG_HEAP_INIT_PAGE flag in
the WAL record => the page content was not initialized on restart. Denis
reported a system crash - very likely due to a memory problem.
> request past end of log as a DEBUG or NOTICE condition and keep going.
> Sure, it indicates badness somewhere, but we should try to have some
> robustness in the face of that badness. I do not see any reason why
> XLOG has to declare defeat and go home because of this condition.

Ok - what about setting some flag there on restart and aborting the
restart after all records from WAL have been applied? Then the DBA will
have the choice either to run pg_resetxlog after that and try to dump
the data, or to restore from an old backup. I still object to just a
NOTICE there - it is easy to miss. And in normal processing mode I'd
leave elog(STOP) there.

Vadim

P.S. Further discussions will be in hackers-list, sorry.
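
The proposal above (set a flag when the condition is seen during restart, keep applying WAL records, then abort once replay is done, while staying fatal in normal processing) could be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL source: every type and function name here (XlogState, request_flush, finish_recovery, and so on) is hypothetical, and "flush request past end of log" is modeled as a plain integer comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the PostgreSQL source. */
typedef uint64_t Lsn;

typedef struct {
    Lsn  end_of_log;     /* last valid position in the log */
    bool in_recovery;    /* true while WAL is being replayed on restart */
    bool saw_bad_flush;  /* set if a flush request pointed past end of log */
} XlogState;

typedef enum {
    FLUSH_OK,              /* request is within the log */
    FLUSH_DEFERRED_ERROR,  /* restart: remember the problem, keep replaying */
    FLUSH_FATAL            /* normal processing: caller should stop, as elog(STOP) would */
} FlushResult;

/* Classify a flush request for a page whose header claims page_lsn. */
static FlushResult request_flush(XlogState *st, Lsn page_lsn)
{
    assert(st != NULL);
    if (page_lsn <= st->end_of_log)
        return FLUSH_OK;
    if (st->in_recovery) {
        st->saw_bad_flush = true;   /* the "flag" from the proposal */
        return FLUSH_DEFERRED_ERROR;
    }
    return FLUSH_FATAL;
}

/*
 * Replay every record first, then decide. Returns true only if startup
 * may proceed; false means "abort restart", leaving the DBA to choose
 * between pg_resetxlog plus a dump and restoring an old backup.
 */
static bool finish_recovery(XlogState *st, const Lsn *page_lsns, size_t n)
{
    st->in_recovery = true;
    for (size_t i = 0; i < n; i++)
        (void) request_flush(st, page_lsns[i]);  /* never stops replay early */
    st->in_recovery = false;
    return !st->saw_bad_flush;
}
```

The point of the design is that a bad LSN seen during restart no longer prevents the remaining WAL records from being applied, yet the failure is reported as an aborted restart rather than as a NOTICE that is easy to miss.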