On Fri, Oct 28, 2005 at 02:26:31PM +1000, Gavin Sherry wrote:
> I've spoken with Jim on IRC; he says that there have been several crashes
> recently due to a faulty disk array. I guess the zeroing could be an
> outcome of the faulty disk. I wonder if the crash caused by the faulty disk
> could have happened somewhere around mdextend(), after we create a zero'd
> page but before we could write out the initialised page.
Just to clarify, there's no evidence that the array is faulty. I do know
that they were using write-back caching with a cache that is not battery
backed, though.

What we have been seeing is periodic random crashes, around one a week. I
now have a good core from one of them, as well as an assert:

TRAP: FailedAssertion("!(shared->page_number[slotno] == pageno && shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)", File: "slru.c", Line: 308)

I haven't looked at that code yet, so I have no idea what that actually
means. Let me know what info y'all would like to see out of the core.
-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461