Andy Osborne <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>>> FATAL 2: open of /u0/pgdata/pg_clog/0726 failed: No such file or directory

>> What range of file names do you actually see in pg_clog?
> Currently 0000 to 00D6.  I don't know what it was last night.

Not any greater, for sure.  (FYI, each segment covers one million
transactions.)

> the next backup was running when the database crashed.  Any
> attempt to access the table crashed it again.  I don't know if
> it helps, but a select * from news where <conditional on a field
> with an index> was ok, but if the where clause was not indexed and
> resulted in a table scan, it crashed.

This is consistent with one page of the table being corrupted.

> While I wouldn't rule out data corruption, the kernel message
> ring has no errors for the md driver, SCSI host adapter or the
> disks, which I would expect if we had bad blocks appearing on a
> disk or somesuch.

Some of the cases that I've seen look like completely unrelated data
(not even Postgres stuff, just bits of text files) was written into a
page of a Postgres table.  This could possibly be a kernel bug, along
the lines of getting confused about which buffer belongs to which file.
But with no way to reproduce it, it's hard to pin blame.

>> You didn't happen to make a physical copy of the news table before
>> dropping it, did you?  It'd be interesting to examine the remains.

> Sadly, no I didn't.  This is one of our live database servers
> and I was under a lot of pressure to get it back quickly.  If
> it does it again, what can I do to provide the most useful
> feedback?

If the database isn't unreasonably large, perhaps you could take a
tarball dump of the whole $PGDATA directory tree while the postmaster
is stopped?  That would document the situation for examination at
leisure.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
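As an aside on the segment arithmetic above: each pg_clog segment file of
that era holds two status bits per transaction (four transactions per
byte) in a 256 kB file, so one segment covers 2^20 = 1048576 ("one
million") transactions, and the file is named with the segment number in
four uppercase hex digits.  A minimal sketch, with an arbitrary example
transaction ID (not one from the mail):

```shell
# Sketch, assuming the 256 kB / four-transactions-per-byte clog layout:
# map a transaction ID to the pg_clog segment file it lives in.
xid=123456789                     # example transaction ID, made up
seg=$(( xid / 1048576 ))          # 2^20 transactions per segment file
printf '%04X\n' "$seg"            # segment file name -> 0075
```

Run backwards, the same arithmetic shows why the error is alarming: a
request for segment 0726 implies a transaction ID near 0x726 * 1048576,
roughly 1.9 billion, far beyond the 0000-00D6 range actually present,
which is just what a garbage xid read from a corrupted page would produce.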
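Tom's closing suggestion as a shell sketch.  The data directory
/u0/pgdata is taken from the error message; the snapshot path and the
fast-shutdown mode are assumptions, not from the mail:

```shell
# Sketch: snapshot the whole cluster for later forensics.
# Stop the postmaster first so the data files are quiescent.
pg_ctl -D /u0/pgdata stop -m fast

# Archive the entire cluster directory, pg_clog and all.
tar czf /tmp/pgdata-snapshot.tar.gz -C /u0 pgdata

# Restart once the snapshot is safely copied elsewhere.
pg_ctl -D /u0/pgdata start
```

Taking the tarball with the postmaster stopped is what makes it useful:
every page, clog bit, and WAL file is captured in a single consistent
state that can be examined at leisure.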