On Sun, Jul 17, 2011 at 3:04 AM, Cédric Villemain <cedric.villemain.deb...@gmail.com> wrote:
> 2011/7/17 Ken Caruso <k...@ipl31.net>:
> >
> > On Sat, Jul 16, 2011 at 2:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>
> >> Ken Caruso <k...@ipl31.net> writes:
> >> > Sorry, the actual error reported by CLUSTER is:
> >>
> >> > gpup=> cluster verbose tablename;
> >> > INFO: clustering "dbname.tablename"
> >> > WARNING: could not write block 12125253 of base/2651908/652397108
> >> > DETAIL: Multiple failures --- write error might be permanent.
> >> > ERROR: could not open file "base/2651908/652397108.1" (target block
> >> > 12125253): No such file or directory
> >> > CONTEXT: writing block 12125253 of relation base/2651908/652397108
> >>
> >> Hmm ... it looks like you've got a dirty buffer in shared memory that
> >> corresponds to a block that no longer exists on disk; in fact, the whole
> >> table segment it belonged to is gone.  Or maybe the block or file number
> >> in the shared buffer header is corrupted somehow.
> >>
> >> I imagine you're seeing errors like this during each checkpoint attempt?
> >
> > Hi Tom,
> >
> > Thanks for the reply.
> >
> > Yes, I tried a pg_start_backup() to force a checkpoint and it failed due
> > to a similar error.
> >
> >> I can't think of any very good way to clean that up.  What I'd try here
> >> is a forced database shutdown (immediate-mode stop) and see if it starts
> >> up cleanly.  It might be that whatever caused this has also corrupted
> >> the back WAL and so WAL replay will result in the same or similar error.
> >> In that case you'll be forced to do a pg_resetxlog to get the DB to come
> >> up again.  If so, a dump and reload and some manual consistency checking
> >> would be indicated :-(
> >
> > Before seeing this message, I restarted Postgres and it was able to get
> > to a consistent state, at which point I reclustered the db without error
> > and everything appears to be fine.  Any idea what caused this?  Was it
> > something to do with the Vacuum Full?
>
> Block number 12125253 is bigger than any block we can find in
> base/2651908/652397108.1
> Should the table size be in the 100GB range or the 2-3 GB range?
> This should help decide: in the former case, at least one segment has
> probably disappeared; in the latter, the shared buffers got corrupted.

The DB was in the 200GB-300GB range when this happened. What would cause the
segment to go missing? Just wondering if there is any further action I should
take, like filing a bug, or if this is a known issue.

Thanks for everyone's help.

-Ken

> Ken, you didn't change RELSEG_SIZE, right? (It needs to be changed in the
> source code before you compile it yourself.)
>
> In either case, a hardware check is welcome, I believe.
>
> --
> Cédric Villemain               2ndQuadrant
> http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
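
P.S. For the archives, here is the rough arithmetic behind Cédric's size question.
It is only a sketch and assumes the stock build defaults of 8 kB blocks (BLCKSZ)
and 131072-block (1 GB) segment files (RELSEG_SIZE); adjust if your build differs.
It can be run from psql, using the block number from the error above:

-- Where would block 12125253 of this relation live on disk?
-- block_size is a read-only server setting; 131072 blocks per segment
-- file is the default RELSEG_SIZE (1 GB at 8 kB blocks).
SELECT current_setting('block_size')::bigint            AS block_size,
       12125253 * current_setting('block_size')::bigint AS byte_offset_into_relation,
       12125253 / 131072                                AS segment_file_number;

With the defaults, that puts the block roughly 92.5 GB into the relation, i.e. in
segment file 652397108.92 rather than 652397108.1. That fits a 200GB-300GB table,
so the block number itself looks plausible, and the missing-segment explanation
seems more likely than a garbage block number from a corrupted buffer header.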