OK, I apologise for the lack of clarity in the first message. Let me summarize the steps that led me to the error.

I have 2 servers running Ubuntu 12.04 on which I am testing Postgres 9.1.9. I set up streaming replication between them (no synchronous replication). Both servers have 4 SATA hard drives with ext3 file systems, set up as follows:

sda --> / (main OS and the database files, except for the ones defined below)
sdb --> pg_xlog directory
sdc --> one tablespace where heavy transaction tables are stored
sdd --> another tablespace where big historic tables are stored

Archiving mode is on and the archive location is on sda (and from there to the hot-standby server).

For testing I populate the database with the data currently in production (currently Postgres 8.3). Then I run several load tests, etc.

To tune and improve the archiving process I needed to generate a large amount of WAL. To do so I just deleted the contents of one big table and then VACUUMed it, like this:

    DELETE FROM bigtable;
    VACUUM bigtable;

And I found the error reported.

I repeated the whole process (creating a new cluster, populating it with data, always the same data, and setting up replication) a couple of times after that, and I found the error again about 90% of the time.

I tried deleting a big portion of the table and the error did not appear. It only appears after deleting ALL rows. Also, in some cases I didn't run the VACUUM command manually, and the error occurred during auto-vacuum.

My last test, in case there was a hardware problem on the primary, was to trigger the standby server and try the vacuum there, with the same results.

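For anyone trying to reproduce this, the two statements above can be wrapped in a small script. This is only a sketch under assumptions, not the script used in the tests: `repro_statements` and `run_repro` are hypothetical names, and `execute` stands for whatever callable sends one statement to the server (for example a thin wrapper around a driver cursor). VACUUM cannot run inside a transaction block, so that callable has to work in autocommit mode.

```python
def repro_statements(table):
    """Return the two statements from the report, in execution order.

    Hypothetical helper: the table name is supplied by the caller.
    """
    # Double any embedded quotes and wrap the identifier in double quotes,
    # so an unusual table name cannot break the generated SQL.
    quoted = '"' + table.replace('"', '""') + '"'
    return [
        f"DELETE FROM {quoted};",  # delete ALL rows, as in the report
        f"VACUUM {quoted};",       # the step during which the PANIC was seen
    ]


def run_repro(execute, table):
    """Send each statement through `execute`, one at a time.

    `execute` is an assumption: any callable taking one SQL string.
    It must run in autocommit mode, because VACUUM is rejected
    inside a transaction block.
    """
    for statement in repro_statements(table):
        execute(statement)


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to a server.
    run_repro(print, "bigtable")
```

Passing `print` as `execute` gives a dry run; a real run would pass a function bound to an autocommit connection instead.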
Here is a chunk of the log:

2013-04-29 17:02:21 ART [12024]: [32-1] PANIC: XX001: corrupted item pointer: offset = 8128, size = 80
2013-04-29 17:02:21 ART [12024]: [33-1] LOCATION: PageIndexMultiDelete, bufpage.c:779
2013-04-29 17:02:21 ART [12024]: [34-1] STATEMENT: VACUUM callshopcdrs ;
2013-04-29 17:02:21 ART [23787]: [8-1] LOG: server process (PID 12024) was terminated by signal 6: Aborted
2013-04-29 17:02:21 ART [23787]: [9-1] LOG: terminating any other active server processes
2013-04-29 17:02:21 ART [7300]: [2-1] WARNING: terminating connection because of crash of another server process
2013-04-29 17:02:21 ART [7300]: [3-1] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-04-29 17:02:21 ART [7300]: [4-1] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2013-04-29 17:02:21 ART [30304]: [1-1] FATAL: the database system is in recovery mode
2013-04-29 17:02:21 ART [23787]: [10-1] LOG: archiver process (PID 7301) exited with exit code 1
2013-04-29 17:02:21 ART [23787]: [11-1] LOG: all server processes terminated; reinitializing
2013-04-29 17:02:21 ART [30305]: [1-1] LOG: database system was interrupted; last known up at 2013-04-29 16:59:01 ART
2013-04-29 17:02:21 ART [30305]: [2-1] LOG: database system was not properly shut down; automatic recovery in progress
2013-04-29 17:02:21 ART [30305]: [3-1] LOG: redo starts at 11/497D4338
2013-04-29 17:02:21 ART [30305]: [4-1] LOG: invalid magic number 0000 in log file 17, segment 73, offset 8216576
2013-04-29 17:02:21 ART [30305]: [5-1] LOG: redo done at 11/497D4440
2013-04-29 17:02:22 ART [30308]: [1-1] LOG: autovacuum launcher started
2013-04-29 17:02:22 ART [23787]: [12-1] LOG: database system is ready to accept connections

There is a core file generated; it is 7 GB in size:

$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'postgres: postgres tvoip3 [local] VACUUM'

Many thanks for your help, and let me know of any extra information that might be useful.

--
German

On Tue, Apr 30, 2013 at 8:51 AM, Kevin Grittner <kgri...@ymail.com> wrote:

> [please don't top-post]
>
> German Becker <german.bec...@gmail.com> wrote:
> > Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> >> German Becker wrote:
>
> >>> I am testing version 9.1.9 before putting it in production. One
> >>> of my tests involved deleting a the contents of a big table ( ~
> >>> 13 GB size) and then VACUUMing it. During VACUUM PANICS.
>
> >> If you mess with the database files, errors like this are to be
> >> expected.
>
> > Thanks for your reply. In which sense did I mess with the
> > database files?
>
> You didn't say how you deleted the contents of that big table, and
> it appears that Albe assumed you deleted or truncated the
> underlying disk file rather than using the DELETE or TRUNCATE SQL
> statement.
>
> In any event, more details would help people come up with ideas on
> what might be wrong.
>
> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company