Re: [ADMIN] PANIC during VACUUM

German Becker Tue, 30 Apr 2013 05:26:32 -0700

OK, I apologise for the lack of clarity in the first message. Let
me summarize the steps that led me to the error.
I have 2 servers running Ubuntu 12.04 on which I am testing Postgres 9.1.9.
I set up streaming replication between them (asynchronous, not synchronous).
Both servers have 4 SATA hard drives with ext3 file systems, set up as follows:

sda --> / (main OS and the database files, except for the ones listed below)
sdb --> pg_xlog directory
sdc --> one tablespace where heavy-transaction tables are stored
sdd --> another tablespace where big historic tables are stored

Archive mode is on and the archive location is on sda (and from there to the
hot-standby server).
For testing I populate the database with the data currently in production
(which currently runs Postgres 8.3).
Then I run several load tests, etc.
To tune the archiving process I needed to generate a large
amount of WAL. To do so I just deleted the contents of one big table and
then VACUUMed it, like this:

DELETE FROM bigtable;
VACUUM bigtable;

And I got the error reported.
I repeated the whole process (creating a new cluster, populating it with
data, always the same data, and setting up replication) a couple of times
after that and I got the error again about 90% of the time. I tried
deleting a big portion of the table and the error did not appear. It
only appears after deleting ALL rows. Also, in some cases I didn't run the
VACUUM command manually, and the error occurred during autovacuum.
My last test, in case there was a hardware problem on the primary, was to
trigger the standby server and try the vacuum there, with the same result.
Here is a chunk of the log:

2013-04-29 17:02:21 ART [12024]: [32-1] PANIC:  XX001: corrupted item pointer: offset = 8128, size = 80
2013-04-29 17:02:21 ART [12024]: [33-1] LOCATION:  PageIndexMultiDelete, bufpage.c:779
2013-04-29 17:02:21 ART [12024]: [34-1] STATEMENT:  VACUUM callshopcdrs ;
2013-04-29 17:02:21 ART [23787]: [8-1] LOG:  server process (PID 12024) was terminated by signal 6: Aborted
2013-04-29 17:02:21 ART [23787]: [9-1] LOG:  terminating any other active server processes
2013-04-29 17:02:21 ART [7300]: [2-1] WARNING:  terminating connection because of crash of another server process
2013-04-29 17:02:21 ART [7300]: [3-1] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-04-29 17:02:21 ART [7300]: [4-1] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-04-29 17:02:21 ART [30304]: [1-1] FATAL:  the database system is in recovery mode
2013-04-29 17:02:21 ART [23787]: [10-1] LOG:  archiver process (PID 7301) exited with exit code 1
2013-04-29 17:02:21 ART [23787]: [11-1] LOG:  all server processes terminated; reinitializing
2013-04-29 17:02:21 ART [30305]: [1-1] LOG:  database system was interrupted; last known up at 2013-04-29 16:59:01 ART
2013-04-29 17:02:21 ART [30305]: [2-1] LOG:  database system was not properly shut down; automatic recovery in progress
2013-04-29 17:02:21 ART [30305]: [3-1] LOG:  redo starts at 11/497D4338
2013-04-29 17:02:21 ART [30305]: [4-1] LOG:  invalid magic number 0000 in log file 17, segment 73, offset 8216576
2013-04-29 17:02:21 ART [30305]: [5-1] LOG:  redo done at 11/497D4440
2013-04-29 17:02:22 ART [30308]: [1-1] LOG:  autovacuum launcher started
2013-04-29 17:02:22 ART [23787]: [12-1] LOG:  database system is ready to accept connections
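
As a side note for reading the recovery lines in the log above, here is a
minimal Python sketch of how an LSN such as 11/497D4440 maps to the "log
file / segment / offset" numbers in the "invalid magic number" message. It
assumes the 9.1-style LSN format (two hex numbers, xlogid/xrecoff) and the
default build settings of 16 MB WAL segments and 8 kB WAL pages; the function
names are made up for illustration and are not part of PostgreSQL.

```python
# Sketch: map a pre-9.3 PostgreSQL LSN ("xlogid/xrecoff", both hex) to the
# log file / segment / offset triple used in recovery log messages.
# Assumptions: default 16 MB WAL segments and 8 kB WAL pages.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size
WAL_PAGE_SIZE = 8192                 # assumed default WAL page size


def lsn_to_location(lsn):
    """Return (log file, segment, byte offset within segment) for an LSN."""
    log_hex, off_hex = lsn.split("/")
    log_id = int(log_hex, 16)
    rec_off = int(off_hex, 16)
    return log_id, rec_off // WAL_SEGMENT_SIZE, rec_off % WAL_SEGMENT_SIZE


def next_page_start(offset):
    """Return the offset of the first WAL page boundary after `offset`."""
    return (offset // WAL_PAGE_SIZE + 1) * WAL_PAGE_SIZE


if __name__ == "__main__":
    log_id, segment, offset = lsn_to_location("11/497D4440")
    print(log_id, segment, offset)   # 17 73 8209472
    print(next_page_start(offset))   # 8216576
```

Under those assumptions, "redo done at 11/497D4440" falls in log file 17,
segment 73, and the next page boundary after it is offset 8216576, the same
offset as in the "invalid magic number" line. That would be consistent with
recovery simply reaching the first unwritten WAL page after the crash, though
that reading is an inference from the numbers, not something confirmed.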

A core file was generated; it is 7 GB:

$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'postgres: postgres tvoip3 [local] VACUUM'

Many thanks for your help, and let me know if any extra information would
be useful.

--

German

On Tue, Apr 30, 2013 at 8:51 AM, Kevin Grittner <[email protected]> wrote:

> [please don't top-post]
>
> German Becker <[email protected]> wrote:
> > Albe Laurenz <[email protected]> wrote:
> >> German Becker wrote:
>
> >>> I am testing version 9.1.9 before putting it in production. One
> >>> of my tests involved deleting the contents of a big table (~
> >>> 13 GB in size) and then VACUUMing it. During VACUUM it PANICs.
>
> >> If you mess with the database files, errors like this are to be
> >> expected.
>
> > Thanks for your reply. In which sense did I mess with the
> > database files?
>
> You didn't say how you deleted the contents of that big table, and
> it appears that Albe assumed you deleted or truncated the
> underlying disk file rather than using the DELETE or TRUNCATE SQL
> statement.
>
> In any event, more details would help people come up with ideas on
> what might be wrong.
>
> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
