Hello everybody,
We're using PostgreSQL 8.1.0 on AIX 5.3 through NFS (a Netapp Filer hosts the
database files), and we're encoutering somes issues with vaccums. PostgreSQL
binaries are built with xlc 6 (C for AIX Compiler 6.0.0.6) on AIX 5.2 (yes, I
know, building on 5.2 and running on 5.3 is not the best way to avoid bugs...).
We have strong performance constraints with this database, so we planned
vacuums with a crontab :
- Frequent vacuum analyze on some heavily-updated tables (few rows, but a lot
of insert/update/delete). The frequency varies between 5 and 15 minutes.
- A nightly (not FULL) vacuum on the entire database.
We don't use autovacuum or FULL vacuum, because the high havailability needed
for the database. We prefer to keep it under control.
Since some weeks, the amount of data hosted by the database grows, and, some
nights, the database vacuum seems to "freeze" during his execution. In verbose
mode, the logs show that the vacuum clean up a table (not always the same
table), and... no more news. The system shows a vacuum process, which seems to
be sleeping (no CPU used, no memory consumption...). In addition, the logs of
our application show that database transactions seems to be slower.
For some internal reasons, the only way for us to workaround this problem is to
shutdown of the application and the database. After a full restart, things are
ok.
Some questions :
1) During the nightly database vacuum, some vacuums run concurrently (vacuums
on heavily-updated tables). Can this concurrency cause some deadlocks ? We're
planning to patch our shell scripts to avoid this concurrency.
2) I believed that the poor performances during the vacuum freeze were due to
the obsolete data statistics. But after a full restart of the dabatase,
performances are good. Does PostgreSQL rebuild his statistics during startup ?
3) Can we explain the freeze with a bad database configuration ? For instance,
postgreSQL running out of connections, or whatever, causing the vacuum process
to wait for free ressources ?
4) This morning, just before the database vacuum freeze, the logs show this
error :
<2007-06-13 03:20:35 DFT%>ERROR: could not open relation 16391/16394/107937: A
system call received an interrupt.
<2007-06-13 03:20:35 DFT%>CONTEXT: writing block 2 of relation
16391/16394/107937
<2007-06-13 03:20:40 DFT%>LOG: could not fsync segment 0 of relation
16392/16394/107925: A system call received an interrupt.
<2007-06-13 03:20:40 DFT%>ERROR: storage sync failed on magnetic disk: A
system call received an interrupt.
This is the first time we're encountering this error. Can it be a cause of the
vacuum freeze ?
Regards,
--
Loic Restoux
Capgemini Telecom & Media / ITDR
tel : 02 99 27 82 30
e-mail : [EMAIL PROTECTED]
This message contains information that may be privileged or confidential and is
the property of the Capgemini Group. It is intended only for the person to whom
it is addressed. If you are not the intended recipient, you are not authorized
to read, print, retain, copy, disseminate, distribute, or use this message or
any part thereof. If you receive this message in error, please notify the
sender immediately and delete all copies of this message.
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match