Hello everybody,

We're using PostgreSQL 8.1.0 on AIX 5.3 through NFS (a Netapp Filer hosts the 
database files), and we're encoutering somes issues with vaccums. PostgreSQL 
binaries are built with xlc 6 (C for AIX Compiler 6.0.0.6) on AIX 5.2 (yes, I 
know, building on 5.2 and running on 5.3 is not the best way to avoid bugs...).


We have strong performance constraints with this database, so we planned 
vacuums with a crontab :
- Frequent vacuum analyze on some heavily-updated tables (few rows, but a lot 
of insert/update/delete). The frequency varies between 5 and 15 minutes.
- A nightly (not FULL) vacuum on the entire database.
We don't use autovacuum or FULL vacuum, because the high havailability needed 
for the database. We prefer to keep it under control.


Since some weeks, the amount of data hosted by the database grows, and, some 
nights,  the database vacuum seems to "freeze" during his execution. In verbose 
mode, the logs show that the vacuum clean up a table (not always the same 
table), and... no more news. The system shows a vacuum process, which seems to 
be sleeping (no CPU used, no memory consumption...). In addition, the logs of 
our application show that database transactions seems to be slower. 

For some internal reasons, the only way for us to workaround this problem is to 
shutdown of the application and the database. After a full restart, things are 
ok. 


Some questions :

1) During the nightly database vacuum, some vacuums run concurrently (vacuums 
on heavily-updated tables). Can this concurrency cause some deadlocks ? We're 
planning to patch our shell scripts to avoid this concurrency.


2) I believed that the poor performances during the vacuum freeze were due to 
the obsolete data statistics. But after a full restart of the dabatase, 
performances are good. Does PostgreSQL rebuild his statistics during startup ? 


3) Can we explain the freeze with a bad database configuration ? For instance, 
postgreSQL running out of connections, or whatever, causing the vacuum process 
to wait for free ressources ?


4) This morning, just before the database vacuum freeze, the logs show this 
error :
<2007-06-13 03:20:35 DFT%>ERROR:  could not open relation 16391/16394/107937: A 
system call received an interrupt.
<2007-06-13 03:20:35 DFT%>CONTEXT:  writing block 2 of relation 
16391/16394/107937
<2007-06-13 03:20:40 DFT%>LOG:  could not fsync segment 0 of relation 
16392/16394/107925: A system call received an interrupt.
<2007-06-13 03:20:40 DFT%>ERROR:  storage sync failed on magnetic disk: A 
system call received an interrupt.

This is the first time we're encountering this error. Can it be a cause of the 
vacuum freeze ?
 

Regards,


-- 
  Loic Restoux
  Capgemini Telecom & Media / ITDR
  tel : 02 99 27 82 30
  e-mail : [EMAIL PROTECTED] 

This message contains information that may be privileged or confidential and is 
the property of the Capgemini Group. It is intended only for the person to whom 
it is addressed. If you are not the intended recipient,  you are not authorized 
to read, print, retain, copy, disseminate,  distribute, or use this message or 
any part thereof. If you receive this  message in error, please notify the 
sender immediately and delete all  copies of this message.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to