Hi!
I ran in to a
serious problem with a server running Postgresql 7.2.1. I know that an upgrade
would be in its place, but this is not an option at the
moment.
The problem started
with the server running out of space (toast table galore) due to a too low
max_fsm_pages number (1,000,000 was too low on this particular
server).
I raised it to
3,000,000 and tried a "VACUUM FULL" on the table. Out popped an error when it
reached the toast table index: "Number of tuples (xxx) not same at number
in heap (XXX), recreate index". The numbers (XXX) differed by just 2.
It then
started recreating the index, but stopped short when another error popped
up: "insertion of duplicate keys"-failure.
I have subsequently
attempted to reindex the tables/indexes, but it didn't help a bit. I still got
the: "Number of tuples (xxx) not same at number in heap (XXX), recreate
index"- error.
I then tried to drop
the indexes on the table in question, and then run another "VACUUM FULL". The
result is this:
SNIP
------------------------------------------------------------------------------------------------------
backend>
vacuum full infected;
DEBUG: --Relation infected--
DEBUG: Pages 71795: Changed 0, reaped 16560, Empty 0, New 0; Tup 748830: Vac 0, Keep/VTL 0/0, UnUsed 81433, MinLen 142, MaxLen 2032; Re-using: Free/Avail. Space 13279172/10747696; EndEmpty/Avail. Pages 0/21243.
CPU 2.05s/0.24u sec elapsed 69.52 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: Rel infected: Pages: 71795 --> 71795; Tuple(s) moved: 0.
CPU 1.37s/1.23u sec elapsed 112.46 sec.
DEBUG: --Relation pg_toast_16581--
DEBUG: Pages 1517406: Changed 48938, reaped 1134683, Empty 217179, New 0; Tup 1598162: Vac 201754, Keep/VTL 0/0, UnUsed 4053916, MinLen 45, MaxLen 2034; Re-using: Free/Avail. Space 9420709796/9419556500; EndEmpty/Avail. Pages 0/1354671.
CPU 38.72s/5.26u sec elapsed 850.95 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
....
DEBUG: --Relation infected--
DEBUG: Pages 71795: Changed 0, reaped 16560, Empty 0, New 0; Tup 748830: Vac 0, Keep/VTL 0/0, UnUsed 81433, MinLen 142, MaxLen 2032; Re-using: Free/Avail. Space 13279172/10747696; EndEmpty/Avail. Pages 0/21243.
CPU 2.05s/0.24u sec elapsed 69.52 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: Rel infected: Pages: 71795 --> 71795; Tuple(s) moved: 0.
CPU 1.37s/1.23u sec elapsed 112.46 sec.
DEBUG: --Relation pg_toast_16581--
DEBUG: Pages 1517406: Changed 48938, reaped 1134683, Empty 217179, New 0; Tup 1598162: Vac 201754, Keep/VTL 0/0, UnUsed 4053916, MinLen 45, MaxLen 2034; Re-using: Free/Avail. Space 9420709796/9419556500; EndEmpty/Avail. Pages 0/1354671.
CPU 38.72s/5.26u sec elapsed 850.95 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
....
...
...
DEBUG:
XLogWrite: new log file created - consider increasing WAL_FILES
FATAL 2: ZeroFill failed to write /var/lib/pgsql/data//pg_xlog/xlogtemp.25146: No space left on device
DEBUG: shutting down
FATAL 2: open of /var/lib/pgsql/data//pg_xlog/00000023000000FE (log file 35, segment 254) failed: No such file or directory
---------------------------------------------------------------------------------------------------------------------/SNIP
FATAL 2: ZeroFill failed to write /var/lib/pgsql/data//pg_xlog/xlogtemp.25146: No space left on device
DEBUG: shutting down
FATAL 2: open of /var/lib/pgsql/data//pg_xlog/00000023000000FE (log file 35, segment 254) failed: No such file or directory
---------------------------------------------------------------------------------------------------------------------/SNIP
The last lines keep
coming until no more diskspace is left. I searched this forum and got some hints
to this problem.
I raised the
"checkpoint" default values in postgres.conf to these values, which should have
decreased the need for diskspace:
-------------
checkpoint_segments
= 12 # in logfile segments (16MB each), min 1
checkpoint_timeout = 3600 # in seconds, range 30-3600
-------------
checkpoint_timeout = 3600 # in seconds, range 30-3600
-------------
But the log files
keep coming. And my original problem of vacuuming full to reclaim space is
unsolved....
I then moved pg_xlog
to a different partition with 2 gigs of diskspace, and raised wal_files further
from 24 to 46. Ran "VACUUM FULL" once more.
Those 2 gigs was
also used up pretty quickly...
So this is where i
am now... desperate! Any suggestions will be greatly
appriciated!!!
What can i do to fix
this mess??
Regards,
Thomas.