Hi everyone.  I'm currently in the situation of administering a rather large 
PostgreSQL database which, for some reason, is much larger on disk than it 
should be.

I'm currently running version 8.4.5 - not the latest and greatest, I know - but 
this is a live database that would be problematic to take down for an upgrade 
unless all else fails - especially considering its size if it does need to be 
rebuilt somehow.

Anyway, I'm no stranger to SQL, but new to PostgreSQL - all my SQL 
administration in the past has been with MySQL.  So I'm somewhat bumbling my 
way through administrative commands trying to solve this - please bear with me.

The table sizes reported by \dt+ add up to around 120 GB.  The index sizes 
reported by \di+ add up to around 15 GB.  This is pretty consistent with what 
I would expect the data to require.

The problem is, the disk usage of the pgsql directory where the data is kept 
(as reported by 'du') comes to 647 GB - significantly more than it should.  
select pg_database_size('mydatabase') confirms this, returning 690830939920.
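
For what it's worth, here's a cross-check I put together (assuming I'm reading 
the documentation for pg_total_relation_size correctly - it's supposed to 
include a table's indexes and TOAST data):

-- Sum the total on-disk size (heap + indexes + TOAST) of every ordinary
-- table, to compare against pg_database_size().
SELECT pg_size_pretty(sum(pg_total_relation_size(oid))::bigint) AS accounted_for
FROM pg_class
WHERE relkind = 'r';

If that assumption holds, the result ought to come close to the 
pg_database_size() figure - but for me it comes nowhere near it.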

Vacuuming the tables (full and otherwise) hasn't helped, but then, considering 
how the database is used, I didn't really expect it to.  It's strictly a 
read-only database, except once a month when it is refreshed: new data is 
loaded into newly created tables, those tables are vacuum analyzed, the old 
tables are dropped, and the new tables are renamed to take the old names.  
Vacuums never claim to recover any space, and the disk usage stays the same.

So how do I find out what's eating up all this extra space?

I'm not sure this is related, but in doing a bit of digging I ran across the 
following command to try and see where the space is being used:

SELECT relname,
       pg_size_pretty(relpages::bigint * 8 * 1024) AS size,
       CASE WHEN relkind = 't'
            THEN (SELECT pgd.relname FROM pg_class pgd
                  WHERE pgd.relfilenode::text = SUBSTRING(pg.relname FROM 10))
            ELSE (SELECT pgc.relname FROM pg_class pgc
                  WHERE pg.reltoastrelid = pgc.relfilenode)
       END AS refrelname,
       relfilenode, relkind, reltuples::bigint, relpages
FROM pg_class pg
ORDER BY relpages DESC;

The biggest culprit in this is a relation named pg_toast_101748, which weighs 
in at 242 GB.  I understand that TOAST tables are supplemental storage linked 
to regular tables, but I'm wondering if that particular one (and perhaps 
others) has lost its link.  The reason I suspect this is that the third 
column - which shows the corresponding table name for most other TOAST 
tables - is completely empty for that one.  There are other TOAST tables 
that don't seem to refer to a "real" table either, but they only weigh in at 
2 GB or less, so they're less of a problem.
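
To test the lost-link theory, I sketched the following query (on the 
assumption - please correct me if it's wrong - that every legitimate TOAST 
table should be referenced by some table's reltoastrelid):

-- List TOAST tables that no ordinary table claims via reltoastrelid,
-- largest first.
SELECT t.relname,
       pg_size_pretty(pg_relation_size(t.oid)) AS size
FROM pg_class t
WHERE t.relkind = 't'
  AND NOT EXISTS (SELECT 1 FROM pg_class c WHERE c.reltoastrelid = t.oid)
ORDER BY pg_relation_size(t.oid) DESC;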

Sometimes in the past, the import process I mentioned above has crashed due to 
a lack of memory, as did the postgres daemon itself on at least one occasion, 
and I'm wondering whether that may have left the internal catalogs in an 
uncertain state (since our tables are created from scratch every month, I 
don't suspect that our own data is corrupted, though I'm wondering if 
PostgreSQL's inner workings have become so).  Is it possible that in such a 
scenario a pg_toast file might be created but never used?  If so, how is it 
most safely deleted?  Or am I completely barking up the wrong tree?

I've done a bunch of Google searching and haven't come up with anything so far 
to shed some light on this.  Any help someone could provide on how to figure 
out where this substantial amount of extra disk space is being used would be 
greatly appreciated!

Thanks!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
