Re: [PERFORM] DB size and TABLE sizes don't seem to add up

2014-02-18 Thread David Wall


On 2/18/2014 12:34 AM, Heikki Linnakangas wrote:

On 02/18/2014 12:14 AM, David Wall wrote:

I am running PG 9.2.4 and I am trying to figure out why my database size
shows one value, but the sum of my total relation sizes is so much less.

Basically, I'm told my database is 188MB, but the sum of my total
relation sizes adds up to just 8.7MB, which is 1/20th of the reported
total.  Where is the 19/20th of my data then?  We do make significant
use of large objects, so I suspect it's in there.  Is there a relation
size query that would include the large object data associated with any
OIDs in those tables?


You can use "select pg_total_relation_size('pg_largeobject')" to get 
the total size of the large objects. Attributing large objects to the 
tables that refer them is more difficult. For a single table, 
something like this:


select sum(pg_column_size(lo.data))
from lotest_stash_values t, pg_largeobject lo
where lo.loid = t.loid;

Replace "lotest_stash_values" with the table's name and lo.loid with 
the name of the OID column.


Thanks, Heikki.  It's generally even trickier for us because we have a 
blob table that other components use for storing 
large/binary/unstructured objects (the code handles 
compression/decompression and encryption/decryption options for us).  So 
those tables have a UUID that points to a row in that table that 
contains the actual LOID.  I'll use your technique to at least tell me 
the size for specific tables where I can build the query like you've 
described.
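
For the indirect case, here's a sketch of the kind of join I have in mind 
(the table and column names are illustrative only; our real schema differs):

```sql
-- Sketch: attribute large-object bytes to a table that references the
-- shared blob table indirectly via a UUID.  Names are illustrative only.
SELECT pg_size_pretty(sum(pg_column_size(lo.data))) AS lob_size
FROM some_table t
JOIN esf_blob b ON b.blob_uuid = t.blob_uuid      -- UUID reference into the blob table
JOIN pg_largeobject lo ON lo.loid = b.blob_loid;  -- blob row holds the actual LOID
```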



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] DB size and TABLE sizes don't seem to add up

2014-02-18 Thread Heikki Linnakangas

On 02/18/2014 12:14 AM, David Wall wrote:

I am running PG 9.2.4 and I am trying to figure out why my database size
shows one value, but the sum of my total relation sizes is so much less.

Basically, I'm told my database is 188MB, but the sum of my total
relation sizes adds up to just 8.7MB, which is 1/20th of the reported
total.  Where is the 19/20th of my data then?  We do make significant
use of large objects, so I suspect it's in there.  Is there a relation
size query that would include the large object data associated with any
OIDs in those tables?


You can use "select pg_total_relation_size('pg_largeobject')" to get the 
total size of the large objects. Attributing large objects to the tables 
that refer them is more difficult. For a single table, something like this:


select sum(pg_column_size(lo.data))
from lotest_stash_values t, pg_largeobject lo
where lo.loid = t.loid;

Replace "lotest_stash_values" with the table's name and lo.loid with the 
name of the OID column.
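
As a quick sanity check before drilling down per table, you can compare the 
overall numbers first; a minimal sketch:

```sql
-- Sketch: how much of the database is the large-object store?
SELECT pg_size_pretty(pg_database_size(current_database()))     AS database_size,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS large_objects;
```

If the second number accounts for most of the gap, the "missing" space is 
simply your large objects.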


- Heikki




[PERFORM] DB size and TABLE sizes don't seem to add up

2014-02-17 Thread David Wall
I am running PG 9.2.4 and I am trying to figure out why my database size 
shows one value, but the sum of my total relation sizes is so much less.


Basically, I'm told my database is 188MB, but the sum of my total 
relation sizes adds up to just 8.7MB, which is 1/20th of the reported 
total.  Where is the 19/20th of my data then?  We do make significant 
use of large objects, so I suspect it's in there.  Is there a relation 
size query that would include the large object data associated with any 
OIDs in those tables?


Here's the data I am working off of:

First, I run a query to get my total DB size (this is after a restore 
from a backup, so it should not have too many "holes"):


bpn=# SELECT pg_size_pretty(pg_database_size('bpn'));
 pg_size_pretty
----------------
 188 MB
(1 row)

Second, I run this query (from 
http://wiki.postgresql.org/wiki/Disk_Usage) to get the total relation 
sizes for the tables in that database:


bpn=# SELECT nspname || '.' || relname AS "relation",
bpn-# pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
bpn-#   FROM pg_class C
bpn-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
bpn-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
bpn-# AND C.relkind <> 'i'
bpn-# AND nspname !~ '^pg_toast'
bpn-#   ORDER BY pg_total_relation_size(C.oid) DESC;
                         relation                         | total_size
----------------------------------------------------------+------------
 public.esf_outbound_email_message   | 1624 kB
 public.esf_transaction_activity_log | 968 kB
 public.esf_blob | 560 kB
 public.esf_outbound_email_message_attachment| 552 kB
 public.esf_tran_report_field_string | 232 kB
 public.esf_system_activity_log  | 192 kB
 public.esf_permission_option_group  | 184 kB
 public.esf_library_document_version_page| 176 kB
 public.esf_transaction  | 152 kB
 public.esf_transaction_party| 136 kB
 public.esf_library_dropdown_version_option  | 128 kB
 public.esf_signature_key| 112 kB
 public.esf_transaction_document | 104 kB
 public.esf_permission_option| 96 kB
 public.esf_library_email_template_version   | 96 kB
 public.esf_transaction_party_document   | 96 kB
 public.esf_field_template   | 88 kB
 public.esf_transaction_party_assignment | 88 kB
 public.esf_outbound_email_message_response  | 88 kB
 public.esf_user_activity_log| 88 kB
 public.esf_library_buttonmessage| 80 kB
 public.esf_library_email_template   | 80 kB
 public.esf_library_documentstyle| 80 kB
 public.esf_package  | 80 kB
 public.esf_package_version_party_template   | 80 kB
 public.esf_permission   | 80 kB
 public.esf_report_template  | 80 kB
 public.esf_transaction_template | 80 kB
 public.esf_user | 80 kB
 public.esf_userlogin| 80 kB
 public.esf_group| 80 kB
 public.esf_library  | 80 kB
 public.esf_library_document | 80 kB
 public.esf_library_dropdown | 80 kB
 public.esf_stats| 80 kB
 public.esf_library_image| 80 kB
 public.esf_library_propertyset  | 80 kB
 public.esf_package_version  | 72 kB
 public.esf_package_version_document | 72 kB
 public.esf_report_template_report_field | 72 kB
 public.esf_library_document_version | 72 kB
 public.esf_library_documentstyle_version| 72 kB
 public.esf_group_user   | 72 kB
 public.esf_report_field_template| 72 kB
 public.esf_library_buttonmessage_version| 72 kB
 public.esf_library_propertyset_version  | 72 kB
 public.esf_library_dropdown_version | 72 kB
 public.esf_party_template_field_template| 72 kB
 public.esf_library_image_version| 72 kB
 public.esf_party_template   | 72 kB
 public.esf_label_template   | 56 kB
 public.esf_library_document_version_page_field_template | 56 kB
 public.esf_package_version_report_field | 56 kB
 public.esf_package_version_party_document_party | 

Re: [PERFORM] db size

2008-04-17 Thread Richard Huxton

Adrian Moisey wrote:

Hi

INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows 
and 0 dead rows; 1272 rows in sample, 1272 estimated total rows


This is a small table that takes up 27 pages and it scanned all of 
them. You have 1272 rows in it and none of them are dead (i.e. 
deleted/updated but still taking up space).


I had a look through a few other tables...:

INFO:  "table1": scanned 22988 of 22988 pages, containing 2713446 live 
rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total rows

INFO:  "table2": scanned 24600 of 24600 pages, containing 270585 live 
rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows


Are those dead rows an issue?  Should I try to clean them out?  Will it 
improve performance?


What you're hoping to see is that figure remain stable. The point of the 
free-space-map is to track these and allow the space to be re-used. If 
you find that the number of dead rows is increasing then either you are:

1. Just deleting rows
2. Not vacuuming enough - check your autovacuum settings

The effect on performance is that when you read in a page from disk 
you're reading dead rows along with the data you are after. Trying to 
keep 0 dead rows in a constantly updated table isn't worth the effort 
though - you'd end up wasting your disk I/O on maintenance rather than 
queries.


The figures above look high to me - 900,000 dead out of 2,700,000 and 
65,000 out of 270,000. Of course, if these tables have just had bulk 
updates/deletes then that's fine. If there's a steady stream of updates 
though, you probably want to up your autovacuum settings.
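
On 8.3 and later you can also watch dead-tuple counts from the statistics 
collector rather than running VACUUM VERBOSE; a sketch:

```sql
-- Sketch (8.3+): tables with the most dead rows, per the stats collector.
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
           AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```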


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] db size

2008-04-17 Thread Adrian Moisey

Hi

INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows 
and 0 dead rows; 1272 rows in sample, 1272 estimated total rows


This is a small table that takes up 27 pages and it scanned all of them. 
You have 1272 rows in it and none of them are dead (i.e. deleted/updated 
but still taking up space).


I had a look through a few other tables...:

INFO:  "table1": scanned 22988 of 22988 pages, containing 2713446 live 
rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total rows

INFO:  "table2": scanned 24600 of 24600 pages, containing 270585 live 
rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows


Are those dead rows an issue?  Should I try to clean them out?  Will it 
improve performance?



--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-17 Thread Adrian Moisey

Hi

You are tracking ~ 4.6 million pages and have space to track ~ 15.5 
million, so that's fine. You are right up against your limit of 
relations (tables, indexes etc) being tracked though - 1200. You'll 
probably want to increase max_fsm_relations - see manual for details 
(server configuration / free space map).


That is helpful, thanks.

I did a grep on the output to find out more about the max_fsm_relations:

INFO:  free space map contains 2333562 pages in 832 relations
INFO:  free space map contains 3012404 pages in 544 relations
INFO:  free space map contains 3012303 pages in 654 relations
INFO:  free space map contains 3012345 pages in 669 relations
INFO:  free space map contains 3012394 pages in 678 relations
INFO:  free space map contains 3017248 pages in 717 relations
INFO:  free space map contains 2860737 pages in 824 relations
INFO:  free space map contains 4667977 pages in 1199 relations
INFO:  free space map contains 3140238 pages in 181 relations
INFO:  free space map contains 3140322 pages in 182 relations
INFO:  free space map contains 3140387 pages in 183 relations
INFO:  free space map contains 3142781 pages in 184 relations

It doesn't often get close to 1200... should I still increase that value?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-17 Thread Richard Huxton

Adrian Moisey wrote:

Hi


Running VACUUM VERBOSE will give you a detailed view of space usage of
each individual table.


I did that.

Not too sure what I'm looking for, can someone tell me what this means:

INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and 
0 dead rows; 1272 rows in sample, 1272 estimated total rows


This is a small table that takes up 27 pages and it scanned all of them. 
You have 1272 rows in it and none of them are dead (i.e. deleted/updated 
but still taking up space).



INFO:  free space map contains 4667977 pages in 1199 relations
DETAIL:  A total of 4505344 page slots are in use (including overhead).
4505344 page slots are required to track all free space.
Current limits are:  15537488 page slots, 1200 relations, using 91172 kB.


You are tracking ~ 4.6 million pages and have space to track ~ 15.5 
million, so that's fine. You are right up against your limit of 
relations (tables, indexes etc) being tracked though - 1200. You'll 
probably want to increase max_fsm_relations - see manual for details 
(server configuration / free space map).
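
As a sketch (pre-8.4 only; the free space map moved to disk in 8.4), you can 
check the current settings from psql, then raise them in postgresql.conf:

```sql
-- Check the current FSM limits.
SHOW max_fsm_relations;
SHOW max_fsm_pages;
-- To raise them, edit postgresql.conf and restart the server, e.g.:
--   max_fsm_relations = 2000
--   max_fsm_pages     = 16000000
```

The example values above are illustrative; size them to comfortably exceed 
what VACUUM VERBOSE reports.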


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] db size

2008-04-16 Thread Adrian Moisey

Hi


Running VACUUM VERBOSE will give you a detailed view of space usage of
each individual table.


I did that.

Not too sure what I'm looking for, can someone tell me what this means:

INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and 
0 dead rows; 1272 rows in sample, 1272 estimated total rows

INFO:  free space map contains 4667977 pages in 1199 relations
DETAIL:  A total of 4505344 page slots are in use (including overhead).
4505344 page slots are required to track all free space.
Current limits are:  15537488 page slots, 1200 relations, using 91172 kB.



--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-15 Thread Bill Moran
Adrian Moisey <[EMAIL PROTECTED]> wrote:
>
> Hi
> 
> >>> Now, is the bloat in the tables (which tables ?) or in the 
> >>> indexes (which indexes ?), or in the toast tables perhaps, or in the 
> >>> system catalogs or all of the above ? Or perhaps there is a 
> >>> long-forgotten process that got zombified while holding a huge temp 
> >>> table ? (not very likely, but who knows).
> >>> Use pg_relation_size() and its friends to get an idea of the size 
> >>> of stuff.
> 
> Can anybody give me some advice on the above?  I'm not sure where or how 
> to start looking.

Running VACUUM VERBOSE will give you a detailed view of space usage of
each individual table.

-- 
Bill Moran
Collaborative Fusion Inc.

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] db size

2008-04-15 Thread Adrian Moisey

Hi

Now, is the bloat in the tables (which tables ?) or in the 
indexes (which indexes ?), or in the toast tables perhaps, or in the 
system catalogs or all of the above ? Or perhaps there is a 
long-forgotten process that got zombified while holding a huge temp 
table ? (not very likely, but who knows).
Use pg_relation_size() and its friends to get an idea of the size 
of stuff.


Can anybody give me some advice on the above?  I'm not sure where or how 
to start looking.


--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-14 Thread Bill Moran
In response to Adrian Moisey <[EMAIL PROTECTED]>:
> 
> We currently have a 16CPU 32GB box running postgres 8.2.
> 
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E 
> UTF8 -F c -b" I get a file of 14GB in size.
> 
> But the database is 110GB in size on the disk.  Why the big difference 
> in size?  Does this have anything to do with performance?

In a dump, each index is just a single CREATE INDEX command.  In the actual
database, an index contains all the data it needs, which can be
substantially larger than the command that creates it.

Additionally, a running database has a certain amount of wasted space.
If you're running vacuum on a proper schedule, this won't get out of
hand.  Read this page to understand better:
http://www.postgresql.org/docs/8.1/static/maintenance.html

And lastly, I expect that the pg_dump format is able to do more aggressive
compression than the running database.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] db size

2008-04-14 Thread PFC



Will this help with performance?


	Depends if the bloat is in part of your working set. If debloating can  
make the working set fit in RAM, or lower your IOs, you'll get a boost.


Now, is the bloat in the tables (which tables ?) or in the indexes  
(which indexes ?), or in the toast tables perhaps, or in the system  
catalogs or all of the above ? Or perhaps there is a long-forgotten  
process that got zombified while holding a huge temp table ? (not very  
likely, but who knows).
Use pg_relation_size() and its friends to get an idea of the size  
of stuff.


I'll look into that, thanks







Re: [PERFORM] db size

2008-04-14 Thread Adrian Moisey

Hi


the live one is 113G
the restored one is 78G


Good news for you is that you know that you can do something ;)


:)

Will this help with performance?

Now, is the bloat in the tables (which tables ?) or in the indexes 
(which indexes ?), or in the toast tables perhaps, or in the system 
catalogs or all of the above ? Or perhaps there is a long-forgotten 
process that got zombified while holding a huge temp table ? (not very 
likely, but who knows).
Use pg_relation_size() and its friends to get an idea of the size of 
stuff.


I'll look into that, thanks

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-14 Thread Craig Ringer
Adrian Moisey wrote:
> Hi
> 
>> If you suspect your tables or indexes are bloated, restore your
>> dump to a test box.
>> Use fsync=off during restore, you don't care about integrity on
>> the test box.
>> This will avoid slowing down your production database.
>> Then look at the size of the restored database.
>> If it is much smaller than your production database, then you have
>> bloat.
> 
> I have done that, and I get the following:
> 
> the live one is 113G
> the restored one is 78G
> 
> How should I get rid of the bloat?
> VACUUM FULL?

And/or REINDEX if you're not satisfied with the results of a VACUUM FULL.

http://www.postgresql.org/docs/8.3/interactive/vacuum.html
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html

Of course, all of these will have performance consequences while they're
running, and take out locks that prevent certain other operatons as
shown in table 13-2:

http://www.postgresql.org/docs/8.3/static/explicit-locking.html

and the explanation following it.

Note in particular:


ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW
EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE,
EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder
is the only transaction accessing the table in any way.

Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER,
and VACUUM FULL commands. This is also the default lock mode for LOCK
TABLE statements that do not specify a mode explicitly.

Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR
UPDATE/SHARE) statement.


In other words, you won't be doing much with a table/index while a
VACUUM FULL or a REINDEX is in progress on it.

Given that, you probably want to check your table/index sizes and see if
there are particular problem tables or indexes, rather than just using a
sledgehammer approach.
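
For example, something along these lines (a sketch) will list the biggest 
relations so you can target just those:

```sql
-- Sketch: the ten largest relations (tables and indexes) by on-disk size.
SELECT nspname || '.' || relname AS relation,
       pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 10;
```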

--
Craig Ringer



Re: [PERFORM] db size

2008-04-14 Thread PFC


If you suspect your tables or indexes are bloated, restore your  
dump to a test box.
Use fsync=off during restore, you don't care about integrity on the  
test box.

This will avoid slowing down your production database.
Then look at the size of the restored database.
If it is much smaller than your production database, then you have  
bloat.


I have done that, and I get the following:

the live one is 113G
the restored one is 78G


Ah.
Good news for you is that you know that you can do something ;)

	Now, is the bloat in the tables (which tables ?) or in the indexes (which  
indexes ?), or in the toast tables perhaps, or in the system catalogs or  
all of the above ? Or perhaps there is a long-forgotten process that got  
zombified while holding a huge temp table ? (not very likely, but who  
knows).
	Use pg_relation_size() and its friends to get an idea of the size of  
stuff.
	Perhaps you have 1 extremely bloated table or index, or perhaps  
everything is bloated.

The solution to your problem depends on which case you have.



Re: [PERFORM] db size

2008-04-14 Thread Adrian Moisey

Hi

If you suspect your tables or indexes are bloated, restore your dump 
to a test box.
Use fsync=off during restore, you don't care about integrity on the 
test box.

This will avoid slowing down your production database.
Then look at the size of the restored database.
If it is much smaller than your production database, then you have 
bloat.


I have done that, and I get the following:

the live one is 113G
the restored one is 78G

How should I get rid of the bloat?
VACUUM FULL?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] db size

2008-04-14 Thread PFC

Hi

We currently have a 16CPU 32GB box running postgres 8.2.

When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E  
UTF8 -F c -b" I get a file of 14GB in size.


But the database is 110GB in size on the disk.  Why the big difference  
in size?  Does this have anything to do with performance?


I have a 2GB database, which dumps to a 340 MB file...
Two reasons :

- I have lots of big fat but very necessary indexes (not included in 
dump)
- Dump is compressed with gzip which really works well on database data.

	If you suspect your tables or indexes are bloated, restore your dump to a  
test box.
	Use fsync=off during restore, you don't care about integrity on the test  
box.

This will avoid slowing down your production database.
Then look at the size of the restored database.
If it is much smaller than your production database, then you have 
bloat.
	Time to CLUSTER, or REINDEX, or VACUUM FULL (your choice), on the tables  
that are bloated, and take note to vacuum those more often (and perhaps  
tune the autovacuum).
	Judicious use of CLUSTER on that small, but extremely often updated table  
can also be a very good option.

8.3 and its new HOT feature are also a good idea.



Re: [PERFORM] db size

2008-04-14 Thread Craig Ringer
Adrian Moisey wrote:
> Hi
> 
> We currently have a 16CPU 32GB box running postgres 8.2.
> 
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
> 
> But the database is 110GB in size on the disk.  Why the big difference
> in size?  Does this have anything to do with performance?


Reasons:

You're using a compact format designed to limit size and provide fast
dump/restore. The database, by contrast, is designed for fast access.

The database can contain "dead space" that hasn't been reclaimed by a
VACUUM. It can also have space allocated that it doesn't need, which you
can reclaim with VACUUM FULL. This dead space can really add up, but
it's the price of fast updates, inserts and deletes.

Your indexes take up disk space in the database, but are not dumped and
do not take up space in the dump file. Indexes can get very large
especially if you have lots of multi-column indexes.

I'm told that under certain loads indexes can grow full of mostly empty
pages, and a REINDEX every now and then can be useful to shrink them -
see "\h reindex" in psql. That won't affect your dump sizes as indexes
aren't dumped, but will affect the database size.

You can examine index (and relation) sizes using a query like:

select * from pg_class order by relpages desc


Data in the database is either not compressed, or (for larger fields) is
compressed with an algorithm that's very fast but doesn't achieve high
levels of compression. By contrast, the dumps are quite efficiently
compressed.

One of my database clusters is 571MB on disk at the moment, just after
being dropped, recreated, and populated from another data source. The
repopulation process is quite complex. I found that running VACUUM FULL
followed by REINDEX DATABASE dbname knocked 50MB off the database size,
pushing it down to 521MB. That's on a basically brand new DB. Note,
however, that 130MB of that space is in pg_xlog, and much of it will be
wasted as the DB has been under very light load but uses large xlogs
because it needs to perform well under huge load spikes. The size of the
`base' directory (the "real data", indexes, etc) is only 392MB.

If I dump that database using the same options you dumped yours with, I
end up with a hilariously small 29MB dump file. That's less than 10% of
the size of the main DB. The difference will be entirely due to
compression, a more compact storage layout in the dump files, and to the
lack of index data in the dumps. The database has quite a few indexes,
some of which are multicolumn indexes on tables with large numbers of
tuples, so that bloats the "live" version a lot.

--
Craig Ringer



Re: [PERFORM] db size

2008-04-13 Thread Vinubalaji Gopal
Hi Adrian,



>When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
UTF8 -F c -b" I get a file of 14GB in size.


From the man page of pg_dump:
"
-F format, --format=format

    Selects the format of the output. format can be one of the following:

    c   Output a custom archive suitable for input into pg_restore. This is
        the most flexible format in that it allows reordering of data load
        as well as schema elements. This format is also compressed by default.
"

The output is compressed, and it is a dump of the database containing the 
commands needed to restore it.



>But the database is 110GB in size on the disk.  Why the big difference
>in size?  Does this have anything to do with performance?

VACUUM or VACUUM FULL of the entire database will reduce its size by 
reclaiming unused space.  You can also use a filesystem-based backup or a 
backup/restore strategy.



[PERFORM] db size

2008-04-13 Thread Adrian Moisey

Hi

We currently have a 16CPU 32GB box running postgres 8.2.

When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E 
UTF8 -F c -b" I get a file of 14GB in size.


But the database is 110GB in size on the disk.  Why the big difference 
in size?  Does this have anything to do with performance?


--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
