I wrote this query to identify how much space is being wasted in a database by denormalized data. For each field in each table, it calculates how much space the field is taking in it's table as well as indexes, then it calculates how much space would be needed if that field was normalized into a separate table. It places some (somewhat arbitrary) minimums on how much space would have to be saved to include that field in the output. If you want to get rid of the limit you should still keep savings > 0 in the query, otherwise you'll start seeing normalization suggestions that make no sense (like normalizing an int).
As it stands, the query reports a total for the entire database, but you
can use just sub-sections of the query to see savings per-table, or
per-field.
-- Total
SELECT pg_size_pretty(sum(sum)::bigint)
-- Summarize by table
FROM (SELECT schemaname, tablename, sum(savings),
pg_size_pretty(sum(savings)::bigint)
-- Get pretty size. Start here if you want per-table info
FROM (SELECT *, pg_size_pretty(savings::bigint)
FROM (
-- Here's where the real work starts
SELECT *, table_space_delta + coalesce(index_space_delta, 0) - side_table_space
AS savings
FROM (
-- Figure out how much space we'd save in indexes by converting to an
int
SELECT *, index_tuple_count*(avg_width-4) AS index_space_delta
FROM (SELECT s.*, index_count, index_tuple_count
FROM (SELECT schemaname, tablename, attname, null_frac,
avg_width, n_distinct
-- How much space would we gain by
changing this field to an int?
,
reltuples*(1-null_frac)*(avg_width-4) AS table_space_delta
-- Estimate how big our "side
table" will be
, CASE WHEN n_distinct >= 0 THEN
n_distinct ELSE -n_distinct * reltuples END
*
(24+4+avg_width+6+4+6+avg_width) AS side_table_space
FROM pg_stats s
JOIN pg_class c ON c.relname=tablename AND
c.relkind='r'
JOIN pg_namespace n ON n.oid =
c.relnamespace AND n.nspname=s.schemaname
WHERE schemaname NOT IN
('pg_catalog','information_schema')
) s
NATURAL LEFT JOIN (
SELECT n.nspname AS schemaname, c.relname
AS tablename, attname
, count(*) AS index_count,
sum(i.reltuples) AS index_tuple_count
FROM pg_index x
JOIN pg_class c ON c.oid =
x.indrelid
JOIN pg_class i ON i.oid =
x.indexrelid
JOIN pg_namespace n ON n.oid =
c.relnamespace
JOIN pg_attribute a ON a.attrelid =
i.oid
GROUP BY n.nspname, c.relname, attname
) i
) a
) a
) a
-- Minimum savings to consider per-field
WHERE savings > 1e6) a
GROUP BY schemaname, tablename
-- Minimum savings to consider per-table
HAVING sum(savings) > 1e7
ORDER BY sum(savings) DESC
) a
;
--
Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
pgpPK8JSpK8sA.pgp
Description: PGP signature
