Hi everyone,

I assume this is not easy with standard PG, but I wanted to double-check.

I have a column with a very uneven distribution of values: ~95% of the values
will be the same, with a long tail of a few dozen other values.

I want an index on this column. Queries that select the most common value will
not use the index, because that value makes up an overwhelming percentage of
the table. This means that ~95% of the disk space and IOPS spent maintaining
the index is "wasted".
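To make the "wasted" fraction concrete, here is a small Python sketch over a
made-up sample (the values `common` and `tail_*` and their proportions are
hypothetical, chosen to mirror the ~95% skew). It finds the dominant value and
the share of index entries that would point at it.

```python
from collections import Counter

# Hypothetical sample of the skewed column: one dominant value (950 of 1000 rows)
# plus a long tail of 40 other values spread over the remaining 50 rows.
sample = ["common"] * 950 + [f"tail_{i % 40}" for i in range(50)]

counts = Counter(sample)
most_common_value, most_common_count = counts.most_common(1)[0]

# Share of entries in a full index that would point at the dominant value,
# i.e. entries a query for that value would never use.
wasted_share = most_common_count / len(sample)

print(most_common_value, wasted_share)  # prints: common 0.95
```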

I cannot use a hardcoded partial index because:
1) The common value is not known at schema definition time, and may change 
(very slowly) over time.
2) JDBC uses prepared statements for everything, and the value to be selected
is not known at statement prepare time, so any partial indices are ignored
(this is really, really obnoxious behavior and makes partial indices almost
useless in combination with prepared statements, sadly...)
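Point 2 can be illustrated with a deliberately simplified Python model. This
is not PostgreSQL's actual predicate-proving code; the function name and the
convention of passing `None` for an unbound parameter are hypothetical. The
idea it captures: a partial index such as one defined `WHERE col <> 'common'`
is only usable when the planner can prove that the query's condition implies
the index predicate, which requires knowing the compared value.

```python
from typing import Optional


def partial_index_usable(excluded_value: str, query_value: Optional[str]) -> bool:
    """Toy model: can a query `col = query_value` use a partial index
    defined with `WHERE col <> excluded_value`?

    query_value=None stands for a bind parameter whose value is unknown
    when a generic plan is built.
    """
    if query_value is None:
        # Unknown parameter: col <> excluded_value cannot be proven,
        # so the index might be missing matching rows and is skipped.
        return False
    # Known constant: col = 'x' implies col <> excluded_value iff 'x' differs.
    return query_value != excluded_value


print(partial_index_usable("common", "tail_7"))  # prints: True
print(partial_index_usable("common", "common"))  # prints: False
print(partial_index_usable("common", None))      # prints: False
```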

The table size is expected to approach the 0.5 billion row mark within the next
few months, hence my eagerness to save even seemingly small per-row costs.
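A quick calculation with the estimates above (0.5 billion rows, ~95% holding
one value) shows the difference in index entry counts. The actual on-disk
savings would also depend on key width and per-entry overhead, which this
sketch does not attempt to estimate.

```python
# Projection based on the figures above: ~0.5 billion rows,
# ~95% of which hold the single most common value.
total_rows = 500_000_000
common_pct = 95  # percent of rows with the dominant value

# Integer arithmetic avoids floating-point rounding in the counts.
common_rows = total_rows * common_pct // 100
tail_rows = total_rows - common_rows

print(f"full index entries: {total_rows:,}")  # prints: full index entries: 500,000,000
print(f"tail-only entries: {tail_rows:,}")    # prints: tail-only entries: 25,000,000
```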

Curious if anyone has a good way to approach this problem.
Thanks,
Steven


