On 09/22/2015 12:16 PM, Shulgin, Oleksandr wrote:
Hi Hackers,
I've recently stumbled upon a problem with table bloat estimation in
case there are columns of type JSON.
The quick bloat estimation queries use sum over pg_statistic.stawidth
of table's columns, but in case of JSON the corresponding entry is
never created by the ANALYZE command due to equality comparison
operator missing. I understand why there is no such operator defined
for this particular type, but shouldn't we still try to produce
meaningful average width estimation?
In my case the actual bloat is around 40% as verified with
pgstattuple, while the bloat reported by quick estimate can be between
75% and 95%(!) in three instances of this problem. We're talking
about some hundreds of GB of miscalculation.
Attached patch against master makes the std_typanalyze still try to
compute the minimal stats even if there is no "=" operator. Makes sense?
I could also find this report in archives that talks about similar
problem, but due to all values being over the analyze threshold:
http://www.postgresql.org/message-id/flat/12480.1389370...@sss.pgh.pa.us#12480.1389370...@sss.pgh.pa.us
I think we could try harder, otherwise any estimate relying on average
width can be way off in such cases.
Yes, "/revenons/ à /nos moutons/." You can set up text based comparison
ops fairly easily for json - you just need to be aware of the
limitations. See https://gist.github.com/adunstan/32ad224d7499d2603708
But I agree we should be able to do some analysis of types without
comparison ops.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers