On 09/22/2015 12:16 PM, Shulgin, Oleksandr wrote:
Hi Hackers,

I've recently stumbled upon a problem with table bloat estimation in case there are columns of type JSON.

The quick bloat estimation queries use sum over pg_statistic.stawidth of table's columns, but in case of JSON the corresponding entry is never created by the ANALYZE command due to equality comparison operator missing. I understand why there is no such operator defined for this particular type, but shouldn't we still try to produce meaningful average width estimation?

In my case the actual bloat is around 40% as verified with pgstattuple, while the bloat reported by quick estimate can be between 75% and 95%(!) in three instances of this problem. We're talking about some hundreds of GB of miscalculation.

Attached patch against master makes the std_typanalyze still try to compute the minimal stats even if there is no "=" operator. Makes sense?

I could also find this report in archives that talks about similar problem, but due to all values being over the analyze threshold:

http://www.postgresql.org/message-id/flat/12480.1389370...@sss.pgh.pa.us#12480.1389370...@sss.pgh.pa.us

I think we could try harder, otherwise any estimate relying on average width can be way off in such cases.


Yes, "/revenons/ à /nos moutons/." You can set up text based comparison ops fairly easily for json - you just need to be aware of the limitations. See https://gist.github.com/adunstan/32ad224d7499d2603708

But I agree we should be able to do some analysis of types without comparison ops.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to