On Wed, Aug 13, 2014 at 09:01:43PM -0400, Tom Lane wrote:
> I wrote:
> > That's a fair question.  I did a very simple hack to replace the item
> > offsets with item lengths -- turns out that that mostly requires
> > removing some code that changes lengths to offsets ;-).  I then loaded
> > up Larry's example of a noncompressible JSON value, and compared
> > pg_column_size(), which is just about the right thing here since it
> > reports datum size after compression.  Remembering that the textual
> > representation is 12353 bytes:
> >
> > json:                   382 bytes
> > jsonb, using offsets: 12593 bytes
> > jsonb, using lengths:   406 bytes
>
> Oh, one more result: if I leave the representation alone, but change
> the compression parameters to set first_success_by to INT_MAX, this
> value takes up 1397 bytes.  So that's better, but still more than a
> 3X penalty compared to using lengths.  (Admittedly, this test value
> probably is an outlier compared to normal practice, since it's a hundred
> or so repetitions of the same two strings.)
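(For anyone wanting to see the effect outside PostgreSQL: the intuition is that cumulative offsets are nearly all distinct values, while per-item lengths of repeated strings form a short repeating pattern that any LZ-style compressor eats for breakfast. Here is a minimal sketch that mimics the header arrays for a hundred repetitions of two strings; it uses zlib as a stand-in, since PostgreSQL's in-core pglz behaves similarly for this purpose. All names here are illustrative, not actual jsonb code.)

```python
import struct
import zlib

# Hypothetical jsonb-like entry arrays for ~100 repetitions of two
# strings of 30 and 40 bytes, as in the test value discussed above.
lengths = [30, 40] * 100

# Offsets are the running totals of the lengths: almost every value
# is unique, so the byte stream has few repeats for the compressor.
offsets = []
total = 0
for n in lengths:
    total += n
    offsets.append(total)

# Pack both arrays as little-endian uint32, roughly like JEntry words.
offsets_bytes = struct.pack('<%dI' % len(offsets), *offsets)
lengths_bytes = struct.pack('<%dI' % len(lengths), *lengths)

# The lengths array is an 8-byte pattern repeated 100 times and
# compresses to almost nothing; the offsets array compresses poorly.
print('offsets compressed:', len(zlib.compress(offsets_bytes)))
print('lengths compressed:', len(zlib.compress(lengths_bytes)))
```

This is only a model of the header words, not of a whole jsonb datum, but it reproduces the shape of the numbers above: same uncompressed size, wildly different compressed size.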
Uh, can we get compression numbers for actual documents, rather than
duplicated strings?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers