Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-08 Thread Larry White
I was not complaining; I think JSONB is awesome.

But I am one of those people who would like to put 100's of GB (or more)
JSON files into Postgres and I am concerned about file size and possible
future changes to the format.


On Fri, Aug 8, 2014 at 7:10 PM, Peter Geoghegan p...@heroku.com wrote:

 On Fri, Aug 8, 2014 at 12:06 PM, Josh Berkus j...@agliodbs.com wrote:
  One we ship 9.4, many users are going to load 100's of GB into JSONB
  fields.  Even if we fix the compressability issue in 9.5, those users
  won't be able to fix the compression without rewriting all their data,
  which could be prohibitive.  And we'll be  in a position where we have
  to support the 9.4 JSONB format/compression technique for years so that
  users aren't blocked from upgrading.

 FWIW, if we take the delicious JSON data as representative, a table
 storing that data as jsonb is 1374 MB in size. Whereas an equivalent
 table with the data typed using the original json datatype (but with
 white space differences more or less ignored, because it was created
 using a jsonb - json cast), the same data is 1352 MB.

 Larry's complaint is valid; this is a real problem, and I'd like to
 fix it before 9.4 is out. However, let us not lose sight of the fact
 that JSON data is usually a poor target for TOAST compression. With
 idiomatic usage, redundancy is very much more likely to appear across
 rows, and not within individual Datums. Frankly, we aren't doing a
 very good job there, and doing better requires an alternative
 strategy.

 --
 Peter Geoghegan



Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Larry White
Apologies if this is a ridiculous suggestion, but I believe that swapping
out the compression algorithm (for Snappy, for example) has been discussed
in the past. I wonder if that algorithm is sufficiently different that it
would produce a better result, and if that might not be preferable to some
of the other options.


On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I looked into the issue reported in bug #11109.  The problem appears to be
 that jsonb's on-disk format is designed in such a way that the leading
 portion of any JSON array or object will be fairly incompressible, because
 it consists mostly of a strictly-increasing series of integer offsets.
 This interacts poorly with the code in pglz_compress() that gives up if
 it's found nothing compressible in the first first_success_by bytes of a
 value-to-be-compressed.  (first_success_by is 1024 in the default set of
 compression parameters.)

 As an example, here's gdb's report of the bitwise representation of the
 example JSON value in the bug thread:

 0x2ab85ac:  0x2005  0x0004  0x50003098  0x309f
 0x2ab85bc:  0x30ae  0x30b8  0x30cf  0x30da
 0x2ab85cc:  0x30df  0x30ee  0x3105  0x6b6e756a
 0x2ab85dc:  0x40de  0x0034  0x0068  0x009c
 0x2ab85ec:  0x00d0  0x0104  0x0138  0x016c
 0x2ab85fc:  0x01a0  0x01d4  0x0208  0x023c
 0x2ab860c:  0x0270  0x02a4  0x02d8  0x030c
 0x2ab861c:  0x0340  0x0374  0x03a8  0x03dc
 0x2ab862c:  0x0410  0x0444  0x0478  0x04ac
 0x2ab863c:  0x04e0  0x0514  0x0548  0x057c
 0x2ab864c:  0x05b0  0x05e4  0x0618  0x064c
 0x2ab865c:  0x0680  0x06b4  0x06e8  0x071c
 0x2ab866c:  0x0750  0x0784  0x07b8  0x07ec
 0x2ab867c:  0x0820  0x0854  0x0888  0x08bc
 0x2ab868c:  0x08f0  0x0924  0x0958  0x098c
 0x2ab869c:  0x09c0  0x09f4  0x0a28  0x0a5c
 0x2ab86ac:  0x0a90  0x0ac4  0x0af8  0x0b2c
 0x2ab86bc:  0x0b60  0x0b94  0x0bc8  0x0bfc
 0x2ab86cc:  0x0c30  0x0c64  0x0c98  0x0ccc
 0x2ab86dc:  0x0d00  0x0d34  0x0d68  0x0d9c
 0x2ab86ec:  0x0dd0  0x0e04  0x0e38  0x0e6c
 0x2ab86fc:  0x0ea0  0x0ed4  0x0f08  0x0f3c
 0x2ab870c:  0x0f70  0x0fa4  0x0fd8  0x100c
 0x2ab871c:  0x1040  0x1074  0x10a8  0x10dc
 0x2ab872c:  0x1110  0x1144  0x1178  0x11ac
 0x2ab873c:  0x11e0  0x1214  0x1248  0x127c
 0x2ab874c:  0x12b0  0x12e4  0x1318  0x134c
 0x2ab875c:  0x1380  0x13b4  0x13e8  0x141c
 0x2ab876c:  0x1450  0x1484  0x14b8  0x14ec
 0x2ab877c:  0x1520  0x1554  0x1588  0x15bc
 0x2ab878c:  0x15f0  0x1624  0x1658  0x168c
 0x2ab879c:  0x16c0  0x16f4  0x1728  0x175c
 0x2ab87ac:  0x1790  0x17c4  0x17f8  0x182c
 0x2ab87bc:  0x1860  0x1894  0x18c8  0x18fc
 0x2ab87cc:  0x1930  0x1964  0x1998  0x19cc
 0x2ab87dc:  0x1a00  0x1a34  0x1a68  0x1a9c
 0x2ab87ec:  0x1ad0  0x1b04  0x1b38  0x1b6c
 0x2ab87fc:  0x1ba0  0x1bd4  0x1c08  0x1c3c
 0x2ab880c:  0x1c70  0x1ca4  0x1cd8  0x1d0c
 0x2ab881c:  0x1d40  0x1d74  0x1da8  0x1ddc
 0x2ab882c:  0x1e10  0x1e44  0x1e78  0x1eac
 0x2ab883c:  0x1ee0  0x1f14  0x1f48  0x1f7c
 0x2ab884c:  0x1fb0  0x1fe4  0x2018  0x204c
 0x2ab885c:  0x2080  0x20b4  0x20e8  0x211c
 0x2ab886c:  0x2150  0x2184  0x21b8  0x21ec
 0x2ab887c:  0x2220  0x2254  0x2288  0x22bc
 0x2ab888c:  0x22f0  0x2324  0x2358  0x238c
 0x2ab889c:  0x23c0  0x23f4  0x2428  0x245c
 0x2ab88ac:  0x2490  0x24c4  0x24f8  0x252c
 0x2ab88bc:  0x2560  0x2594  0x25c8  0x25fc
 0x2ab88cc:  0x2630  0x2664  0x2698  0x26cc
 0x2ab88dc:  0x2700  0x2734  0x2768