On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost <sfr...@snowman.net> wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > I looked into the issue reported in bug #11109.  The problem appears
> > to be that jsonb's on-disk format is designed in such a way that the
> > leading portion of any JSON array or object will be fairly
> > incompressible, because it consists mostly of a strictly-increasing
> > series of integer offsets.  This interacts poorly with the code in
> > pglz_compress() that gives up if it's found nothing compressible in
> > the first first_success_by bytes of a value-to-be-compressed.
> > (first_success_by is 1024 in the default set of compression
> > parameters.)
>
> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k?  This is just a test to see if it's worthwhile to keep going, no?
> I wonder if this might even be able to be provided as a type-specific
> option, to avoid changing the behavior for types other than jsonb in
> this regard.

+1 for the offset idea. Or sample the data at the beginning, middle and
end. Obviously one could always come up with a worst case, but still.
(A rough sketch of what I have in mind is at the end of this mail.)

> (I'm imagining a boolean saying "pick a random sample", or perhaps a
> function which can be called that'll return "here's where you wanna
> test if this thing is gonna compress at all")
>
> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.
>
>         Thanks,
>
>                 Stephen

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
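
PS: a rough, purely illustrative sketch of the sampling idea, in case it
helps. None of this is real pglz code; the window size, the repetition
probe and the threshold are made-up placeholders, only meant to show the
shape of a beginning/middle/end check instead of the current "nothing in
the first 1024 bytes, so give up" rule.

/*
 * Hypothetical sketch only, NOT actual PostgreSQL code.  Instead of
 * giving up when nothing compressible is found in the first
 * first_success_by (1024) bytes, probe small windows at the beginning,
 * middle and end of the input and keep compressing if any of them
 * shows repetition.  Window size, probe and threshold are invented for
 * illustration.
 */
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_WINDOW 256

/* crude probe: count bytes that repeat the byte 4 positions back */
static bool
window_looks_compressible(const char *buf, int32_t len)
{
    int32_t matches = 0;

    for (int32_t i = 4; i < len; i++)
    {
        if (buf[i] == buf[i - 4])
            matches++;
    }

    /* arbitrary threshold, purely for the sketch */
    return matches > len / 8;
}

/* would replace the "no match within first_success_by -> give up" test */
static bool
worth_compressing(const char *source, int32_t slen)
{
    int32_t win = (slen < SAMPLE_WINDOW) ? slen : SAMPLE_WINDOW;
    int32_t offsets[3];

    offsets[0] = 0;                   /* beginning */
    offsets[1] = (slen - win) / 2;    /* middle    */
    offsets[2] = slen - win;          /* end       */

    for (int i = 0; i < 3; i++)
    {
        if (window_looks_compressible(source + offsets[i], win))
            return true;
    }
    return false;
}

The only point is that the bail-out decision would look at more than the
(incompressible) offset header at the start of a large jsonb value; a
real implementation would presumably reuse pglz's own match search
rather than anything this crude.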