Stephen Frost <sfr...@snowman.net> writes:
> What about considering how large the object is when we are analyzing if
> it compresses well overall?
Hmm, yeah, that's a possibility: we could redefine the limit at which we
bail out in terms of a fraction of the object size instead of a fixed
limit.  However, that risks expending a large amount of work before we
bail, if we have a very large incompressible object --- which is not
exactly an unlikely case.  Consider for example JPEG images stored as
bytea, which I believe I've heard of people doing.

Another issue is that it's not real clear that that fixes the problem for
any fractional size we'd want to use.  In Larry's example of a jsonb value
that fails to compress, the header size is 940 bytes out of about 12K, so
we'd need to trial-compress about 10% of the object before we reach
compressible data --- and I doubt his example is worst-case.

>> 1. The real problem here is that jsonb is emitting quite a bit of
>> fundamentally-nonrepetitive data, even when the user-visible input is
>> very repetitive.  That's a compression-unfriendly transformation by
>> anyone's measure.

> I disagree that another algorithm wouldn't be able to manage better on
> this data than pglz.  pglz, from my experience, is notoriously bad at
> certain data sets which other algorithms are not as poorly impacted by.

Well, I used to be considered a compression expert, and I'm going to
disagree with you here.  It's surely possible that other algorithms would
be able to get some traction where pglz fails to get any, but that doesn't
mean that presenting them with hard-to-compress data in the first place is
a good design decision.  There is no scenario in which data like this is
going to be friendly to a general-purpose compression algorithm.  It'd be
necessary to have explicit knowledge that the data consists of an
increasing series of four-byte integers to be able to do much with it.
And then such an assumption would break down once you got past the
header ...

> Perhaps another option would be a new storage type which basically says
> "just compress it, no matter what"?  We'd be able to make that the
> default for jsonb columns too, no?

Meh.  We could do that, but it would still require adding arguments to
toast_compress_datum() that aren't there now.  In any case, this is a
band-aid solution; and as Josh notes, once we ship 9.4 we are going to be
stuck with jsonb's on-disk representation pretty much forever.

			regards, tom lane
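[Editor's illustration] The failure mode described above --- a fixed-size trial-compression prefix giving up on a value whose incompressible header sits in front of highly repetitive data --- can be sketched as follows. This uses zlib as a stand-in for pglz; the probe size, the 25% savings threshold, and the synthetic data are illustrative assumptions, not PostgreSQL's actual tuning or jsonb's actual layout.

```python
import random
import zlib

# Synthetic value loosely modeling Larry's example: a nonrepetitive
# "header" in front of very repetitive content.
random.seed(42)
header = bytes(random.randrange(256) for _ in range(1024))   # incompressible
payload = b'{"tag": "repetitive value"}' * 400               # compresses well
value = header + payload

def prefix_bails_out(data, probe=1024, min_savings=0.25):
    """Trial-compress only the first `probe` bytes and give up unless
    they shrink by at least `min_savings` -- the fixed-prefix heuristic
    under discussion (sizes/threshold are illustrative assumptions)."""
    sample = data[:probe]
    return len(zlib.compress(sample)) > len(sample) * (1 - min_savings)

# The prefix check sees only the incompressible header and gives up,
# so the value would be stored uncompressed...
print(prefix_bails_out(value))          # True

# ...even though the value as a whole compresses very well, because
# everything after the header is repetitive.
ratio = len(zlib.compress(value)) / len(value)
print(f"whole-value compression ratio: {ratio:.2f}")
```

Redefining `probe` as a fraction of `len(data)` fixes this particular case, but as noted above it also means trial-compressing a large slice of any big incompressible object (the JPEG-in-bytea case) before giving up.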