Josh Berkus <j...@agliodbs.com> writes:
> On 08/14/2014 07:24 PM, Tom Lane wrote:
>> We can certainly reduce that.  The question was whether it would be
>> worth the effort to try.  At this point, with three different test
>> data sets having shown clear space savings, I think it is worth
>> the effort.  I'll poke into it tomorrow or over the weekend, unless
>> somebody beats me to it.

> Note that I specifically created that data set to be a worst case: many
> top-level keys, no nesting, and small values.  However, I don't think
> it's an unrealistic worst case.

> Interestingly, even on the unpatched, 1GB table case, the *index* on the
> JSONB is only 60MB.  Which shows just how terrific the improvement in
> GIN index size/performance is.

I've been poking at this, and I think the main explanation for your result
is that with more JSONB documents being subject to compression, we're
spending more time in pglz_decompress.  There's no free lunch in that
department: if you want compressed storage it's gonna cost ya to
decompress.  The only way I can get decompression and TOAST access to not
dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
PLAIN.  However, when I do that, I do see my test patch running about 25%
slower overall than HEAD on an "explain analyze select jfield -> 'key'
from table" type of query with 200-key documents with narrow fields (see
attached perl script that generates the test data).
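
For concreteness, the setup being timed looks roughly like this (the table
and data-file names are placeholders; jfield is the column from the query
above, and k100 is just one of the generated keys):

-- hypothetical test table, one jsonb document per row
CREATE TABLE testjsonb (jfield jsonb);

-- force inline, uncompressed storage *before* loading, so that detoast
-- and pglz_decompress costs drop out of the profile
ALTER TABLE testjsonb ALTER COLUMN jfield SET STORAGE PLAIN;

-- load the output of the attached generator script, one document per line
\copy testjsonb FROM 'jsonb_test_data.txt'

-- the kind of query being timed: extract a single key from each document
EXPLAIN ANALYZE SELECT jfield -> 'k100' FROM testjsonb;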

It seems difficult to improve much on that for this test case.  I put some
logic into findJsonbValueFromContainer to calculate the offset sums just
once, rather than once per binary-search iteration, but that only improved
matters 5% at best.  I still think it'd be worth modifying the JsonbIterator code
to avoid repetitive offset calculations, but that's not too relevant to
this test case.
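
To make the offset-sum point concrete, here's a simplified standalone sketch
of that technique (this is *not* the actual JsonbContainer/JEntry code;
FakeContainer and find_key are made up for illustration): when entry headers
store only lengths, an entry's start offset is the sum of the preceding
lengths, so compute all the offsets in one pass before the binary search
instead of re-summing them inside every probe.

#include <stdint.h>
#include <string.h>

/* Illustration only -- not the real jsonb on-disk layout. */
typedef struct
{
    const char *base;           /* packed key bytes */
    const uint32_t *lens;       /* length of each key, in storage order */
    int         nkeys;          /* keys assumed sorted in byte order */
} FakeContainer;

static int
find_key(const FakeContainer *c, const char *key, uint32_t *offsets)
{
    uint32_t    sum = 0;
    size_t      klen = strlen(key);
    int         lo = 0;
    int         hi = c->nkeys - 1;

    /* compute every key's start offset just once, before the search ... */
    for (int i = 0; i < c->nkeys; i++)
    {
        offsets[i] = sum;
        sum += c->lens[i];
    }

    /* ... so each binary-search probe is O(1) instead of O(nkeys) */
    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;
        size_t      mlen = c->lens[mid];
        int         cmp = memcmp(key, c->base + offsets[mid],
                                 klen < mlen ? klen : mlen);

        if (cmp == 0 && klen != mlen)
            cmp = (klen < mlen) ? -1 : 1;

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;                  /* key not present */
}

As noted above, in this particular test that kind of change bought only
about 5%.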

Having said all that, I think this test is something of a contrived worst
case.  More realistic cases are likely to have many fewer keys (so that
speed of the binary search loop is less of an issue) or else to have total
document sizes large enough that inline PLAIN storage isn't an option,
meaning that detoast+decompression costs will dominate.

                        regards, tom lane

#! /usr/bin/perl
# Generate the test data: 100000 one-line JSON documents, each with 200
# keys ("k1" .. "k200") whose values are single-digit random integers.
use strict;
use warnings;

for (my $i = 0; $i < 100000; $i++) {
    print "{";
    for (my $k = 1; $k <= 200; $k++) {
        print ", " if $k > 1;
        printf "\"k%d\": %d", $k, int(rand(10));
    }
    print "}\n";
}