On 09/16/2014 07:47 PM, Josh Berkus wrote:
On 09/16/2014 06:31 AM, Robert Haas wrote:
On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <p...@heroku.com> wrote:
On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <j...@agliodbs.com> wrote:
Actually, having the keys all at the same level *is* relevant for the
issue we're discussing.  If those 270 keys are organized in a tree, it's
not the same as having them all on one level (and not as problematic).

I believe Robert meant that the 270 keys are not at the top level, but
are at some level (in other words, some object has 270 pairs). That is
equivalent to having them at the top level for the purposes of this
discussion.

Yes, that's exactly what I meant.
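[Editorial sketch] The flat-vs-tree distinction can be made concrete with a toy model (numbers and names invented here, not jsonb internals): with a lengths-only layout, reaching the last value at a level means walking past every sibling before it, so what matters is the fan-out at each level, not the total key count.

```python
# Toy illustration (invented numbers, not the actual jsonb format): 270
# keys in one flat object force a walk past up to 269 siblings, while the
# same 270 keys grouped into a two-level tree keep each walk short.

flat = {f"k{i}": i for i in range(270)}  # 270 siblings at one level
tree = {f"g{g}": {f"k{g}_{i}": i for i in range(27)}
        for g in range(10)}              # 10 groups of 27 keys each

worst_flat = len(flat) - 1               # walk 269 siblings to the last key
worst_tree = (len(tree) - 1) + 26        # 9 at the top, then 26 in a group
print(worst_flat, worst_tree)
```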

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.

Well, I can only judge from the use cases I personally have, none of
which involve more than 100 keys at any level for most rows.  So far
I've seen some people argue hypothetical use cases involving hundreds of
keys per level, but nobody who *actually* has such a use case.  Also,
note that we currently don't know where "last value" extraction
becomes a performance problem, except that it's somewhere between 200
and 100,000 keys.  Also, we don't have a test which shows the hybrid
approach (Heikki's patch) performing better with thousands of keys.

Basically, if someone is going to make a serious case for Heikki's
hybrid approach over the simpler lengths-only approach, then please post
some test data showing the benefit ASAP, since I can't demonstrate it.
Otherwise, let's get beta 3 out the door so we can get the 9.4 release
train moving again.

Are you looking for someone with a real-life scenario, or just a synthetic test case? The latter is easy to do.

See attached test program. It's basically the same one I posted earlier. Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:

postgres=# select * from testtimes ;
 elem | duration_ms
------+-------------
   11 |    0.289508
   12 |    0.288122
   13 |    0.290558
   14 |    0.287889
   15 |    0.286303
   17 |    0.290415
   19 |    0.289829
   21 |    0.289783
   23 |    0.287104
   25 |    0.289834
   28 |    0.290735
   31 |    0.291844
   34 |    0.293454
   37 |    0.293866
   41 |    0.291217
   45 |    0.289243
   50 |    0.290385
   55 |    0.292085
   61 |    0.290892
   67 |    0.292335
   74 |    0.292561
   81 |    0.291416
   89 |    0.295714
   98 |     0.29844
  108 |    0.297421
  119 |    0.299471
  131 |    0.299877
  144 |    0.301604
  158 |    0.303365
  174 |    0.304203
  191 |    0.303596
  210 |    0.306526
  231 |    0.304189
  254 |    0.307782
  279 |    0.307372
  307 |    0.306873
  338 |    0.310471
  372 |      0.3151
  409 |    0.320354
  450 |     0.32038
  495 |    0.322127
  545 |    0.323256
  600 |    0.330419
  660 |    0.334226
  726 |    0.336951
  799 |     0.34108
  879 |    0.347746
  967 |    0.354275
 1064 |    0.356696
 1170 |    0.366906
 1287 |    0.375352
 1416 |    0.392952
 1558 |    0.392907
 1714 |    0.402157
 1885 |    0.412384
 2074 |    0.425958
 2281 |    0.435415
 2509 |     0.45301
 2760 |    0.469983
 3036 |    0.487329
 3340 |    0.505505
 3674 |    0.530412
 4041 |    0.552585
 4445 |    0.581815
 4890 |    0.610509
 5379 |    0.642885
 5917 |    0.680395
 6509 |    0.713849
 7160 |    0.757561
 7876 |    0.805225
 8664 |    0.856142
 9530 |    0.913255
(72 rows)

That's up to 9530 elements; it's easy to extrapolate from there to higher counts, since the access time is O(n).
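[Editorial sketch] The linear growth is what a lengths-only layout predicts. A minimal model (names invented here, not taken from the patch): each element stores only its length, so the byte offset of element n has to be recomputed as the running sum of the n lengths before it.

```python
# Model of a lengths-only layout (invented names, not the patch code):
# finding element n's offset takes exactly n additions, so extraction
# time grows linearly with the element number, as in the timings above.

def steps_to_reach(lengths, n):
    """Count the additions needed to find element n's byte offset."""
    offset = 0
    steps = 0
    for length in lengths[:n]:
        offset += length
        steps += 1
    return steps

lengths = [4] * 10000                # 10,000 equal-sized elements
for n in (11, 100, 1000, 9530):      # same range as the table above
    print(n, steps_to_reach(lengths, n))
```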

With unpatched git master, the runtime is flat, regardless of which element is queried, at about 0.29 ms. With jsonb-with-offsets-and-lengths-2.patch, there's no difference that I could measure.
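[Editorial sketch] For contrast (again with invented names, not the actual patch code): when each element's absolute byte offset is stored, locating element n is a single indexed read, which is why those layouts stay flat regardless of n. The trade-off driving this thread is that absolute offsets are all distinct and so compress much worse than a run of small, repetitive lengths.

```python
# Offsets are just the running sum of the lengths, computed once when the
# value is written; after that, every lookup is a single array read.
from itertools import accumulate

lengths = [4] * 10000
offsets = [0] + list(accumulate(lengths))[:-1]

def locate_by_offset(offsets, n):
    return offsets[n]                 # constant work, independent of n

print(locate_by_offset(offsets, 11))    # 44
print(locate_by_offset(offsets, 9530))  # 38120
```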

The difference starts to be meaningful at around 500 entries. In practice, I doubt anyone's going to notice until you start talking about tens of thousands of entries.

I'll leave it up to the jury to decide if we care or not. It seems like a fairly unusual use case, where you push around large enough arrays or objects to notice. Then again, I'm sure *someone* will do it. People do strange things, and they find ways to abuse the features that the original developers didn't think of.

- Heikki

Attachment: jsonb-lengths.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers