On 09/16/2014 07:47 PM, Josh Berkus wrote:
On 09/16/2014 06:31 AM, Robert Haas wrote:On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <p...@heroku.com> wrote:On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <j...@agliodbs.com> wrote:Actually, having the keys all at the same level *is* relevant for the issue we're discussing. If those 270 keys are organized in a tree, it's not the same as having them all on one level (and not as problematic).I believe Robert meant that the 270 keys are not at the top level, but are at some level (in other words, some object has 270 pairs). That is equivalent to having them at the top level for the purposes of this discussion.Yes, that's exactly what I meant.FWIW, I am slightly concerned about weighing use cases around very large JSON documents too heavily. Having enormous jsonb documents just isn't going to work out that well, but neither will equivalent designs in popular document database systems for similar reasons. For example, the maximum BSON document size supported by MongoDB is 16 megabytes, and that seems to be something that their users don't care too much about. Having 270 pairs in an object isn't unreasonable, but it isn't going to be all that common either.Well, I can only judge from the use cases I personally have, none of which involve more than 100 keys at any level for most rows. So far I've seen some people argue hypotetical use cases involving hundreds of keys per level, but nobody who *actually* has such a use case. Also, note that we currently don't know where the "last value" extraction becomes a performance problem at this stage, except that it's somewhere between 200 and 100,000. Also, we don't have a test which shows the hybrid approach (Heikki's patch) performing better with 1000's of keys. Basically, if someone is going to make a serious case for Heikki's hybrid approach over the simpler lengths-only approach, then please post some test data showing the benefit ASAP, since I can't demonstrate it. Otherwise, let's get beta 3 out the door so we can get the 9.4 release train moving again.
Are you looking for someone with a real life scenario, or just synthetic test case? The latter is easy to do.
See attached test program. It's basically the same I posted earlier. Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:
postgres=# select * from testtimes ; elem | duration_ms ------+------------- 11 | 0.289508 12 | 0.288122 13 | 0.290558 14 | 0.287889 15 | 0.286303 17 | 0.290415 19 | 0.289829 21 | 0.289783 23 | 0.287104 25 | 0.289834 28 | 0.290735 31 | 0.291844 34 | 0.293454 37 | 0.293866 41 | 0.291217 45 | 0.289243 50 | 0.290385 55 | 0.292085 61 | 0.290892 67 | 0.292335 74 | 0.292561 81 | 0.291416 89 | 0.295714 98 | 0.29844 108 | 0.297421 119 | 0.299471 131 | 0.299877 144 | 0.301604 158 | 0.303365 174 | 0.304203 191 | 0.303596 210 | 0.306526 231 | 0.304189 254 | 0.307782 279 | 0.307372 307 | 0.306873 338 | 0.310471 372 | 0.3151 409 | 0.320354 450 | 0.32038 495 | 0.322127 545 | 0.323256 600 | 0.330419 660 | 0.334226 726 | 0.336951 799 | 0.34108 879 | 0.347746 967 | 0.354275 1064 | 0.356696 1170 | 0.366906 1287 | 0.375352 1416 | 0.392952 1558 | 0.392907 1714 | 0.402157 1885 | 0.412384 2074 | 0.425958 2281 | 0.435415 2509 | 0.45301 2760 | 0.469983 3036 | 0.487329 3340 | 0.505505 3674 | 0.530412 4041 | 0.552585 4445 | 0.581815 4890 | 0.610509 5379 | 0.642885 5917 | 0.680395 6509 | 0.713849 7160 | 0.757561 7876 | 0.805225 8664 | 0.856142 9530 | 0.913255 (72 rows)That's up to 9530 elements - it's pretty easy to extrapolate from there to higher counts, it's O(n).
With unpatched git master, the runtime is flat, regardless of which element is queried, at about 0.29 s. With jsonb-with-offsets-and-lengths-2.patch, there's no difference that I could measure.
The difference starts to be meaningful at around 500 entries. In practice, I doubt anyone's going to notice until you start talking about tens of thousands of entries.
I'll leave it up to the jury to decide if we care or not. It seems like a fairly unusual use case, where you push around large enough arrays or objects to notice. Then again, I'm sure *someone* will do it. People do strange things, and they find ways to abuse the features that the original developers didn't think of.
- Heikki
jsonb-lengths.sql
Description: application/sql
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers