On Thu, May 8, 2014 at 11:45 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:

> On 05/08/2014 02:25 AM, Peter Geoghegan wrote:
>> findJsonbValueFromSuperHeader()'s lowbound argument
>> previously served to establish a low bound when searching
>> for multiple keys (so the second and subsequent
>> user-supplied keys could skip much of the object).
> Got that.
>> In the case of
>> jsonb_exists_any(), say, if you only have a reasonable expectation
>> that about 1 key exists, and that happens to be the last key that the
>> user passed to the text[] argument (to the existence/? operator), then
>> n - 1 calls to what is now findJsonbValueFromContainer() (which now
>> does not accept a lowbound) are wasted.
> Check.
>> That's elem_count - 1
>> top-level binary searches of the entire jsonb. Or elem_count such
>> calls rather than 1 call (plus 1 sort of the supplied array) in the
>> common case where jsonb_exists_any() will return false.
> Ok, but I don't see any big difference in that regard. It still called
> findJsonbValueFromContainer() elem_count times, before this commit. Each
> call was somewhat cheaper, because the lower bound of the binary search was
> initialized to where the previous search ended, but you still had to
> perform the search.
>> Granted, that might not be that bad right now, given that it's only
>> ever (say) elem_count or elem_count - 1 wasted binary searches through
>> the *top* level, but that might not always be true.
> If we are ever to perform a deep search, I think we'll want to do much
> more to optimize that than just keep track of the lower bound. Like, start
> an iterator of tree and check for all of the keys in one go.
>> And even today,
>> sorting a presumably much smaller user-passed lookup array once has to
>> be cheaper than searching through the entire jsonb perhaps elem_count
>> times per call.
> Well, the quick testing I did suggested otherwise. It's not a big
> difference, but sorting has all kinds of overhead, like pallocing the array
> to sort, copying the elements around etc. For a small array, the startup
> cost of sorting trumps the savings in the binary searches. Possibly the way
> the sorting was done was not optimal, and you could reduce the copying and
> allocations involved in that, but it's hardly worth the trouble.

With current HEAD I can't load the delicious dataset into jsonb format. I get a
segfault. It looks like memory corruption.

$ gzip -c -d js.copy.gz | psql postgres -c 'copy js from stdin;'
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14a846000, chunk 0x14a84d278
CONTEXT:  COPY js, line 246766
WARNING:  problem in alloc set ExprContext: bad single-chunk 0x14a804b18 in
block 0x14a7e6000
CONTEXT:  COPY js, line 1009820
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14a7e6000, chunk 0x14a804b18
CONTEXT:  COPY js, line 1009820
server closed the connection unexpectedly
 This probably means the server terminated abnormally
before or while processing the request.

You can get the dataset here:

With best regards,
Alexander Korotkov.
