On Thu, May 8, 2014 at 11:45 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:

> On 05/08/2014 02:25 AM, Peter Geoghegan wrote:
>
>> findJsonbValueFromSuperHeader()'s lowbound argument
>> previously served to establish a low bound when searching for
>> multiple keys (so the second and subsequent user-supplied keys
>> could skip much of the object).
>>
>
> Got that.
>
>> In the case of
>> jsonb_exists_any(), say, if you only have a reasonable expectation
>> that about 1 key exists, and that happens to be the last key that the
>> user passed to the text[] argument (to the existence/? operator), then
>> n - 1 calls to what is now findJsonbValueFromContainer() (which now
>> does not accept a lowbound) are wasted.
>>
>
> Check.
>
>> That's elem_count - 1
>> top-level binary searches of the entire jsonb. Or elem_count such
>> calls rather than 1 call (plus 1 sort of the supplied array) in the
>> common case where jsonb_exists_any() will return false.
>>
>
> Ok, but I don't see any big difference in that regard. It still called
> findJsonbValueFromContainer() elem_count times before this commit. Each
> call was somewhat cheaper, because the lower bound of the binary search
> was initialized to where the previous search ended, but you still had to
> perform the search.
>
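For the archives, the two call patterns being compared look roughly like
this. A sketch only: the flag name JB_FOBJECT is real, but the signatures,
the comparator, and the surrounding variables are approximations of the
jsonb code rather than exact quotes of it.

/*
 * Post-commit shape: every user-supplied key gets its own full
 * top-level binary search of the container.
 */
for (i = 0; i < elem_count; i++)
{
    if (findJsonbValueFromContainer(container, JB_FOBJECT, &keys[i]))
        return true;            /* jsonb_exists_any() semantics */
}
return false;

/*
 * Pre-commit shape: sort the lookup keys once, then let each search
 * resume from where the previous one ended via lowbound.
 */
qsort(keys, elem_count, sizeof(JsonbValue), compare_keys); /* comparator illustrative */
lowbound = 0;
for (i = 0; i < elem_count; i++)
{
    if (findJsonbValueFromSuperHeader(sheader, JB_FOBJECT,
                                      &lowbound, &keys[i]))
        return true;
}
return false;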
>> Granted, that might not be that bad right now, given that it's only
>> ever (say) elem_count or elem_count - 1 wasted binary searches through
>> the *top* level, but that might not always be true.
>>
>
> If we are ever to perform a deep search, I think we'll want to do much
> more to optimize that than just keep track of the lower bound. Like,
> start an iterator over the tree and check for all of the keys in one go.
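Something along these lines, presumably. Again a sketch only: the calls
approximate the JsonbIterator API, and key_is_sought() is a hypothetical
helper that checks a scanned key against the lookup array.

/*
 * One-pass check: walk the container once and tick off each sought
 * key as it streams past, instead of one binary search per key.
 */
JsonbIterator      *it = JsonbIteratorInit(container);
JsonbValue          v;
JsonbIteratorToken  tok;
int                 nfound = 0;

while ((tok = JsonbIteratorNext(&it, &v, true)) != WJB_DONE)
{
    if (tok == WJB_KEY && key_is_sought(&v, keys, elem_count))
        nfound++;               /* key_is_sought() is hypothetical */
}
return nfound > 0;              /* nfound == elem_count for ?& instead */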
>
>> And even today,
>> sorting a presumably much smaller user-passed lookup array once has to
>> be cheaper than searching through the entire jsonb perhaps elem_count
>> times per call.
>>
>
> Well, the quick testing I did suggested otherwise. It's not a big
> difference, but sorting has all kinds of overhead, like pallocing the
> array to sort, copying the elements around, and so on. For a small
> array, the startup cost of sorting trumps the savings in the binary
> searches. Possibly the way the sorting was done was not optimal, and you
> could reduce the copying and allocations involved, but it's hardly worth
> the trouble.
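For a rough sense of scale: with elem_count = 3 and, say, 100 top-level
pairs, three independent binary searches cost at most about 3 * log2(100),
i.e. roughly 20 comparisons, while the sorted variant still pays for three
searches over somewhat narrower ranges. The handful of comparisons saved
is easily eaten by the palloc and copying needed to set up the sort.
(Back-of-envelope numbers, just to illustrate why the startup cost
dominates for small arrays.)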


With current HEAD I can't load the delicious dataset into jsonb format. I
get a segfault. It looks like memory corruption.

$ gzip -c -d js.copy.gz | psql postgres -c 'copy js from stdin;'
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14a846000, chunk 0x14a84d278
CONTEXT:  COPY js, line 246766
WARNING:  problem in alloc set ExprContext: bad single-chunk 0x14a804b18 in
block 0x14a7e6000
CONTEXT:  COPY js, line 1009820
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14a7e6000, chunk 0x14a804b18
CONTEXT:  COPY js, line 1009820
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

You can get dataset here:
http://mira.sai.msu.ru/~megera/tmp/js.copy.gz

------
With best regards,
Alexander Korotkov.
