Re: [sqlite] Memoization in sqlite json1 functions

Deon Brewis Thu, 23 Mar 2017 15:18:07 -0700

It has more to do with how you parse JSON - if you want to build it into an 
object tree, sure, then you're obviously dead in the water with any kind of 
json.

If you however can use a forward-only push or pull parser like a SAX or StAX 
parse, it's a different story. I'm using a StAX-like pull parser for a binary 
json-ish internal format we have, and reading & parsing through it is on par 
with the performance of reading equivalent SQLITE columns directly (apart from 
the expression indexed covered value lookup scenario... grumble... grumble...).

So binary-json like formats can perform well - you just can't use an object 
tree to parse them.

That obviously implies if you do random-access into a structure you have to 
keep reparsing it (which is where Memoization would be nice). However, CPU 
caches are better at reading continues data streams in forward-only fashion 
than they are with pointers, so forward-only pull parsers, even when you have 
to repeat the entire parse, are often faster than the math behind it suggests. 
(In the way that scanning an unsorted vector in O(n) is often times faster than 
searching O(log n) through a map).

Besides, in 99% of cases my users take the outcome from a json parse and just 
store the results into a C++ data structure anyway. In that case the 
intermediary object tree is just a throwaway and you may as well have built the 
C++ structure up using a pull or push parser. It's very like extra work, and 
it's way... way... faster. 

- Deon

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jens Alfke
Sent: Thursday, March 23, 2017 11:05 AM
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Memoization in sqlite json1 functions

> On Mar 23, 2017, at 4:30 AM, Richard Hipp <d...@sqlite.org> wrote:
> 
> BLOBs are reserved for a future enhancement in which BLOBs will store the 
> binary encoding for JSON.

I’ve been down this road. There are a number of JSON-compatible binary encoding 
formats, but most of them don’t save much time, because (a) they’re not _that_ 
much faster to parse, (b) reading JSON tends to be dominated by allocating an 
object tree, not by the actual parsing, and (c) usually you have to parse the 
entire data even if you only want to use one piece of it [as in a query].

I ended up designing and implementing a new format, called Fleece*. Its 
advantage is that it doesn’t require parsing or even memory allocation. The 
internal structure is already an object tree, except that it uses compressed 
relative offsets instead of pointers. This means that internal pointers into 
Fleece data can be used directly as the data objects.

In my current project** we’re storing Fleece in SQLite instead of JSON, with a 
modified version of the json1 extension to make it accessible in queries. It 
works very well. The Fleece equivalent of json_extract( ) just does some 
pointer manipulation to find the root object in the blob, then further lookups 
to jump to each nested object in the path.

—Jens

* https://github.com/couchbaselabs/fleece 
<https://github.com/couchbaselabs/fleece>
** https://github.com/couchbase/couchbase-lite-core
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Memoization in sqlite json1 functions

Reply via email to