> On Feb 21, 2020, at 4:20 AM, Wout Mertens <wout.mert...@gmail.com> wrote:
> 
> I was wondering if the JSON extension could not do the same thing: for each
> table, keep a hidden stash of object layouts, and store the values as
> sqlite primitives. (you'd be able to disable this, in case the layouts
> rarely repeat)

We do something a bit like this in Couchbase Lite[1]. We don't directly store 
JSON, we store a binary data format we designed called Fleece, which has the 
same semantics but is faster to traverse and more compact.

https://github.com/couchbaselabs/fleece

In Fleece a JSON object is represented as an array of fixed-width {key, value} 
pairs.
- The keys are typically 16-bit integers that map into a global (per-database) 
table of common key strings.
- The values are either inline if they fit (small ints, boolean, null) or else 
offsets to where the real data is.
The array is sorted by key, so lookup is a binary search, usually on 16-bit 
ints.

Numbers are stored in a compact variable-width encoding. Strings are unescaped 
UTF-8, and multiple occurrences of the same string are only stored once.

(I've recently discovered that this is similar to FlexBuffers. Fleece has been 
around since 2015, so I don't know if the designers of FlexBuffers looked at 
it…)

I've thought about the 'layouts' concept but haven't tried implementing it yet. 
(Most JavaScript runtimes use something like it, btw.) It would only save about 
4*n bytes per record, where n is the average number of keys in a layout, and a 
small number of clock cycles per key lookup.

> On Feb 21, 2020, at 6:03 AM, Richard Hipp <d...@sqlite.org> wrote:
> 
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite.  I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text.

I had similar experiences with BSON, Bencode, etc. In my use case I found that 
the major expense wasn't parsing, rather allocating an object tree. So I 
designed the Fleece format to be fully useable in-place without requiring 
parsing. A secondary factor is that traversing objects and arrays is expensive 
because the items are variable width, so you have to parse through all the 
(potentially nested) items before the one you're looking for. That's why Fleece 
objects and arrays are fixed-width, using offsets to point to variable-size 
values.

—Jens

[1] https://www.couchbase.com/products/mobile
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to