On Fri, Feb 21, 2020 at 6:03 AM Richard Hipp <d...@sqlite.org> wrote:
> On 2/21/20, Wout Mertens <wout.mert...@gmail.com> wrote:
> > The idea is that upon storing the JSON
> > data, the JSON1 extension parses it, extracts the layouts recursively,
> > stores them when they are not known yet, and then only stores the
> > values in the binary format with the layout identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite. I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text. But I could have overlooked something. If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.
>

Looking at just the data part of JSON: text is as long as text is, and
numbers in a small range already take only 2 bytes (one for a separator
or opener, and one for the value). That gets you 0-9 (0-63 if you
base64 encode it). Where you end up with a lot of overhead is the
repeated field names.

I created a format
https://github.com/d3x0r/jsox#jsox--javascript-object-exchange-format
that is compatible with existing JSON, but adds the ability to specify
'class' definitions. There's a specification of the grammar in BNF
format, and pictures...

The parser tracks its current state; the initial state (0) is called
'unknown'. If a string is found in the unknown state, followed by an
object, then that defines a class type, for example
'record{id,title,author,data}'. After that, a second occurrence in the
unknown state, or within an array or object context, such as
'[record{1342,"book","person",1}]', uses the existing list of names, in
order, with the values, and builds the object
{ id:1342,title:"book",author:"person",data:1 }. 'record' could be
shortened to any single unicode character; otherwise the saving isn't
so great.
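To make the class mechanism concrete, here is a minimal Python sketch of
the expansion step only. It is not the JSOX parser and does not follow
its state machine; the helper names (parse_class_definition,
expand_instance) and the regular expression are made up for
illustration, and it assumes field names are bare words and values are
plain JSON scalars with no nested braces.

```python
import json
import re

# Hypothetical pattern for this sketch: a bare name followed by a flat
# {...} body. Nested objects or braces inside strings are not handled.
CLASS_FORM = re.compile(r'^\s*(\w+)\s*\{([^{}]*)\}\s*$')


def parse_class_definition(text):
    """Turn 'record{id,title,author,data}' into ('record', [field names])."""
    match = CLASS_FORM.match(text)
    if match is None:
        raise ValueError("not a class definition: %r" % text)
    name = match.group(1)
    fields = [field.strip() for field in match.group(2).split(",")]
    return name, fields


def expand_instance(text, classes):
    """Turn 'record{1342,"book","person",1}' into a dict using known fields."""
    match = CLASS_FORM.match(text)
    if match is None:
        raise ValueError("not a class instance: %r" % text)
    name = match.group(1)
    fields = classes[name]
    # Assumption: the values are ordinary JSON scalars, so the body can
    # be parsed by wrapping it in a JSON array.
    values = json.loads("[" + match.group(2) + "]")
    if len(values) != len(fields):
        raise ValueError("expected %d values, got %d" % (len(fields), len(values)))
    return dict(zip(fields, values))


classes = {}
name, fields = parse_class_definition("record{id,title,author,data}")
classes[name] = fields

instance_text = 'record{1342,"book","person",1}'
obj = expand_instance(instance_text, classes)

# Compare against the same object written as compact plain JSON.
plain_text = json.dumps(obj, separators=(",", ":"))
print(obj)
print(len(instance_text), "characters vs", len(plain_text), "as plain JSON")
```

The field names are paid for once in the definition, so the saving per
row grows with the number of rows and with the length of the names.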

The definition of 'string' is sort of loose in JSOX: as long as there
isn't a format control character ( whitespace, ':', '{', '}', '[', ']' )
you don't need quotes around a sequence of characters to make a string,
except of course when it starts with characters that look like a
number, and/or it matches a keyword... The class mode is triggered by a
'{' after a string, or while collecting a string.

I also extended the number format of JSON to allow specifying ISO-8601
times as numbers. It just has to special-case, in addition to '.', the
characters ':', 'T', 'Z' and '-' (inline, and not just at the start).

Other than that, if space is really a concern, maybe a zip layer?

J

> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users