On Fri, Feb 21, 2020 at 6:03 AM Richard Hipp <d...@sqlite.org> wrote:

> On 2/21/20, Wout Mertens <wout.mert...@gmail.com> wrote:
> > The idea is that upon storing the JSON
> > data, the JSON1 extension parses it, extracts the layouts recursively,
> > stores them when they are not known yet, and then only stores the
> > values in the binary format with the layout identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite.  I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text.  But I could have overlooked something.  If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.
>

Text is as long as text is, and small numbers are already compact: 2 bytes
(one for a separator or opener, and one for the value) gets you 0-9 (0-63 if
you base64 encode it)... looking at just the data part of JSON.  Where you
do end up with a lot of overhead is the repeated field name definitions.
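
To put a rough number on the field name overhead, here is a minimal sketch
in Python.  It is plain JSON, not JSOX; the names factor and unfactor and the
"fields"/"rows" layout are made up for this illustration, and it assumes
every record has the same keys.

```python
import json

def factor(records):
    """Store the field names once, then one positional row per record."""
    fields = list(records[0]) if records else []
    return {"fields": fields, "rows": [[r[f] for f in fields] for r in records]}

def unfactor(packed):
    """Rebuild the original list of objects from the factored form."""
    return [dict(zip(packed["fields"], row)) for row in packed["rows"]]

def compact(value):
    """Serialize without optional whitespace so the sizes are comparable."""
    return json.dumps(value, separators=(",", ":"))

records = [{"id": i, "title": "book", "author": "person", "data": 1}
           for i in range(100)]
print(len(compact(records)), len(compact(factor(records))))
```

The second number printed is the size with the names factored out; the
difference is what the repeated names cost in this particular data set.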

I created a format, JSOX
https://github.com/d3x0r/jsox#jsox--javascript-object-exchange-format that
is compatible with existing JSON, but adds the ability to specify 'class'
definitions.  There's a specification of the grammar in BNF format, and
pictures...  The parser tracks the current parsing state; state 0, the
initial state, is called 'unknown'.  If a string is found in the unknown
state, followed by an object, then that defines a class type:
'record{id,title,author,data}'.  After that, a second occurrence in the
unknown state, or within an array or object context, such as
'[record{1342,"book","person",1}]', uses the existing list of names in order
with the values, and builds the object
{ id:1342,title:"book",author:"person",data:1 }.  'record' could be
shortened to any single unicode character; otherwise the saving isn't so
great.
The definition of 'string' is sort of loose in JSOX: as long as there isn't
a format control character ( whitespace, ':', '{', '}', '[', ']' ) you
don't need quotes around a sequence of characters to make a string,
except of course when it starts with characters that look like a number
and/or matches a keyword...
The class mode is triggered by a '{' after a string, or while collecting a
string.
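
The define-then-reuse behaviour described above can be sketched roughly as
follows.  This is not the JSOX parser, only a toy in Python that handles flat
'name{...}' tokens; expand_tagged and TAGGED are hypothetical names, and
splitting on ',' means string values containing commas are not supported.

```python
import json
import re

# A tag name immediately followed by a flat {...} body (no nesting).
TAGGED = re.compile(r"^(\w+)\{([^{}]*)\}$")

def expand_tagged(tokens):
    """First 'name{a,b}' defines the field list; later ones supply values."""
    classes = {}   # tag name -> ordered list of field names
    objects = []
    for token in tokens:
        match = TAGGED.match(token.strip())
        if match is None:
            raise ValueError("not a tagged object: %r" % token)
        name, body = match.group(1), match.group(2)
        parts = [p.strip() for p in body.split(",")]
        if name not in classes:
            classes[name] = parts                    # class definition
        else:
            values = [json.loads(p) for p in parts]  # values, in field order
            objects.append(dict(zip(classes[name], values)))
    return objects

print(expand_tagged(['record{id,title,author,data}',
                     'record{1342,"book","person",1}']))
```

Each later 'record{...}' costs only the tag plus the values, which is where
the saving over repeating every field name comes from.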

I also extended the number format of JSON to allow specifying ISO-8601
times as numbers...  (the number scanner just has to special-case, in
addition to '.', the characters ':' 'T' 'Z' and '-' (inline and not just at
start)).
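
As a rough illustration of treating a date as a number-like token, here is a
Python sketch.  It does not reproduce the JSOX scanner: it simply tries an
ordinary number first and falls back to an ISO-8601 parse, and
parse_numeric_token is a name invented for this example.  Python's float()
also accepts a few spellings JSON does not, so take it as an approximation.

```python
from datetime import datetime, timezone

def parse_numeric_token(token):
    """Return a float for plain numbers, a datetime for ISO-8601 tokens."""
    try:
        return float(token)      # handles '.', exponent and leading sign
    except ValueError:
        pass
    # A trailing 'Z' means UTC; spell it as an offset so that older
    # versions of datetime.fromisoformat accept it.
    iso = token[:-1] + "+00:00" if token.endswith("Z") else token
    return datetime.fromisoformat(iso)   # raises ValueError if not a date

print(parse_numeric_token("1e-5"))
print(parse_numeric_token("2020-02-21T06:03:00Z"))
```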

Other than that, if space is really a concern, maybe a zip layer?
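
A zip layer could be as small as this sketch, which uses Python's standard
zlib module; pack and unpack are hypothetical helper names, and the result
would be stored as a BLOB rather than as JSON text, so the JSON functions
could not read it without unpacking first.

```python
import json
import zlib

def pack(value):
    """Serialize to compact JSON text, then deflate it."""
    text = json.dumps(value, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"), 9)

def unpack(blob):
    """Inflate and parse back into Python objects."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

rows = [{"id": i, "title": "book", "author": "person", "data": 1}
        for i in range(100)]
blob = pack(rows)
print(len(json.dumps(rows, separators=(",", ":"))), len(blob))
```

Repeated field names compress very well, so for arrays of similar objects
this recovers much of the overhead without changing the format.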

J


> --
> D. Richard Hipp
> d...@sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>