On 2/25/19, Richard Hipp <d...@sqlite.org> wrote:
> performance of just over 3GB/sec, which is slightly
> faster than reported simdjson performance of 2.9GB/sec.

Further analysis shows that SQLite was caching its parse tree, which
was distorting the measurement.  The following script appends a
different number of trailing spaces to each copy of gsoc-2018.json
that is parsed, so that no two calls to json_valid() see the same
text, thereby invalidating the cache.

.timer on
-- The sqlite3 CLI binds $-parameters from the temp.sqlite_parameters table.
.parameter init
INSERT INTO temp.sqlite_parameters(key,value)
 VALUES('$json',readfile('/home/drh/tmp/gsoc-2018.json'));
SELECT length($json);
-- Parse 1000 copies; copy x has x trailing spaces, so no two are identical.
WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<1000)
SELECT count(json_valid($json||printf('%*c',x,' '))) FROM c;
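For readers without the sqlite3 CLI, the same cache-busting measurement
can be sketched with Python's standard sqlite3 module.  This is a
hypothetical harness, not the script above: it assumes an SQLite build
that includes the JSON functions (built in since 3.38.0), passes the
document as a bound parameter instead of using the CLI's parameter
table and readfile(), and times the query with time.perf_counter()
instead of .timer.  The name json_valid_throughput and the synthetic
test document are illustrative assumptions.  Throughput is estimated as
total bytes parsed (trailing spaces included) divided by wall-clock
time, so it also counts some query overhead.

```python
import json
import sqlite3
import time

# Hypothetical harness; assumes SQLite was built with the JSON functions.
# Same recursive CTE as the CLI script: one row for each x = 1..n.
_ROWS = "WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x < :n) "

# Cache-busting form: row x appends x spaces, so every call sees distinct text.
_PADDED = _ROWS + "SELECT sum(json_valid(:doc || printf('%*c', x, ' '))) FROM c"

# Naive form: identical text on every row, so a cached parse may be reused.
_SAME = _ROWS + "SELECT sum(json_valid(:doc)) FROM c"


def json_valid_throughput(doc, n=1000, bust_cache=True):
    """Run json_valid() over n variants of doc; return (valid_count, GB/sec)."""
    con = sqlite3.connect(":memory:")
    try:
        sql = _PADDED if bust_cache else _SAME
        start = time.perf_counter()
        # The aggregate is fully computed on the first step, inside execute().
        (valid,) = con.execute(sql, {"doc": doc, "n": n}).fetchone()
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        con.close()
    size = len(doc.encode("utf-8"))
    # Padded rows parse size + x bytes, adding 1 + 2 + ... + n spaces in total.
    total = n * size + (n * (n + 1) // 2 if bust_cache else 0)
    return valid, total / elapsed / 1e9


if __name__ == "__main__":
    # Synthetic stand-in for gsoc-2018.json (an assumption, not the real file).
    doc = json.dumps(
        [{"id": i, "name": f"project-{i}", "tags": ["c", "json"]} for i in range(2000)]
    )
    for bust in (False, True):
        valid, gbps = json_valid_throughput(doc, n=1000, bust_cache=bust)
        print(f"bust_cache={bust}: {valid} valid, {gbps:.2f} GB/sec")
```

Running it once with bust_cache=False and once with the default shows
the distortion described above: with identical text on every row,
json_valid() may reuse a cached parse, which can inflate the reported
rate.  No figures are claimed here; results depend on the SQLite
version and the hardware.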

In this case, SQLite parses JSON at 1.1GB/sec.  That is slower than
simdjson, but it is still pretty fast.  And there are other reasons to
prefer the current SQLite implementation:

(1) The SQLite code is public domain.  Simdjson is not.  We do not
want a license on SQLite that says something like "Public Domain
unless you use JSON functions, in which case the license is Apache."

(2) SQLite is written in portable C code.  It runs everywhere.
Simdjson is written in C++ and makes use of SIMD extensions that are
not universally available.

(3) Simdjson is optimized for large JSON blobs.  SQLite is optimized
for the common database case of small JSON blobs.

-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users