Re: [sqlite] Setting auto_vacuum=2 doesn't work after setting journal_mode=WAL
On Friday, 21 February, 2020 19:36, Simon Slavin wrote:

>On 22 Feb 2020, at 2:28am, Keith Medcalf wrote:
>
>> When a database is to be created these commands must be given BEFORE
>> any command which opens or creates the database:
>>
>> pragma auto_vacuum
>> pragma encoding
>> pragma page_size
>> pragma data_store_directory
>>
>> The issuance (or preparation) of ANY OTHER COMMAND will cause a new
>> blank database to be created using the values of auto_vacuum, encoding,
>> and page_size in effect at the time that command is issued (prepared).
>
>The problem is, Jens points out, that this is not documented. The
>documentation doesn't distinguish between those four PRAGMAs and other
>PRAGMAs which stop those four from working.

That is not entirely true. All of them say that they change the format of a database and only work if the database does not yet exist, although in varying wording. Perhaps the wording needs to be as clear as it is for the "encoding" pragma, which states plainly that an attempt to change the encoding of an existing database has no effect, and that the command must therefore be given before the database file is created in order to have any effect.

In the case of the auto_vacuum and page_size pragmas, however, they DO have an effect on an existing open database in particular circumstances. If you want them to take effect for a newly created database, though, you must issue them before the database is created.

>I've previously suggested that the PRAGMAs should be divided into
>categories. Perhaps this should be a new category: those which do not
>create a database but have to be done before anything that creates the
>database.

Well, that would be only one pragma: encoding. Whether or not the same applies to any other pragma (page_size, auto_vacuum) depends on the intent of the issuer of the command.
If they are expected to affect a database which has not yet been created, then obviously they must be issued before the database is created. If they are intended to affect the database after it is created, then they should be issued after the database is created, in which case they are subject to the limitations of how those commands operate on already existing databases.

Perhaps the pragma encoding, pragma auto_vacuum, and pragma page_size documentation simply needs to say that if one wants the change to apply to a newly created database, these commands must be given first, before any other command.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
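The ordering rule can be sketched with Python's built-in sqlite3 module (the same rule applies through any binding, including the C API used elsewhere in this thread):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "new.db")
db = sqlite3.connect(path)

# These must run before any statement that actually creates the
# database file, or they silently have no effect on the new database.
db.execute("PRAGMA page_size=8192")
db.execute("PRAGMA auto_vacuum=2")      # 2 = incremental

# Any other statement (here, CREATE TABLE) now creates the file
# using the page_size and auto_vacuum values set above.
db.execute("CREATE TABLE t(x)")

print(db.execute("PRAGMA page_size").fetchone()[0])    # 8192
print(db.execute("PRAGMA auto_vacuum").fetchone()[0])  # 2
```

Reversing the order (CREATE TABLE, or any other statement, first) leaves the new database with the default values instead.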
Re: [sqlite] Setting auto_vacuum=2 doesn't work after setting journal_mode=WAL
On 22 Feb 2020, at 2:28am, Keith Medcalf wrote:

> When a database is to be created these commands must be given BEFORE any
> command which opens or creates the database:
>
> pragma auto_vacuum
> pragma encoding
> pragma page_size
> pragma data_store_directory
>
> The issuance (or preparation) of ANY OTHER COMMAND will cause a new blank
> database to be created using the values of auto_vacuum, encoding, and
> page_size in effect at the time that command is issued (prepared).

The problem is, Jens points out, that this is not documented. The documentation doesn't distinguish between those four PRAGMAs and other PRAGMAs which stop those four from working.

I've previously suggested that the PRAGMAs should be divided into categories. Perhaps this should be a new category: those which do not create a database but have to be done before anything that creates the database.
Re: [sqlite] Setting auto_vacuum=2 doesn't work after setting journal_mode=WAL
There are three pragmas which affect the "organization" of a newly created database. When a database is to be created these commands must be given BEFORE any command which opens or creates the database:

 pragma auto_vacuum
 pragma encoding
 pragma page_size
 pragma data_store_directory

The issuance (or preparation) of ANY OTHER COMMAND will cause a new blank database to be created using the values of auto_vacuum, encoding, and page_size in effect at the time that command is issued (prepared).

The default for auto_vacuum is 0, or none. auto_vacuum can be changed freely between full (1) and incremental (2) if the database was created with a non-zero value of auto_vacuum (that is, auto_vacuum was not none at the time of database creation). page_size can be changed for an existing non-WAL mode database by changing the page_size and doing a VACUUM. encoding cannot be changed and defaults to UTF-8.

The reference to "create tables" in the documentation is a euphemism for "database file created", since a database will always have at least one table in it -- the sqlite_master table -- once the database is created, even if no user tables are created.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users On Behalf Of Jens Alfke
>Sent: Friday, 21 February, 2020 16:24
>To: SQLite mailing list
>Subject: [sqlite] Setting auto_vacuum=2 doesn't work after setting
>journal_mode=WAL
>
>I just discovered that none of our databases are auto-vacuuming, even
>though we set auto_vacuum to 'incremental' when a database is created,
>and periodically call 'pragma incremental_vacuum'. If I go into the CLI,
>open a database and run "pragma auto_vacuum", it returns 0.
>
>After some experimentation I've discovered that (at least in 3.28)
>`PRAGMA auto_vacuum=incremental` has to be done as the very first thing
>after creating a database, even before `PRAGMA journal_mode=WAL`.
>I know
>it's documented that auto_vacuum has to be enabled before creating any
>tables, but the docs say nothing about ordering wrt other pragmas!
>
>To be precise, this is what we currently do after creating a new
>database, which does not work:
>
>sqlite3_exec(db, "PRAGMA journal_mode=WAL; "
>                 "PRAGMA auto_vacuum=incremental; "  // ⟵ will have no effect
>                 "BEGIN; "
>                 "CREATE TABLE …….. ; "
>                 "PRAGMA user_version=302; "
>                 "END;");
>
>If I swap the first two statements, it does work:
>
>sqlite3_exec(db, "PRAGMA auto_vacuum=incremental; "  // ⟵ will take effect
>                 "PRAGMA journal_mode=WAL; "
>                 "BEGIN; "
>                 "CREATE TABLE …….. ; "
>                 "PRAGMA user_version=302; "
>                 "END;");
>
>Is this expected? If so, the docs for auto_vacuum should be updated.
>
>—Jens
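Keith's two rules about existing databases (auto_vacuum can be toggled between full and incremental if it was enabled at creation, and page_size of a non-WAL database changes only after a VACUUM) can be sketched with Python's built-in sqlite3 module:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "av.db")
# Autocommit mode, so VACUUM is not blocked by an open transaction.
db = sqlite3.connect(path, isolation_level=None)

db.execute("PRAGMA auto_vacuum=1")   # full; set BEFORE the file is created
db.execute("CREATE TABLE t(x)")      # creates the database with auto_vacuum on

# Because the database was created with a non-zero auto_vacuum,
# it can be switched freely between full (1) and incremental (2).
db.execute("PRAGMA auto_vacuum=2")
print(db.execute("PRAGMA auto_vacuum").fetchone()[0])  # 2

# page_size of an existing non-WAL database takes effect only after VACUUM.
db.execute("PRAGMA page_size=4096")
db.execute("VACUUM")
print(db.execute("PRAGMA page_size").fetchone()[0])    # 4096
```

Had the database been created with the default auto_vacuum=0, the later `PRAGMA auto_vacuum=2` would be a no-op without a VACUUM.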
[sqlite] Setting auto_vacuum=2 doesn't work after setting journal_mode=WAL
I just discovered that none of our databases are auto-vacuuming, even though we set auto_vacuum to 'incremental' when a database is created, and periodically call 'pragma incremental_vacuum'. If I go into the CLI, open a database and run "pragma auto_vacuum", it returns 0.

After some experimentation I've discovered that (at least in 3.28) `PRAGMA auto_vacuum=incremental` has to be done as the very first thing after creating a database, even before `PRAGMA journal_mode=WAL`. I know it's documented that auto_vacuum has to be enabled before creating any tables, but the docs say nothing about ordering wrt other pragmas!

To be precise, this is what we currently do after creating a new database, which does not work:

    sqlite3_exec(db, "PRAGMA journal_mode=WAL; "
                     "PRAGMA auto_vacuum=incremental; "  // ⟵ will have no effect
                     "BEGIN; "
                     "CREATE TABLE …….. ; "
                     "PRAGMA user_version=302; "
                     "END;");

If I swap the first two statements, it does work:

    sqlite3_exec(db, "PRAGMA auto_vacuum=incremental; "  // ⟵ will take effect
                     "PRAGMA journal_mode=WAL; "
                     "BEGIN; "
                     "CREATE TABLE …….. ; "
                     "PRAGMA user_version=302; "
                     "END;");

Is this expected? If so, the docs for auto_vacuum should be updated.

—Jens
[sqlite] Record serialization infrastructure of SQLITE
Inspired by the json encoding thread. We have a need for multi-dimensional data (like json), and currently use a record format for that which is derived from the SQLITE format, with Serial Type/Size stored in varint Huffman encoding of twos-complement. It is a great fast, compact storage format. We added support for arrays and objects on top of it, but just recursive processing of blobs would allow for the same. However, we had to build our own make/extract/expression extensions from scratch to do this.

It would be very helpful if SQLITE could expose its internal record serialization infrastructure a bit, so that there is a way (using the APIs) to create an array of sqlite3_value from a blob, and a blob from an sqlite3_value array. E.g.

Reading:

    blob = sqlite3_column_blob(_stmt, 2);
    sqlite3_array** arr = sqlite3_deserialize(blob, );
    val1 = sqlite3_value_int(arr[0]);
    val2 = sqlite3_value_int(arr[1]);

Writing:

    sqlite3_value* arr[2];
    arr[0] = sqlite3_make_value_int(42);
    arr[1] = sqlite3_make_value_int(43);
    sqlite3_value blob = sqlite3_serialize(arr, 2);
    sqlite3_bind_value(stmt, 2, blob);

It would probably also be useful if there were some limited expression functions to interact with it, but those are easy to build yourself. The serialization infrastructure, however, is pretty complex.

- Deon
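The varint scheme mentioned above is documented in the SQLite file-format description: a big-endian sequence of 7-bit groups, with the high bit set on every byte except the last. A rough Python sketch (omitting the special nine-byte form SQLite uses for full 64-bit values):

```python
def encode_varint(n):
    # SQLite-style varint for non-negative n < 2**56:
    # big-endian groups of 7 bits, high bit = "more bytes follow".
    out = bytearray([n & 0x7F])
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))

def decode_varint(buf, pos=0):
    # Returns (value, next position).
    n = 0
    while True:
        b = buf[pos]
        pos += 1
        n = (n << 7) | (b & 0x7F)
        if not (b & 0x80):
            return n, pos

print(encode_varint(300).hex())        # 822c
print(decode_varint(b"\x82\x2c")[0])   # 300
```

One byte covers values up to 127, two bytes up to 16383, and so on, which is why small serial types stay compact.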
Re: [sqlite] more efficient JSON encoding: idle musing
On 21.02.2020 15:03 Richard Hipp wrote:
> On 2/21/20, Wout Mertens wrote:
>> The idea is that upon storing the JSON
>> data, the JSON1 extension parses it, extracts the layouts recursively,
>> stores them when they are not known yet, and then only stores the
>> values in the binary format with the layout identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite. I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text. But I could have overlooked something. If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.

If you want to be as space efficient as possible, look into succinct data structures. The balanced-parentheses representation of a tree can be stored in a bit vector (each node takes 2 bits) and there are (succinct) index structures to query it efficiently. See e.g. https://core.ac.uk/download/pdf/81941172.pdf and https://github.com/simongog/sdsl-lite/blob/master/include/sdsl/bp_support_g.hpp as an implementation.

Making this work with JSON would be a lot of work, though, I guess.
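For flavor, the balanced-parentheses representation writes each tree node as a matched '(' ... ')' pair. A naive sketch of the basic navigation query (succinct structures like sdsl-lite's bp_support answer the same query in O(1) with o(n) extra bits; this linear scan is only illustrative):

```python
def find_close(bp, i):
    # bp is a sequence of bits: 1 = '(', 0 = ')'.
    # Return the position of the ')' matching the '(' at position i.
    # Naive O(n) scan; a succinct bp_support structure does this in O(1).
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")

# A root with two leaf children serializes as (()()) -> 1 1 0 1 0 0,
# i.e. two bits per node for a three-node tree.
tree = [1, 1, 0, 1, 0, 0]
print(find_close(tree, 0))  # 5: the root's subtree spans the whole sequence
print(find_close(tree, 1))  # 2: first child is a leaf
```

Everything else (parent, subtree size, next sibling) reduces to this kind of excess counting, which is what the index structures accelerate.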
Re: [sqlite] more efficient JSON encoding: idle musing
> On Feb 21, 2020, at 4:20 AM, Wout Mertens wrote:
>
> I was wondering if the JSON extension could not do the same thing: for each
> table, keep a hidden stash of object layouts, and store the values as
> sqlite primitives. (you'd be able to disable this, in case the layouts
> rarely repeat)

We do something a bit like this in Couchbase Lite[1]. We don't directly store JSON; we store a binary data format we designed called Fleece, which has the same semantics but is faster to traverse and more compact.

https://github.com/couchbaselabs/fleece

In Fleece a JSON object is represented as an array of fixed-width {key, value} pairs.

- The keys are typically 16-bit integers that map into a global (per-database) table of common key strings.
- The values are either inline if they fit (small ints, booleans, null) or else offsets to where the real data is.

The array is sorted by key, so lookup is a binary search, usually on 16-bit ints. Numbers are stored in a compact variable-width encoding. Strings are unescaped UTF-8, and multiple occurrences of the same string are only stored once.

(I've recently discovered that this is similar to FlexBuffers. Fleece has been around since 2015, so I don't know if the designers of FlexBuffers looked at it…)

I've thought about the 'layouts' concept but haven't tried implementing it yet. (Most JavaScript runtimes use something like it, btw.) It would only save about 4*n bytes per record, where n is the average number of keys in a layout, and a small number of clock cycles per key lookup.

> On Feb 21, 2020, at 6:03 AM, Richard Hipp wrote:
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite. I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text.

I had similar experiences with BSON, Bencode, etc. In my use case I found that the major expense wasn't parsing, rather allocating an object tree.
So I designed the Fleece format to be fully usable in place, without requiring parsing. A secondary factor is that traversing objects and arrays is expensive when the items are variable-width, because you have to parse through all the (potentially nested) items before the one you're looking for. That's why Fleece objects and arrays are fixed-width, using offsets to point to variable-size values.

—Jens

[1] https://www.couchbase.com/products/mobile
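The sorted fixed-width pairs Jens describes make dictionary lookup a plain binary search over key IDs. A hypothetical Python sketch of that access pattern (integer key IDs standing in for Fleece's shared-key table; this is not the Fleece API):

```python
import bisect

def dict_lookup(pairs, key_id):
    # pairs: list of (key_id, value) tuples sorted by key_id, mimicking
    # Fleece's sorted fixed-width {key, value} entries.
    # A one-element tuple (key_id,) sorts before any (key_id, value),
    # so bisect_left lands on the first entry with that key.
    i = bisect.bisect_left(pairs, (key_id,))
    if i < len(pairs) and pairs[i][0] == key_id:
        return pairs[i][1]
    return None

pairs = [(3, "name"), (17, "age"), (42, "city")]
print(dict_lookup(pairs, 17))   # age
print(dict_lookup(pairs, 18))   # None
```

Because every entry is the same width, no preceding entries need to be decoded to reach the one being compared, which is the point of the fixed-width design.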
Re: [sqlite] more efficient JSON encoding: idle musing
> On Feb 21, 2020, at 4:20 AM, Wout Mertens wrote:
>
> In JavaScript, objects are key-value collections with unique keys, where the
> order of the keys is important.

JSON is not JavaScript. The order of keys is NOT significant in JSON, and many, many JSON implementations parse JSON objects into data structures* that do not preserve key order. Clients using such implementations cannot determine the original order of keys, nor can they re-serialize the object with the same ordering.

I have run into multiple issues over time with systems that assume JSON key ordering is significant and preserved (CouchDB and Scuttlebutt come to mind), which then later have to redesign protocols or data schemas because of the interoperability problems this causes.

—Jens

* The platform's standard "dictionary" / "map" class is typically implemented as a hash table or binary tree, as in Java, .NET, Objective-C, Swift, Go, Lua, C++. (I'm unsure about Python and Ruby.) A few platforms besides JS keep an auxiliary key array to preserve ordering. Erlang uses a linked list, which is ordered.
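The point is easy to demonstrate with any standard JSON library; in Python, for instance:

```python
import json

a = json.loads('{"x": 1, "y": 2}')
b = json.loads('{"y": 2, "x": 1}')

# The two texts differ only in key order, and the parsed
# objects compare equal: the ordering carries no meaning.
assert a == b

# A canonical re-serialization produces identical text for both,
# so the original ordering is unrecoverable after a round trip.
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True
```

A protocol that signs or hashes JSON text, rather than a canonical form, therefore breaks as soon as any intermediary re-serializes the object.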
Re: [sqlite] Performance Issue on Large Table
> On Feb 20, 2020, at 5:24 PM, Chip Beaulieu wrote:
>
> CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
> CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
> CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses`
> (`BibleID`,`Book`,`Chapter`,`VerseNumber`)

I believe the first two of those are unnecessary, since their column lists are prefixes of the third. So removing them would buy you some performance. (Disclaimer: not a guru.)

But you’ve said most of the problem is with FTS. When you update the database, does all of the indexed text really change? Or do many of the 30,000 new records contain the same text as their deleted counterparts? If the latter, you could optimize by not touching those rows.

It’s also possible that dropping the FTS table before the update, and re-creating it afterwards, would be faster than incrementally changing it.

—Jens
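A quick way to confirm that the widest composite index also serves the shorter lookups is EXPLAIN QUERY PLAN. A sketch using Python's built-in sqlite3, with the table trimmed to the indexed columns:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Verses(ID INTEGER PRIMARY KEY, BibleID INT,"
           " Book INT, Chapter INT, VerseNumber INT)")
# Only the widest index; its leftmost prefixes serve the narrower lookups.
db.execute("CREATE INDEX INX_Verses_BI_B_C_V ON Verses"
           "(BibleID, Book, Chapter, VerseNumber)")

for sql in ("SELECT * FROM Verses WHERE BibleID=1",
            "SELECT * FROM Verses WHERE BibleID=1 AND Book=2"):
    plan = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row names the index chosen for the search.
    print(plan[0][-1])
```

Both plans report a SEARCH using INX_Verses_BI_B_C_V, so the single-column and two-column indexes add write cost without adding query capability.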
Re: [sqlite] more efficient JSON encoding: idle musing
> If you have example code for a mechanism that is more space efficient
> and/or faster, please share it with us.

"Bencode" is approximately the same space-wise as JSON, but encoding/decoding is potentially faster since it doesn't have to do any escaping for strings:

https://en.wikipedia.org/wiki/Bencode

On Fri, 21 Feb 2020, Richard Hipp wrote:
> On 2/21/20, Wout Mertens wrote:
>> The idea is that upon storing the JSON data, the JSON1 extension parses
>> it, extracts the layouts recursively, stores them when they are not known
>> yet, and then only stores the values in the binary format with the layout
>> identifiers.
>
> I experimented with a number of similar ideas for storing JSON when I
> was first designing the JSON components for SQLite. I was never able
> to find anything that was as fast or as compact as just storing the
> original JSON text. But I could have overlooked something. If you
> have example code for a mechanism that is more space efficient and/or
> faster, please share it with us.
>
> --
> D. Richard Hipp
> d...@sqlite.org
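A minimal Bencode encoder fits in a few lines and shows why no string escaping is needed: strings are length-prefixed rather than quoted, so the payload bytes pass through untouched. A sketch:

```python
def bencode(x):
    # Minimal Bencode encoder: ints, strings/bytes, lists, dicts.
    if isinstance(x, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(x, int):
        return b"i%de" % x
    if isinstance(x, str):
        x = x.encode("utf-8")
    if isinstance(x, bytes):
        # Length prefix instead of quoting, so no escaping is ever needed.
        return b"%d:%s" % (len(x), x)
    if isinstance(x, list):
        return b"l" + b"".join(bencode(v) for v in x) + b"e"
    if isinstance(x, dict):
        # Keys are sorted, giving a canonical encoding.
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(x.items())) + b"e"
    raise TypeError(type(x))

print(bencode({"spam": ["a", 42]}))  # b'd4:spaml1:ai42eee'
```

The bool check comes first because Python's bool is a subclass of int; Bencode itself has no boolean or null type, which is one way it is not a drop-in replacement for JSON.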
Re: [sqlite] more efficient JSON encoding: idle musing
On Fri, Feb 21, 2020 at 3:03 PM Richard Hipp wrote:
> If you have example code for a mechanism that is more space efficient
> and/or faster, please share it with us.

I'll see if I can prototype something in JS. I'd be keeping the layouts in a helper table, and I wouldn't be storing the values in binary, but it'd be a starting point.
Re: [sqlite] more efficient JSON encoding: idle musing
On 2/21/20, Wout Mertens wrote:
> The idea is that upon storing the JSON
> data, the JSON1 extension parses it, extracts the layouts recursively,
> stores them when they are not known yet, and then only stores the
> values in the binary format with the layout identifiers.

I experimented with a number of similar ideas for storing JSON when I was first designing the JSON components for SQLite. I was never able to find anything that was as fast or as compact as just storing the original JSON text. But I could have overlooked something. If you have example code for a mechanism that is more space efficient and/or faster, please share it with us.

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] more efficient JSON encoding: idle musing
On Fri, Feb 21, 2020 at 2:37 PM Warren Young wrote:
>> On Feb 21, 2020, at 5:20 AM, Wout Mertens wrote:
>> Queries can go faster, because a query like `where json_extract(json,
>> '$.foo') = 'bar'` can first check the layouts to see which ones apply,
>
> SQLite’s JSON1 extension is a storage and query mechanism, not a run-time
> object system. I see that things like json_remove() exist, but my assumption
> is that 99.manynines% of the time, objects are stored, retrieved, and queried
> without being modified at the SQLite level, if at all.
>
> Therefore, 99.manynines% of the time, there is only one “layout.”

Hmm, that is not what I meant. The idea is that upon storing the JSON data, the JSON1 extension parses it, extracts the layouts recursively, stores them when they are not known yet, and then only stores the values in the binary format with the layout identifiers.

So yes, somewhat more work than storing and retrieving a plain string.
Re: [sqlite] more efficient JSON encoding: idle musing
On Feb 21, 2020, at 5:20 AM, Wout Mertens wrote:
>
> In JavaScript, objects are key-value collections with unique keys, where
> the order of the keys is important.

ECMAScript §13.7.5.15 (2019 edition) says, "The mechanics and order of enumerating the properties is not specified but must conform to the rules specified below.” You can read the rules if you like, but it doesn’t say that every JavaScript implementation gives the same ordering:

https://www.ecma-international.org/ecma-262/10.0/index.html#sec-enumerate-object-properties

The JSON spec then says, “The JSON syntax ... does not assign any significance to the ordering of name/value pairs."

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

> Most JSVMs

I’m going to take a wild guess that “Most” here means “a whole bunch of different browsers and server-side JS stacks all using V8.”

> Queries can go faster, because a query like `where json_extract(json,
> '$.foo') = 'bar'` can first check the layouts to see which ones apply,

SQLite’s JSON1 extension is a storage and query mechanism, not a run-time object system. I see that things like json_remove() exist, but my assumption is that 99.manynines% of the time, objects are stored, retrieved, and queried without being modified at the SQLite level, if at all.

Therefore, 99.manynines% of the time, there is only one “layout.” To the extent that that is true, I don’t see how the rest of your proposal matters.
Re: [sqlite] Performance Issue on Large Table
On 21-2-2020 02:24, Chip Beaulieu wrote:
> I have a table with 4.5 million records with full text indexing. Reads
> are very fast, but deleting / inserting / updating takes on average about
> 50 seconds per record. I often do batches of 30,000 deletes / inserts at
> a time. The last batch took 10 hours to complete. Here are the details:
>
> Table Schema:
>
> CREATE TABLE `Verses` (
>   `ID` integer PRIMARY KEY AUTOINCREMENT,
>   `BibleID` integer NOT NULL,
>   `Book` integer NOT NULL,
>   `Chapter` integer NOT NULL,
>   `VerseNumber` integer NOT NULL,
>   `Scripture` text,
>   `Comments` text,

I would recommend not storing `Comments` in the same table as `Scripture`, or are you trying to re-write the Bible? This will avoid deleting unchanged data which is then re-inserted later (only because `Comments` changed?).

--
Luuk
[sqlite] more efficient JSON encoding: idle musing
Hi,

I use SQLite as a MaybeSQL store, mixing fixed columns with schemaless JSON columns. It's really great.

In JavaScript, objects are key-value collections with unique keys, where the order of the keys is important. Most JSVMs store them as a pointer to a layout and then the values. The layout lists the keys in order (and possibly the types, to get byte-perfect layouts). If you delete a key from an object, it generates a new layout.

I was wondering if the JSON extension could not do the same thing: for each table, keep a hidden stash of object layouts, and store the values as sqlite primitives. (you'd be able to disable this, in case the layouts rarely repeat)

This way:

- it's more space efficient, since the keys are only stored once per layout
- loading is faster, because the values are stored as their primitive type
- querying can go faster

Queries can go faster because a query like `where json_extract(json, '$.foo') = 'bar'` can first check the layouts to see which ones apply, allowing it to skip other layouts, and then quickly find the value to test thanks to the binary encoding.

You could also allow an optimization that makes key order unimportant, reducing the number of layouts.

So, smaller and faster. Thoughts?

Wout.
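A toy version of the layout idea (intern each object's key tuple once, then store only a layout id plus the values) might look like this in Python; all names here are hypothetical illustrations, not part of JSON1 or any JSVM:

```python
layouts = {}   # key tuple -> layout id
by_id = []     # layout id -> key tuple

def pack(obj):
    # Intern this object's layout and return (layout_id, values).
    layout = tuple(obj)            # keys, in order
    if layout not in layouts:
        layouts[layout] = len(by_id)
        by_id.append(layout)
    return layouts[layout], list(obj.values())

def unpack(lid, values):
    # Rebuild the object from its layout id and stored values.
    return dict(zip(by_id[lid], values))

a = pack({"name": "ada", "age": 36})
b = pack({"name": "bob", "age": 41})
print(a[0] == b[0])   # True: same layout, so the keys are stored only once
print(unpack(*b))     # {'name': 'bob', 'age': 41}
```

Rows sharing a layout pay for the key strings once; a query filter on a key absent from a layout could skip every row using that layout without decoding the values.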
Re: [sqlite] Is randomblob(16) a good guid generation across multiple computers?
On 2/21/20, Andy KU7T wrote:
> Are you saying the PRNG on Windows is not good enough to use
> randomblob(16) in Sqlite? All I need is a reasonable assurance that they
> are unique...

The default PRNG on Windows is fine for generating globally unique identifiers. The complaint is that the seeding of the PRNG on Windows is such that an attacker could by brute force discover the seed of the PRNG by examining a sequence of generated UUIDs. In that scenario, the attacker might be able to guess the next UUID that your system will be generating.

If that is a problem for your application, then fix it by compiling with -DSQLITE_WIN32_USE_UUID=1 and linking against RPCRT4.LIB.

--
D. Richard Hipp
d...@sqlite.org
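For reference, the generation itself is a one-liner. A sketch using Python's built-in sqlite3 (note the hex form here is just 16 random bytes, not an RFC 4122 version-4 UUID unless the version and variant bits are also set):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# 16 random bytes from SQLite's PRNG: statistically unique across
# machines, subject to the seeding caveat discussed in the thread.
blob, = db.execute("SELECT randomblob(16)").fetchone()
hexid, = db.execute("SELECT lower(hex(randomblob(16)))").fetchone()

print(len(blob))   # 16
print(len(hexid))  # 32
```

With 128 random bits, the probability of any collision stays negligible until the number of generated IDs approaches 2^64, which is the usual "reasonable assurance" argument.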