I’m experimenting with querying databases of JSON documents. These data-sets are schemaless and there’s no guarantee that they all have a common set of properties; in fact it’s common for them to have the equivalent of multiple ‘tables’ in the same data-set, i.e. groups of documents with distinct sets of common properties.
Unindexed queries just using SELECT with json_extract() are faster than I expected, and of course they become blazing fast when I create indexes on those json_extract() calls. Great stuff! Here’s my issue/question: It’s common for a JSON property to exist in only a subset of the rows/documents. This seems like a perfect use case for partial indexes … except that the WHERE clause in a CREATE INDEX statement explicitly disallows function calls, so I can’t constrain the index to only the rows that contain the JSON property. Is this limitation something that might be lifted soon (now that functions can be identified as ‘pure’), or is it somehow a direct consequence of the way partial indexes work? To give an example: Let’s say the data-set contains JSON documents describing both students and courses. There are 20,000 students but only 200 courses being taught. Courses have a “professor” property. I would like to query by professor, but only 1% of the rows contain that property, so unless I constrain the index it’s going to be 99% full of useless null values. I’d like to say: CREATE INDEX profs ON dataset (json_extract(doc, ‘$.professor’)) WHERE json_extract(doc, ‘$.professor’) IS NOT NULL but SQLite rejects this because of the function call in the WHERE clause. (Now, maybe I’m prematurely optimizing, and in reality all those null values don’t have much overhead, or are suppressed entirely; I don’t know.) I can think of some workarounds for this, but they require changing the SQL schema, for instance adding a boolean ‘has_professor’ column that's accessible when indexing. This can be awkward because I’m writing general-purpose code that doesn’t know ahead of time what sort of JSON data-sets might be loaded into it, or what queries might be run. So it would have to alter the tables on the fly as the caller makes queries or requests indexing. Any advice gratefully appreciated. (I’ve looked through the list archives to see if this has come up before, but I couldn’t find any threads.) —Jens _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users