[sqlite] EXPLAIN output; profiling index usage; LMDB backend

Dave Baggett Tue, 18 Feb 2014 03:25:25 -0800

I've been using SQLite very heavily for about a year now via Python and Roger Binns' APSW and have a few questions/suggestions about this incredibly awesome piece of code. Please take these in the positive spirit intended.

1) EXPLAIN output is rather hard to understand. Is there a way to produce annotated output that's more human-readable? I see that each opcode is extensively documented in the source; perhaps the EXPLAIN output could be enhanced to include some of that commentary, or to use a higher-level syntax rather than VDBE opcodes?

2) Likewise, EXPLAIN QUERY PLAN could be annotated further. For example, "1|0|0|USE TEMP B-TREE FOR ORDER BY" could tell me what's being ordered, with hints about what the relevant tables/columns are.

3) The profiling capabilities offered via xProfile are incredibly helpful for finding hot spots. One thing I've done that makes this even more useful is to "genericize" the SQL query text by replacing numbers, dates, etc. with placeholder symbols (e.g., #) and then aggregating the results to produce a top 100 list by time used. Genericizing is simple pattern matching, and it coalesces queries that would otherwise appear unrelated when compared by exact text.

4) Re: profiling, it would be great to know which indices are actually being used so that one could prune useless ones. Furthermore, it would be extremely valuable to know which indices are most expensive to maintain, i.e., which cause writing to become disproportionately expensive. Likewise, expensive TRIGGERS would be nice to know about, independently of the SQL queries that triggered them.

5) I've seen that Howard Chu ported SQLite 3.7 to his LMDB B-Tree implementation and saw performance improvements. Is there a reason this isn't being mainlined? I assume there are trade-offs involved? For purely read-only databases, it would be nice to be able to select the LMDB back-end. (I realize making this switchable is a big undertaking; it's just a suggestion.)

6) Anecdotally, I've found it difficult to make sense of all the tuning parameters and achieve good performance. The settings required seem rather different for Windows than OS X, and for a while I was using fullfsync under OS X (thinking it was the recommended safe option) only to find a comment from DRH that it's not recommended, and not even used by Apple (who asked for it). And, indeed, without it writes are 10x faster! I guess my suggestion would be to update the various tuning documents to reflect the current state of things, even if it's something along the lines of "top 10 tuning suggestions for {Windows, OSX, Android, iOS, Ubuntu}" or something like that.

7) I find it very useful to keep SQLite's heap separated from the Python heap so I can see who's using memory. I've patched my SQLite source to allow forcing this behavior even when there are multiple cores, and it seems to work fine. I suggest making this a mainlined compile-time option; it's a trivial patch. I only do this under OS X, because that's my primary development platform; I don't know if other platforms let you tag heaps the way OS X does.

8) The virtual table mechanism is incredibly powerful; I've used it to speed up performance-critical operations immensely. (This is a shout-out rather than a suggestion.)
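
To make point 3 concrete, here is a minimal sketch of the genericizing and aggregation step. It is not my actual code: the regular expressions, the function names, and the `msg` table are assumptions made up for illustration, and the (sql, elapsed) pairs stand in for whatever a profiling callback would deliver.

```python
import re
from collections import defaultdict

# Quoted strings are replaced first because they may contain digits (dates etc.).
_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone numbers only; digits inside identifiers such as "t1" are left alone.
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def genericize(sql):
    """Replace literals with '#' so structurally identical queries compare equal."""
    sql = _STRING.sub("#", sql)
    sql = _NUMBER.sub("#", sql)
    return " ".join(sql.split())  # collapse runs of whitespace


def top_queries(samples, limit=100):
    """Sum elapsed time per genericized query; return the most expensive first.

    samples: iterable of (sql_text, elapsed) pairs (hypothetical input shape).
    """
    totals = defaultdict(int)
    for sql, elapsed in samples:
        totals[genericize(sql)] += elapsed
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical profile samples against a made-up "msg" table.
samples = [
    ("SELECT * FROM msg WHERE id = 17", 1200),
    ("SELECT * FROM msg WHERE id = 42", 800),
    ("SELECT * FROM msg WHERE sent > '2014-02-18'", 500),
]
report = top_queries(samples)
for sql, total in report:
    print(total, sql)
```

The first two samples collapse into a single entry, which is the whole point: by exact text they look like different queries.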

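For point 4, the closest approximation I know of today is to run EXPLAIN QUERY PLAN over a representative workload and collect the index names it mentions. The sketch below uses Python's standard sqlite3 module rather than APSW; the helper names, the schema, and the regular expression are my own assumptions, and the approach depends on the wording of the plan's detail column, which is not a stable interface. It only shows which indices the planner would choose. It says nothing about what each index costs to maintain on writes.

```python
import re
import sqlite3

# Matches "USING INDEX name" and "USING COVERING INDEX name"; automatic
# indexes have no name after the keyword and are therefore skipped.
_INDEX_NAME = re.compile(r"\bINDEX (\w+)")


def indexes_used(conn, queries):
    """Return the set of index names appearing in the plans of the given queries."""
    used = set()
    for sql in queries:
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
            detail = row[-1]  # the human-readable description is the last column
            match = _INDEX_NAME.search(detail)
            if match:
                used.add(match.group(1))
    return used


def all_indexes(conn):
    """Return the names of explicitly created indexes (CREATE INDEX statements)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    )
    return {name for (name,) in rows}


# Hypothetical schema and workload, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE msg (id INTEGER PRIMARY KEY, sender TEXT, sent TEXT);
    CREATE INDEX msg_sender ON msg(sender);
    CREATE INDEX msg_sent ON msg(sent);
    """
)
workload = ["SELECT id FROM msg WHERE sender = 'dave'"]
used = indexes_used(conn, workload)
unused = all_indexes(conn) - used
print("used:", sorted(used))
print("candidates for pruning:", sorted(unused))
```

An index that never shows up across the whole workload is only a candidate for pruning; it may still back a UNIQUE constraint or a rarely run query.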

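For point 6, part of my confusion was not knowing which sync-related settings a connection was actually running with. A small sketch, again using the standard sqlite3 module, that sets and then reads back the relevant pragmas. The helper name is made up, and the values are illustrative rather than recommendations for any platform.

```python
import sqlite3


def durability_settings(conn):
    """Read back the sync-related pragmas for this connection as a dict."""
    names = ("synchronous", "fullfsync", "checkpoint_fullfsync")
    return {name: conn.execute("PRAGMA " + name).fetchone()[0] for name in names}


conn = sqlite3.connect(":memory:")
# Illustrative values only: fullfsync is the setting I had wrongly turned on.
conn.execute("PRAGMA fullfsync = OFF")
conn.execute("PRAGMA synchronous = FULL")

settings = durability_settings(conn)
print(settings)
```

Logging this dictionary at startup on each platform makes it easier to compare configurations when write performance differs between them.
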
Thanks for an amazing piece of software.

Dave

