I've been using SQLite very heavily for about a year now via Python and Roger Binns' APSW and have a few questions/suggestions about this incredibly awesome piece of code. Please take these in the positive spirit intended.

1) EXPLAIN output is rather hard to understand. Is there a way to produce annotated output that's more human-readable? I see that each opcode is extensively documented in the source; perhaps if the EXPLAIN output could be enhanced to include some of the commentary, or a higher-level syntax rather than VBDE opcodes?

2) Likewise, EXPLAIN QUERY PLAN could be annotated further. For example, "1|0|0|USE TEMP B-TREE FOR ORDER BY" could tell me what's being ordered; hints about what the relevant tables/columns are.

3) The profiling capabilities offered via xProfile are incredibly helpful for finding hot spots. One thing I've done that makes this even more useful is to "genericize" the SQL query text by replacing numbers, dates, etc. with placeholder symbols (e.g,. #) and then aggregating the results to produce a top 100 list by time used. Genericizing is simple pattern matching, and helps coalesce queries that would otherwise appear to be unrelated by simple identity.

4) Re: profiling, it would be great to know which indices are actually being used so that one could prune useless ones. Furthermore, it would be extremely valuable to know which indices are most expensive to maintain — i.e., which cause writing to become disproportionately expensive. Likewise, expensive TRIGGERS would be nice to know about, independently of the SQL queries that triggered them.

5) I've seeen that Howard Chu ported SQLite 3.7 to his LMDB B-Tree implementation and saw performance improvements. Is there a reason this isn't being mainlined? I assume there are trade-offs involved? For purely read-only databases, it would be nice to be able to select the LMDB back-end. (I realize making this switchable is a big undertaking; it's just a suggestion.)

6) Anecdotally, I've found it difficult to make sense of all the tuning parameters and achieve good performance. The settings required seem rather different for Windows than OS X, and for a while I was using fullfsync under OS X — thinking it was the recommended safe option — only to find a comment from DRH that it's not recommended, and not even used by Apple (who asked for it). And, indeed, without it writes are 10x faster! I guess my suggestion would be to update the various tuning documents to reflect the current state of things — even it's something along the lines of "top 10 tuning suggestions for {Windows, OSX, Android, iOS, Ubuntu}" or something like that.


7) I find it very useful to keep SQLite's heap separated from the Python heap so I can see who's using memory. I've patched my SQLite source to allow forcing this behavior even when there are multiple cores, and it seems to work fine. I suggest making this a mainlined compile-time option; it's a trivial patch. I only do this under OS X, because that's my primary development platform; I don't know if other platforms let you tag heaps the way OS X does.

8) The virtual table mechanism is incredibly powerful; I've used it to speed up performance-critical operations immensely. (This is a shout-out rather than a suggestion.)

Thanks for an amazing piece of software.

Dave

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to