I've been using SQLite very heavily for about a year now via Python and Roger
Binns' APSW and have a few questions/suggestions about this incredibly awesome
piece of code. Please take these in the positive spirit intended.
1) EXPLAIN output is rather hard to understand. Is there a way to produce
annotated output that's more human-readable? I see that each opcode is
extensively documented in the source; perhaps if the EXPLAIN output could be
enhanced to include some of the commentary, or a higher-level syntax rather
than VBDE opcodes?
2) Likewise, EXPLAIN QUERY PLAN could be annotated further. For example,
"1|0|0|USE TEMP B-TREE FOR ORDER BY" could tell me what's being ordered; hints
about what the relevant tables/columns are.
3) The profiling capabilities offered via xProfile are incredibly helpful for
finding hot spots. One thing I've done that makes this even more useful is to
"genericize" the SQL query text by replacing numbers, dates, etc. with
placeholder symbols (e.g,. #) and then aggregating the results to produce a
top 100 list by time used. Genericizing is simple pattern matching, and helps
coalesce queries that would otherwise appear to be unrelated by simple
identity.
4) Re: profiling, it would be great to know which indices are actually being
used so that one could prune useless ones. Furthermore, it would be extremely
valuable to know which indices are most expensive to maintain — i.e., which
cause writing to become disproportionately expensive. Likewise, expensive
TRIGGERS would be nice to know about, independently of the SQL queries that
triggered them.
5) I've seeen that Howard Chu ported SQLite 3.7 to his LMDB B-Tree
implementation and saw performance improvements. Is there a reason this isn't
being mainlined? I assume there are trade-offs involved? For purely read-only
databases, it would be nice to be able to select the LMDB back-end. (I realize
making this switchable is a big undertaking; it's just a suggestion.)
6) Anecdotally, I've found it difficult to make sense of all the tuning
parameters and achieve good performance. The settings required seem rather
different for Windows than OS X, and for a while I was using fullfsync under
OS X — thinking it was the recommended safe option — only to find a comment
from DRH that it's not recommended, and not even used by Apple (who asked for
it). And, indeed, without it writes are 10x faster! I guess my suggestion
would be to update the various tuning documents to reflect the current state
of things — even it's something along the lines of "top 10 tuning suggestions
for {Windows, OSX, Android, iOS, Ubuntu}" or something like that.
7) I find it very useful to keep SQLite's heap separated from the Python heap
so I can see who's using memory. I've patched my SQLite source to allow
forcing this behavior even when there are multiple cores, and it seems to work
fine. I suggest making this a mainlined compile-time option; it's a trivial
patch. I only do this under OS X, because that's my primary development
platform; I don't know if other platforms let you tag heaps the way OS X does.
8) The virtual table mechanism is incredibly powerful; I've used it to speed
up performance-critical operations immensely. (This is a shout-out rather than
a suggestion.)
Thanks for an amazing piece of software.
Dave
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users