I would appreciate any help with improving the query times I'm experiencing. The database is composed of n-grams drawn from a set of texts, which are dynamically labelled (some with no label, which are not counted) and differences and intersections are found between the n-grams of each labelled set of texts.
The schema is: CREATE TABLE Text ( id INTEGER PRIMARY KEY ASC, filename TEXT UNIQUE NOT NULL, checksum TEXT NOT NULL, label TEXT NOT NULL ); CREATE TABLE TextNGram ( text INTEGER NOT NULL REFERENCES Text (id), ngram TEXT NOT NULL, size INTEGER NOT NULL, count INTEGER NOT NULL ); CREATE INDEX TextIndex ON Text (id, label); CREATE INDEX TextNGramIndexNGram ON TextNGram (ngram); CREATE INDEX TextNGramIndexSize ON TextNGram (size, ngram); There are over 4000 rows in Text and over a billion in TextNGram. The slightly unnormalised schema is due to wanting to keep the time spent inserting all of the data low. The two queries (one to get a symmetric difference between sets of Texts' n-grams, the other to get an intersection) are: SELECT TextNGram.ngram, TextNGram.count freq_count, Text.filename, Text.label FROM TextNGram, Text WHERE Text.label IN (%s) AND TextNGram.size BETWEEN ? AND ? AND TextNGram.text = Text.id AND NOT EXISTS (SELECT tn.ngram FROM TextNGram tn, Text t WHERE t.label != Text.label AND t.label != '' AND tn.text = t.id AND tn.ngram = TextNGram.ngram ORDER BY TextNGram.size, TextNGram.ngram and SELECT TextNgram.ngram, TextNGram.count freq_count, Text.filename, Text.label FROM TextNGram, Text WHERE Text.label IN (%s) AND Text.id = TextNGram.text AND TextNGram.size BETWEEN ? AND ? [ The EXISTS sub-select is repeated for each label ] AND EXISTS (SELECT t.label FROM Text t, TextNGram tn WHERE t.label = ? AND tn.text = t.id AND tn.ngram = TextNGram.ngram) ORDER BY TextNGram.size, TextNGram.ngram I don't know whether my indices can be improved, or the queries themselves, but currently queries can take hours to run. Thank you for any help! Jamie _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users