https://bugzilla.wikimedia.org/show_bug.cgi?id=45529
Web browser: ---
Bug ID: 45529
Summary: use composite indexes for efficient term search
Product: MediaWiki extensions
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: WikidataRepo
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected],
[email protected]
Classification: Unclassified
Mobile Platform: ---
Currently, the wb_terms table has indexes on individual fields:
CREATE INDEX /*i*/wb_terms_entity_id ON /*_*/wb_terms (term_entity_id);
CREATE INDEX /*i*/wb_terms_entity_type ON /*_*/wb_terms (term_entity_type);
CREATE INDEX /*i*/wb_terms_language ON /*_*/wb_terms (term_language);
CREATE INDEX /*i*/wb_terms_type ON /*_*/wb_terms (term_type);
CREATE INDEX /*i*/wb_terms_text ON /*_*/wb_terms (term_text);
CREATE INDEX /*i*/wb_terms_search_key ON /*_*/wb_terms (term_search_key);
Since only a single index will be used for any given query, this is not
helpful, and results in inefficient searches. We really need a composite key
for each kind of query. As far as I remember, the kinds of queries we do are:
* by term_language, term_term, and term_entity_type
* by term_language, term_term, and term_type
"term" here may be an exact match, or a soft match - so we need indexes
covering either. The term type has very low cardinality (3 or 4, iirc), so it
can probably be admitted.
From this, I gather we need the following two composite indexes:
* for exact matches: ( term_language, term_term )
* for soft matches: ( term_language, term_search_key )
We also need to access terms by entity ID (especially for deleting and
updating, so we also want an index on
* ( term_entity_type, term_entity_id )
And of course, row_id will remain a primary key.
--
You are receiving this mail because:
You are on the CC list for the bug.
You are watching all bug changes.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l