https://bugzilla.wikimedia.org/show_bug.cgi?id=45529

--- Comment #3 from Asher Feldman <[email protected]> ---
(In reply to comment #0)
> Since only a single index will be used for any given query, this is not
> helpful, and results in inefficient searches. 

This isn't strictly true.  Since the select queries in production against
wb_terms aren't join queries, index merge optimizations are possible and are
being utilized.  I.e. a query containing WHERE (term_language='oc' AND
term_type='label' AND term_entity_type='property') is executed via an
intersection of two indexes - "Using
intersect(wb_terms_entity_type,wb_terms_language)".  

That said, index merging isn't as efficient as composite indexes, and there's
one query case that isn't performing well at all. 

It looks like there are four common select query types in production:

1) /* Wikibase\TermSqlIndex::getTermsOfEntity */ select term_language,
term_type, term_text from `wb_terms` where term_entity_id = ? and
term_entity_type = ? 

2) /* Wikibase\TermSqlIndex::getMatchingIDs */ select distinct term_entity_id
from `wb_terms` where (term_language=? and term_search_key like ? and
term_type=? and term_entity_type=?) or (term_language=? and term_search_key
like ? and term_type=? and term_entity_type=?) limit ? 

3) /* Wikibase\TermSqlIndex::getMatchingTerms */ SELECT term_entity_type,
term_type, term_language, term_text, term_entity_id FROM `wb_terms` WHERE
(term_language=? AND term_type=? AND term_entity_type=?)

4) /* Wikibase\TermSqlIndex::getMatchingTermCombination */ select
terms?term_entity_type as terms?term_entity_type, terms?term_type as
terms?term_type, terms?term_language as terms?term_language, terms?term_text as
terms?term_text, terms?term_entity_id as terms?term_entity_id,
terms?term_entity_type as terms?term_entity_type, terms?term_type as
terms?term_type, terms?term_language as terms?term_language, terms?term_text as
terms?term_text, terms?term_entity_id as terms?term_entity_id from `wb_terms`
`terms?` inner join `wb_terms` `terms?` on
((terms?term_entity_id=terms?term_entity_id) and
(terms?term_entity_type=terms?term_entity_type)) where (terms?term_language=?
and terms?term_text=? and terms?term_type=? and terms?term_entity_type=?) and
(terms?term_language=? and terms?term_text=? and terms?term_type=? and
terms?term_entity_type=?) and (terms?term_entity_id <> ? or
terms?term_entity_type <> ?) limit ?

Query type 3 above can have a large number of results returned and should
probably include a limit.

I think your proposed indexes should be good, but please limit the length on
term_search_key.  Most queries use a few letters only ('abc%'), indexing
term_search_key(6) should be good.  

If the combination ( term_entity_type, term_entity_id ) is guaranteed to be
unique, and it will be used to delete rows, define this index as unique to
minimize locking. 

Submit the changes (with drop index statements) as a regular schema change and
let me know when it's available in gerrit for review.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to