https://bugzilla.wikimedia.org/show_bug.cgi?id=41635

--- Comment #5 from jeb...@gmail.com 2012-11-04 10:28:51 UTC ---
The two scoring functions I know of that work on this kind of problem are:
one for sorting on full terms, "sort all found terms with prefix matches by
probability, inverse document frequency, or a similar function", possibly with
some weighting toward shorter terms so that exact matches go first; and one
for sorting on boundary effects, "sort all found terms by the probability that
syllables starting within the right side of the boundary (i.e. within the
prefix) continue into the found term", possibly with some simplification
using Markov chains.

The first form is the most common and, as I recall from some comments, also
the form used in the live search on Wikipedia, i.e. the existing Lucene-search.
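
A rough sketch of the first form, in Python. This is not the Lucene-search
code; the function name, the term counts, and the length_weight parameter are
all made up for illustration. It scores each prefix match by the log of its
relative frequency and subtracts a small penalty per character beyond the
prefix, so that an exact match can outrank a somewhat more frequent longer term.

```python
import math


def rank_prefix_matches(prefix, term_counts, length_weight=0.1):
    """Return the terms that start with `prefix`, best match first.

    `term_counts` maps each term to a positive occurrence count
    (hypothetical input; any probability-like statistic would do).
    """
    total = sum(term_counts.values())
    scored = []
    for term, count in term_counts.items():
        if not term.startswith(prefix):
            continue
        # Log of the term's relative frequency: higher means more common.
        log_prob = math.log(count / total)
        # Penalise every character beyond the prefix, so shorter terms
        # (and in particular exact matches) move toward the top.
        penalty = length_weight * (len(term) - len(prefix))
        scored.append((log_prob - penalty, term))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored]


counts = {"wiki": 50, "wikipedia": 60, "wikimedia": 30, "wind": 100}
print(rank_prefix_matches("wiki", counts))
# ['wiki', 'wikipedia', 'wikimedia']
```

With length_weight=0 the ordering falls back to plain frequency, and
"wikipedia" moves ahead of "wiki".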

