Hi Everyone,

I've done further analysis on the corpus of ~1400 non-DOI queries that got
zero results, looking at the effects of perfect (or at least human-level)
language detection and of running all queries against many wikis.

In summary:

> More than 85% of failed queries to enwiki are in English or are not in any
> particular language. Only about 35% of the queries that are in some language
> other than English (<4.5% of all zero-results queries) get any results, even
> when funneled to the right language wiki.
>
> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that they can do decent spelling correction!).
> In addition, the idiosyncrasies of language processing on other wikis let
> certain classes of typos in names and English words match, and sometimes
> the typos simply exist, uncorrected, in the non-enwiki.
>
> Perhaps a better approach to handling non-English queries is to let users
> specify alternate languages.
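
The routing idea above (sending a zero-results enwiki query on to another language wiki, either the one a language detector picks or ones the user has listed as alternates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Discovery team's implementation: `search_with_fallback`, `wiki_host`, and the `search` callback are hypothetical names, the `xx.wikipedia.org` hostname pattern is assumed, and a toy in-memory index stands in for a real search backend.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical search callback: (wiki hostname, query) -> list of result titles.
SearchFn = Callable[[str, str], List[str]]


def wiki_host(lang: str) -> str:
    """Map a language code such as "de" to an assumed Wikipedia hostname."""
    return f"{lang}.wikipedia.org"


def search_with_fallback(
    query: str,
    search: SearchFn,
    detected_lang: Optional[str] = None,
    user_langs: Iterable[str] = (),
    home_lang: str = "en",
) -> Tuple[Optional[str], List[str]]:
    """Search the home wiki; on zero results, retry on other language wikis.

    Candidate order: home wiki, then the detected language (if any), then the
    user's alternate languages, with duplicates skipped. Returns the first
    (hostname, results) pair that has any results, or (None, []) otherwise.
    """
    candidates: List[str] = [home_lang]
    for lang in ([detected_lang] if detected_lang else []) + list(user_langs):
        if lang not in candidates:
            candidates.append(lang)

    for lang in candidates:
        host = wiki_host(lang)
        results = search(host, query)
        if results:  # stop at the first wiki that returns anything
            return host, results
    return None, []


# Toy in-memory index standing in for real per-wiki search backends.
fake_index = {
    ("de.wikipedia.org", "Schwarzwälder Kirschtorte"): ["Schwarzwälder Kirschtorte"],
}


def fake_search(host: str, query: str) -> List[str]:
    return fake_index.get((host, query), [])


print(search_with_fallback("Schwarzwälder Kirschtorte", fake_search, user_langs=["fr", "de"]))
# → ('de.wikipedia.org', ['Schwarzwälder Kirschtorte'])
```

Putting the user's own alternate languages into the candidate list, rather than relying only on `detected_lang`, mirrors the suggestion above. Stopping at the first wiki with any results is a simplifying choice; a real system might instead show results from several wikis side by side.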


More details:

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_Searching

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
