Hi Everyone,

I've done further analysis on the ~1400 zero-results non-DOI query corpus, looking at the effects of perfect (or at least human-level) language detection, and the effects of running all queries against many wikis.
In summary:

> More than 85% of failed queries to enwiki are in English, or are not in a
> particular language. Only about 35% of non-English queries in some language
> (<4.5% of zero-results queries), if funneled to the right language wiki,
> get any results.
> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that they can do decent spelling correction!),
> and the idiosyncrasies of language processing on other wikis allow certain
> classes of typos in names and English words to match, or the typos happen
> to exist uncorrected in the non-enwiki.
> Perhaps a better approach to handling non-English queries is user-specified
> alternate languages.

More details:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_Searching

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search