That is super awesome! :D

-- brion

On Tue, Jul 19, 2016 at 6:41 PM, Deborah Tankersley <
[email protected]> wrote:

> We're happy to announce that after numerous tests and analyses[1] and a
> fully operational demo[2], the Discovery Team is ready to release
> TextCat[3] into production on wiki.
>
> What is TextCat? It detects the language that the search query was written
> in, which allows us to look for results on a different wiki. TextCat is a
> language detection library based on n-grams[4]. During a search, TextCat
> will only kick in when the following three things occur:
>     1. fewer than 3 results are returned from the query on the current wiki
>     2. language detection is successful (meaning that TextCat is reasonably
> certain what language the query is in, and that it is different from the
> language of the current wiki)
>     3. the other wiki (in the detected language) has results
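
The three conditions above can be sketched as a small decision function. This is only an illustration of the described behaviour, not the actual CirrusSearch code: `search_with_fallback`, `search_wiki`, `detect_language`, and `MIN_RESULTS` are hypothetical names, and the detector is assumed to return `None` when it is not confident enough.

```python
from typing import Callable, List, Optional

# From the announcement: fall back only when fewer than 3 results come back.
MIN_RESULTS = 3


def search_with_fallback(
    query: str,
    current_lang: str,
    search_wiki: Callable[[str, str], List[str]],     # hypothetical: (lang, query) -> results
    detect_language: Callable[[str], Optional[str]],  # hypothetical: query -> lang code or None
) -> dict:
    """Return local results plus cross-wiki results when all three conditions hold."""
    local = search_wiki(current_lang, query)
    result = {"local": local, "fallback_lang": None, "fallback": []}

    # Condition 1: the current wiki returned fewer than 3 results.
    if len(local) >= MIN_RESULTS:
        return result

    # Condition 2: detection succeeded and differs from the current wiki's language.
    detected = detect_language(query)
    if detected is None or detected == current_lang:
        return result

    # Condition 3: the wiki in the detected language actually has results.
    other = search_wiki(detected, query)
    if other:
        result["fallback_lang"] = detected
        result["fallback"] = other
    return result
```

With a Russian query on an English wiki, for example, the function would return results from the Russian wiki only if the English search came back nearly empty, the detector reported `ru`, and the Russian search found something.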
>
> Our analysis of the A/B test[5] (for English, French, Spanish, Italian and
> German Wikipedias) showed that:
>
> "...The test groups not only had a substantially lower zero results rate
> (57% in control group vs 46% in the two test groups), but they had a higher
> clickthrough rate (44% in the control group vs 49-50% in the two test
> groups), indicating that we may be providing users with relevant results
> that they would not have gotten otherwise."
>
>
> This update will be scheduled for production release during the week of
> July 25, 2016 on the following Wikipedias:
>
>    - English [6]
>    - German [7]
>    - Spanish [8]
>    - Italian [9]
>    - French [10]
>
> TextCat will then be added to this next group of Wikipedias at a later
> date:
>
>    - Portuguese [11]
>    - Russian [12]
>    - Japanese [13]
>
> This is a huge step forward in creating a search mechanism that is able to
> detect - with a high level of accuracy - the language that was used and
> produce results in that language. Looking ahead, we are also investigating
> a confidence-measuring algorithm[14] to ensure that the language detection
> results are the best they can be.
>
> We will also be doing more[15] A/B tests using TextCat on non-Wikipedia
> sites, such as Wikibooks and Wikivoyage. These new tests will give us
> insight into whether applying the same language detection configuration
> across projects would be helpful.
>
> Please let us know if you have any questions or concerns on the TextCat
> discussion page[16]. Also, for screenshots of what this update will look
> like, please see this one[17], showing an existing search typed in on enwiki
> in Russian ("первым экспериментом"), and this one[18], showing what it will
> look like once TextCat is in production on enwiki.
>
>
> Thanks!
>
>
> [1] https://phabricator.wikimedia.org/T118278
> [2] https://tools.wmflabs.org/textcatdemo/
> [3] https://www.mediawiki.org/wiki/TextCat
> [4] https://en.wikipedia.org/wiki/N-gram
> [5] https://commons.wikimedia.org/wiki/File:Report_on_Cirrus_Search_TextCat_AB_Test_-_Language_Detection_on_English,_French,_Spanish,_Italian,_and_German_Wikipedias.pdf
> [6] https://en.wikipedia.org/
> [7] https://de.wikipedia.org/
> [8] https://es.wikipedia.org/
> [9] https://it.wikipedia.org/
> [10] https://fr.wikipedia.org/
> [11] https://pt.wikipedia.org/
> [12] https://ru.wikipedia.org/
> [13] https://ja.wikipedia.org/
> [14] https://phabricator.wikimedia.org/T140289
> [15] https://phabricator.wikimedia.org/T140292
> [16] https://www.mediawiki.org/wiki/Talk:TextCat
> [17] https://commons.wikimedia.org/wiki/File:Existing-search_no-textcat.png
> [18] https://commons.wikimedia.org/wiki/File:New-search_with-textcat.png
>
> --
> Deb Tankersley
> Product Manager, Discovery
> IRC: debt
> Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l