Hello,
Here are this past week's updates from the Discovery department.

== Highlights ==
* Finalized the second BM25 testing analysis; the PDF is linked here. [0]

== Search ==
* Migrated Phan for CirrusSearch to Jenkins. (technical debt) [1] [2]
* Finished the write-up summarizing and recommending extensive changes to
TextCat for language identification. [3] Overall, the F0.5 score [4]
improved by a mean of just under 5% across the corpora from nine
Wikipedias. The two worst-performing corpora, from enwiki and nlwiki, each
went up around 10%! All nine now have an F0.5 score above 90%. The next
step is to deploy the recommended changes.
* Completed a round of refactoring and cleanup of the Special:Search code.
[5] [6]
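
For readers unfamiliar with the metric mentioned above: F0.5 is the F-beta
score with beta = 0.5, which weights precision more heavily than recall. A
minimal Python sketch of the standard formula follows. The function name
f_beta and the sample precision and recall values are illustrative
assumptions, not figures or code from the TextCat analysis.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta == 1 gives F1.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero; the score is conventionally 0 here.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, chosen only to show the precision weighting.
high_precision = f_beta(precision=0.95, recall=0.80)
high_recall = f_beta(precision=0.80, recall=0.95)
print(f"F0.5 with high precision: {high_precision:.3f}")
print(f"F0.5 with high recall:    {high_recall:.3f}")
```

With beta = 0.5, the high-precision case scores higher than the high-recall
case even though the two input values are simply swapped.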

[0] https://www.mediawiki.org/wiki/Discovery_Analysis#Past_analyses
[1] https://www.mediawiki.org/wiki/Continuous_integration/Phan
[2] https://phabricator.wikimedia.org/T153040
[3] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Improvements#Final_Summary_.26_Recommendations
[4] https://en.wikipedia.org/wiki/F1_score
[5] https://phabricator.wikimedia.org/T150217
[6] https://phabricator.wikimedia.org/T150390

----

The archive of all past updates can be found on MediaWiki.org:

https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" [7] or "Volunteer
needed" [8] in Phabricator.

[7] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[8] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R


Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
