Thanks, Oliver!

I'm not sure what's up next. We could look around for other available
detectors, algorithms, or ideas to try. Fortunately we don't need to
integrate them to test them—we can just run the queries and evaluate the
results.
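
For the "evaluate the results" part, here is a rough sketch of how per-language precision and recall could be tallied from a hand-tagged query sample. It is only an illustration, not what the write-up actually used: the function name, the (gold, predicted) pair format, and the tiny sample are all made up.

```python
from collections import Counter


def per_language_scores(pairs):
    """Compute (precision, recall) per language.

    `pairs` is an iterable of (gold_lang, predicted_lang) tuples, where
    gold_lang is the hand-assigned label and predicted_lang is whatever
    the detector returned. This input format is an assumption.
    """
    true_pos = Counter()   # detector agreed with the gold label
    gold_total = Counter() # how many queries really are in each language
    pred_total = Counter() # how many queries the detector assigned to each

    for gold, predicted in pairs:
        gold_total[gold] += 1
        pred_total[predicted] += 1
        if gold == predicted:
            true_pos[gold] += 1

    scores = {}
    for lang in set(gold_total) | set(pred_total):
        # None means the value is undefined (nothing predicted / nothing gold).
        precision = true_pos[lang] / pred_total[lang] if pred_total[lang] else None
        recall = true_pos[lang] / gold_total[lang] if gold_total[lang] else None
        scores[lang] = (precision, recall)
    return scores


# Hypothetical sample: three English queries, two of them mis-tagged as Romanian.
sample = [("en", "en"), ("en", "ro"), ("en", "ro"), ("ro", "ro"), ("fr", "fr")]
scores = per_language_scores(sample)
```

With that invented sample, English comes out with high precision and low recall, and Romanian the other way around, which is the same shape of result described in the write-up.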

We could also try something of our own devising, if that turns out to be
some combination of easier, better, faster, and good enough.

I'm open to suggestions. Next week I'll ask Dan & Erik about how much
effort to put into alternatives.

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

On Fri, Sep 4, 2015 at 7:26 PM, Oliver Keyes <oke...@wikimedia.org> wrote:

> Yay! Thank you for this awesome research, Trey. Evaluating language
> plugins sounds like it would make a /great/ blog post. What
> alternatives are up next?
>
> On 4 September 2015 at 18:45, Trey Jones <tjo...@wikimedia.org> wrote:
> > I've written up my analysis of the ElasticSearch language detection plugin
> > that Erik recently enabled:
> >
> > https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation
> >
> > The short version is that it really likes Romanian (and Italian, and has a
> > bit of a thing for French), and precision on English is great, but recall is
> > poor (probably because of all the typos and other crap that go to enwiki
> > that is still technically "English"). Chinese and Arabic are good.
> >
> > I think we could do better, and we should evaluate (a) other language
> > detectors and (b) the effect of a good language detector on zero results
> > rate (i.e., simulate sending queries to the right place and see how much of
> > a difference it makes).
> >
> > Moderately pretty pictures included.
> >
> > —Trey
> >
> > Trey Jones
> > Software Engineer, Discovery
> > Wikimedia Foundation
> >
> > _______________________________________________
> > Wikimedia-search mailing list
> > Wikimedia-search@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
> >
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation