Well, we have an implementation of Kolkus's algorithm in Java, although it's a training-based model, so it'll need a labeled dataset to train on.
On 4 September 2015 at 20:08, Trey Jones <[email protected]> wrote:
> Thanks, Oliver!
>
> I'm not sure what's up next. We could look around for other available
> detectors, algorithms, or ideas to try. Fortunately, we don't need to
> integrate them to test them; we can just run the queries and evaluate the
> results.
>
> We could also try something of our own devising, because it's some
> combination of easier, better, faster, and good enough.
>
> I'm open to suggestions. Next week I'll ask Dan & Erik how much effort
> to put into alternatives.
>
> —Trey
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
> On Fri, Sep 4, 2015 at 7:26 PM, Oliver Keyes <[email protected]> wrote:
>>
>> Yay! Thank you for this awesome research, Trey. Evaluating language
>> plugins sounds like it would make a /great/ blog post. What
>> alternatives are up next?
>>
>> On 4 September 2015 at 18:45, Trey Jones <[email protected]> wrote:
>> > I've written up my analysis of the Elasticsearch language detection
>> > plugin that Erik recently enabled:
>> >
>> > https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation
>> >
>> > The short version is that it really likes Romanian (and Italian, and has
>> > a bit of a thing for French). Precision on English is great, but recall
>> > is poor, probably because of all the typos and other crap that goes to
>> > enwiki but is still technically "English". Chinese and Arabic are
>> > detected well.
>> >
>> > I think we could do better, and we should evaluate (a) other language
>> > detectors and (b) the effect of a good language detector on the
>> > zero-results rate (i.e., simulate sending queries to the right place and
>> > see how much of a difference it makes).
>> >
>> > Moderately pretty pictures included.
>> >
>> > —Trey
>> >
>> > Trey Jones
>> > Software Engineer, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > Wikimedia-search mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation

-- 
Oliver Keyes
Count Logula
Wikimedia Foundation
