Well, we have the implementation of Kolkus's algorithm in Java, although
it's a training-based model, so it'll need a known (labeled) dataset to
train on.
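To make the "needs a known dataset" point concrete, here is a minimal sketch. It assumes Java 16+ (for the record) and a tiny made-up labeled set. It is NOT Kolkus's algorithm. It swaps in a plain character-trigram naive Bayes classifier (add-one smoothing, uniform prior) only to show that a training-based detector can't classify anything until it has seen labeled text. The class `TrigramDetector`, its methods, and the example sentences are all hypothetical. The same labeled data can also be held out to score a detector per language, which is what `precisionRecall` does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical illustration only: a character-trigram naive Bayes language
 * detector. This is NOT Kolkus's algorithm; it just shows that a
 * training-based detector is useless until it has seen labeled data.
 */
public class TrigramDetector {
    /** One labeled example: some text and its known language code. */
    public record Example(String text, String lang) {}

    private final Map<String, Map<String, Integer>> counts = new HashMap<>(); // lang -> trigram -> count
    private final Map<String, Integer> totals = new HashMap<>();              // lang -> total trigrams
    private final Set<String> vocab = new HashSet<>();                         // every trigram seen

    /** Lowercased character trigrams, padded with spaces so word edges count. */
    static List<String> trigrams(String text) {
        String s = " " + text.toLowerCase() + " ";
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    /** Training: count trigrams per language from the labeled ("known") dataset. */
    public void train(List<Example> data) {
        for (Example ex : data) {
            Map<String, Integer> c = counts.computeIfAbsent(ex.lang(), k -> new HashMap<>());
            for (String g : trigrams(ex.text())) {
                c.merge(g, 1, Integer::sum);
                totals.merge(ex.lang(), 1, Integer::sum);
                vocab.add(g);
            }
        }
    }

    /** Picks the language with the highest add-one-smoothed log-likelihood (uniform prior). */
    public String detect(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocab.size() + 1; // +1 leaves probability mass for unseen trigrams
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            double denom = totals.getOrDefault(e.getKey(), 0) + v;
            double score = 0.0;
            for (String g : trigrams(text)) {
                score += Math.log((e.getValue().getOrDefault(g, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best; // null if train() was never called
    }

    /** Per-language precision and recall over a held-out labeled test set. */
    public double[] precisionRecall(List<Example> test, String lang) {
        int tp = 0, fp = 0, fn = 0;
        for (Example ex : test) {
            boolean predicted = lang.equals(detect(ex.text()));
            boolean actual = lang.equals(ex.lang());
            if (predicted && actual) tp++;
            else if (predicted) fp++;
            else if (actual) fn++;
        }
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        TrigramDetector d = new TrigramDetector();
        d.train(List.of(
                new Example("the quick brown fox jumps over the lazy dog", "en"),
                new Example("where is the nearest train station", "en"),
                new Example("le renard brun saute par-dessus le chien", "fr"),
                new Example("où est la gare la plus proche", "fr")));
        System.out.println(d.detect("the brown dog")); // en
        double[] pr = d.precisionRecall(List.of(
                new Example("the lazy dog", "en"),
                new Example("la gare", "fr")), "en");
        System.out.printf("en precision=%.2f recall=%.2f%n", pr[0], pr[1]);
    }
}
```

With two training sentences per language this is a toy; a real evaluation would need a large labeled query set per language, which is exactly the dependency a training-based model brings with it.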

On 4 September 2015 at 20:08, Trey Jones <[email protected]> wrote:
> Thanks, Oliver!
>
> I'm not sure what's up next. We could look around for other available
> detectors, algorithms, or ideas to try. Fortunately we don't need to
> integrate them to test them—we can just run the queries and evaluate the
> results.
>
> We could also try something of our own devising, if it turns out to be some
> combination of easier, better, faster, and good enough.
>
> I'm open to suggestions. Next week I'll ask Dan & Erik about how much effort
> to put into alternatives.
>
> —Trey
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
> On Fri, Sep 4, 2015 at 7:26 PM, Oliver Keyes <[email protected]> wrote:
>>
>> Yay! Thank you for this awesome research, Trey. Evaluating language
>> plugins sounds like it would make a /great/ blog post. What
>> alternatives are up next?
>>
>> On 4 September 2015 at 18:45, Trey Jones <[email protected]> wrote:
>> > I've written up my analysis of the ElasticSearch language detection
>> > plugin that Erik recently enabled:
>> >
>> >
>> > https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation
>> >
>> > The short version is that it really likes Romanian (and Italian, and has
>> > a bit of a thing for French), i.e., it assigns too many queries to those
>> > languages. Precision on English is great (queries it labels English
>> > almost always are), but recall is poor (it misses many English queries,
>> > probably because of all the typos and other crap that go to enwiki that
>> > are still technically "English"). Chinese and Arabic are good.
>> >
>> > I think we could do better, and we should evaluate (a) other language
>> > detectors and (b) the effect of a good language detector on zero results
>> > rate (i.e., simulate sending queries to the right place and see how much
>> > of a difference it makes).
>> >
>> > Moderately pretty pictures included.
>> >
>> > —Trey
>> >
>> > Trey Jones
>> > Software Engineer, Discovery
>> > Wikimedia Foundation
>> >
>> > _______________________________________________
>> > Wikimedia-search mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>

-- 
Oliver Keyes
Count Logula
Wikimedia Foundation