I'm working on a score function to rank suggestions with the new API we're building.

I'd like to share some data with you. The initial score function uses only the variables available today. If anyone wants to have a look, you can find a dataset (from simplewiki) here[1],
and a small R script to play with the score here[2].

The score correctly discounts pages that have a high number of incoming links, such as small villages that link to each other, but it fails to discount "List" articles and "Date/Year" articles. I'm not sure how to deal with that. In the dataset you'll find two columns named "good" and "very good"; they are set to true when the article is flagged with Template:Good or Template:Very Good.
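
One possible way to use the "good" and "very good" columns is as a rough sanity check: look at the top-scored pages and see how many of them are flagged. The snippet below is only a sketch of that idea, not the score from score.R. The sample rows are made up, the "title" and "score" column names are assumptions (only "good" and "very good" are known to exist in the CSV), and the way the flags are spelled ("true") is assumed as well.

```python
import csv
import io

# Made-up sample rows. Only the "good" and "very good" column names come
# from the real dataset; "title" and "score" are hypothetical.
SAMPLE = """title,score,good,very good
Article A,0.91,true,false
Article B,0.15,false,false
Article C,0.78,false,true
List of things,0.88,false,false
Article D,0.40,false,false
"""


def is_true(value):
    """Treat 'true'/'TRUE' as a set flag (the exact encoding is assumed)."""
    return value.strip().lower() == "true"


def flagged_share_in_top(rows, k):
    """Fraction of the k highest-scored rows flagged good or very good."""
    ranked = sorted(rows, key=lambda r: float(r["score"]), reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    flagged = [r for r in top if is_true(r["good"]) or is_true(r["very good"])]
    return len(flagged) / len(top)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
share = flagged_share_in_top(rows, 3)
print(share)
```

In the sample, the unflagged "List of things" row sits among the top three, which is the kind of case described above: a list page that the score fails to discount would lower this share.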


[1] https://people.wikimedia.org/~dcausse/simplewiki_score_vars.csv
[2] https://github.com/nomoa/suggester-prototype/blob/master/score.R

Wikimedia-search mailing list
