Hi,
I'm working on a score function to rank suggestions with the new API
we're building.
I'd like to share some data with you. The initial score function uses
only the variables available today.
If anyone wants to have a look, you can find a dataset (from simplewiki)
here[1], and a small R script to play with the score here[2].
The score correctly discounts pages that have a high number of
incoming links, like small villages that link to each other, but fails to
discount "List" articles and "Date/Year" articles. I'm not sure how to
deal with that. In the dataset you'll find two columns named "good" and
"very good"; they are set to true when the article is flagged with
Template:Good or Template:Very Good.
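
One possible direction for the list and date pages, as a rough sketch
only: flag them by title and apply a penalty. This is not part of the R
script; it is written in Python for illustration, and the title patterns,
the function names and the 0.5 penalty are all assumptions that would
need checking against the actual simplewiki titles.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Assumed title conventions (hypothetical, not derived from the dataset):
#   lists      -> "List of ..." / "Lists of ..."
#   years      -> "1999", "44 BC"; decades -> "1990s"
#   days       -> "January 1"; months -> "May"
#   centuries  -> "20th century"
LIST_RE = re.compile(r"^Lists? of ", re.IGNORECASE)
DATE_RE = re.compile(
    rf"^(?:\d{{1,4}}s?(?: BC)?"
    rf"|(?:{MONTHS})(?: \d{{1,2}})?"
    rf"|\d{{1,2}}(?:st|nd|rd|th) century(?: BC)?)$"
)


def is_list_or_date_title(title: str) -> bool:
    """Return True when the title looks like a list, date or year page."""
    return bool(LIST_RE.match(title) or DATE_RE.match(title))


def penalized_score(score: float, title: str, penalty: float = 0.5) -> float:
    """Scale the score down for list/date pages; leave other pages as-is."""
    return score * penalty if is_list_or_date_title(title) else score
```

A title such as "2001: A Space Odyssey" is left alone because the date
pattern has to match the whole title, not just its beginning.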
Thanks!
[1] https://people.wikimedia.org/~dcausse/simplewiki_score_vars.csv
[2] https://github.com/nomoa/suggester-prototype/blob/master/score.R
_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search