Wiki Research Junkies,
I am investigating the comparative quality of articles about Cote d'Ivoire and
Uganda versus other countries. I want to answer the question: what makes a
high-quality article? Can anyone point me to existing research on heuristics
for article quality, that is, determining an article's quality from its
wikitext properties, without human rating? I would also consider using data
from the Article Feedback Tools, if dumps were available for each article in
the English, French, and Swahili Wikipedias. The only raw data I can find is
at http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian
filter based on:
* Per section:
  * Text length in each section
  * Infoboxes in each section
  * Filled parameters in each infobox
  * Images in each section
* Good Article or Featured Article status
* Then normalize on page views per population / number of native-language
  speakers
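
To make the per-section features above concrete, here is a rough sketch in
Python of how they might be pulled out of raw wikitext. It is not my actual
code: the function names (split_sections, section_features, article_features,
views_per_speaker) are made up for illustration, and the regular expressions
only approximate wikitext. In particular, it counts filled template parameters
per section rather than per infobox, and text length includes markup.

```python
import re

# Hypothetical patterns: rough approximations of wikitext, not a real parser.
SECTION_RE = re.compile(r"^==+\s*(.*?)\s*==+\s*$", re.MULTILINE)
IMAGE_RE = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)
INFOBOX_RE = re.compile(r"\{\{\s*Infobox", re.IGNORECASE)
# A parameter line such as "| capital = Kampala" with a non-empty value.
FILLED_PARAM_RE = re.compile(r"^\s*\|\s*[^=|\n]+=[ \t]*\S", re.MULTILINE)


def split_sections(wikitext):
    """Return (title, body) pairs; text before the first heading is 'LEAD'."""
    sections = []
    title = "LEAD"
    pos = 0
    for match in SECTION_RE.finditer(wikitext):
        sections.append((title, wikitext[pos:match.start()]))
        title = match.group(1)
        pos = match.end()
    sections.append((title, wikitext[pos:]))
    return sections


def section_features(body):
    """Count the per-section signals from the list above for one section."""
    return {
        "text_length": len(body.strip()),  # raw characters, markup included
        "infoboxes": len(INFOBOX_RE.findall(body)),
        "filled_params": len(FILLED_PARAM_RE.findall(body)),
        "images": len(IMAGE_RE.findall(body)),
    }


def article_features(wikitext):
    """Map each section title to its feature dictionary."""
    return {title: section_features(body)
            for title, body in split_sections(wikitext)}


def views_per_speaker(page_views, speakers):
    """Normalize page views by a speaker (or population) count."""
    return page_views / speakers if speakers else 0.0
```

The resulting dictionaries could then be flattened into feature vectors for
the classifier, with the Good/Featured Article flag as the label.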
Can you also think of any other dimensions or heuristics to rate
programmatically?
Best,
Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023
_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l