One of the applications of ORES [1] is determining article quality, that
is, what the best assessment of an article would be at a given revision.
Users in WikiProjects use ORES data to check whether articles need
re-assessment, e.g. when an article is rated "Start" but is now good
enough to be a "B" article.

As part of our Q4 goals, we made a dataset of article quality scores for
all articles in English Wikipedia [2] (here's the link to download the
dataset [3]), and we are publishing it on figshare as something you can
cite [4]. We are also working on publishing monthly data so researchers
can track how article quality changes over time [5].
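
If you want to poke at the downloaded file [3] locally, something like the
sketch below should be enough to get started. It only assumes that the file
is a bz2-compressed, tab-separated text file (as the .tsv.bz2 name
suggests); the function name read_scores is made up for illustration, and
I'm not assuming anything about the column layout or whether there is a
header row, so check the first few rows before relying on it.

```python
import bz2
import csv


def read_scores(path):
    """Yield each row of a bz2-compressed TSV file as a list of strings.

    Assumption: the file is UTF-8 text with tab-separated fields.
    No assumption is made about column names or a header row.
    """
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quote characters in article titles untouched.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

Because it reads the file row by row, you can loop over
read_scores("wp10-scores-enwiki-20160820.tsv.bz2") and stop after a handful
of rows to see what the columns look like, without decompressing the whole
thing first.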

As a pet project of mine, I always wanted to put this data in a database
so that we can query it and get much more useful results, for example the
quality of articles in the category 'History_of_Essex' [6] [7]. The weighted
sum is a measure of quality: a decimal number between 0 (really a stub)
and 5 (definitely a featured article). There is also a prediction column,
which is a number from this map [8]. For example, if the prediction is 5,
ORES thinks the article should be a featured article.
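
To make the two columns a bit more concrete, here is a rough sketch of how
such numbers could be derived from a set of per-class probabilities. The
class-to-number mapping (Stub=0 up to FA=5) is my reading of the map in [8]
and matches the 0 to 5 range above; the probabilities are invented for the
example, and the function names are just for illustration, not the actual
wikiclass code.

```python
# Assumed mapping of assessment classes to numbers (see [8]).
CLASS_WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def weighted_sum(probabilities):
    """Sum of each class number multiplied by that class's probability."""
    return sum(CLASS_WEIGHTS[cls] * p for cls, p in probabilities.items())


def predicted_class_number(probabilities):
    """Number of the single most likely class."""
    best = max(probabilities, key=probabilities.get)
    return CLASS_WEIGHTS[best]


# Invented probabilities for one revision; they add up to 1.
example = {
    "Stub": 0.05, "Start": 0.10, "C": 0.20,
    "B": 0.40, "GA": 0.15, "FA": 0.10,
}

print(round(weighted_sum(example), 2))   # 2.8
print(predicted_class_number(example))   # 3
```

In this made-up case the prediction is 3 ("B") while the weighted sum is
2.8, so the weighted sum gives a finer-grained picture than the prediction
alone, which is why it is handy for sorting articles within a category.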

I leave more use cases to your imagination :)

I'm looking for a more permanent place to put this data; please tell me if
it's useful for you.

[1] ORES is not an anti-vandalism tool; it's an infrastructure for using AI
in Wikipedia.
[2] https://phabricator.wikimedia.org/T135684
[3] (117 MBs)
https://datasets.wikimedia.org/public-datasets/enwiki/article_quality/wp10-scores-enwiki-20160820.tsv.bz2
[4] https://phabricator.wikimedia.org/T145332
[5] https://phabricator.wikimedia.org/T145655
[6] https://quarry.wmflabs.org/query/12647
[7] https://quarry.wmflabs.org/query/12662
[8]
https://github.com/wiki-ai/wikiclass/blob/3ff2f6c44c52905c7202515c5c8b525fb1ceb291/wikiclass/utilities/extract_scores.py#L37

Have fun!
Amir
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
