I've long wanted to include wp10 in our search indices to
experiment with this data as a relevance signal.
This is now possible with your dataset and I've built a test index
which uses the following signals to rank results:
- incoming links
- weekly pageviews
The weights for these signals have not been properly tuned yet, but they
can be adjusted at query time with the following URI query parameters:
- cirrusIncLinksW: weight for a value that ranges from 0 to 1
- cirrusPageViewsW: weight for a value that ranges from 0 to 1
- cirrusWP10W: weight for a value that ranges from 0 to 5
Examples:
- articles in category 'History_of_Essex' sorted by WP10 best first
- articles in category 'History_of_Essex' sorted by WP10 worst first
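As a rough illustration of how the weight parameters above could be set at query time, here is a minimal Python sketch. The base URL is a placeholder (the test index address is not given in this message), and the use of a `search` parameter with an `incategory:` keyword is an assumption about how the query would be expressed; only the three `cirrus*W` parameter names come from the list above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder endpoint: the real test index URL is not given in this message.
BASE_URL = "https://search.example.org/w/index.php"


def build_search_url(query, inc_links_w=1.0, page_views_w=1.0, wp10_w=1.0):
    """Build a search URL that overrides the three ranking weights."""
    params = {
        "search": query,                   # assumed name of the query parameter
        "cirrusIncLinksW": inc_links_w,    # weight for incoming links (signal in 0..1)
        "cirrusPageViewsW": page_views_w,  # weight for weekly pageviews (signal in 0..1)
        "cirrusWP10W": wp10_w,             # weight for wp10 (signal in 0..5)
    }
    return BASE_URL + "?" + urlencode(params)


# Rank by wp10 alone: zero out the other two signals.
url = build_search_url(
    "incategory:History_of_Essex", inc_links_w=0, page_views_w=0, wp10_w=1
)
print(url)
print(parse_qs(urlparse(url).query))  # round-trip check of the encoded parameters
```

Zeroing two weights and keeping the third is one way to inspect what a single signal does to the ordering before trying to tune a combination.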
I'd love to make this data available in a more convenient way, with query
keywords like wp10:0, and then allow combining it with other signals like pageviews.
Concerning internal search ranking, we will soon evaluate how wp10
compares with existing signals (inclinks/pageviews), and I'd like to use
it as a replacement for the naive scoring method we use for autocomplete.
Well... everything is at an early stage, but I believe we can do very
interesting things with wp10 and search. I still don't know exactly
what, nor how :)
On 21/09/2016 at 11:11, Amir Ladsgroup wrote:
One of ORES's applications is determining article quality: for example,
what the best assessment of an article at a given revision would be.
Users in WikiProjects use ORES data to check whether articles need
re-assessment, e.g. if an article is at "Start" level but is now good
enough to be a "B" article.
As part of our Q4 goals, we made a dataset of article quality scores for all
articles in English Wikipedia (here's the link to download the dataset)
and we are publishing it on figshare as something you can cite.
We are also working on publishing monthly data so researchers can track
how article quality changes over time.
As a pet project of mine, I have always wanted to put these data in a database
so we can query it and get much more useful data, for example the
quality of articles in category 'History_of_Essex'. The weighted sum
is a measure of quality: a decimal number between 0 (definitely a stub)
and 5 (definitely a featured article). We also have a prediction column, which
is a number in this map; for example, if prediction is 5, it means ORES
thinks it should be a featured article.
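To make the two columns concrete, here is a small Python sketch. It assumes the weighted sum is the probability-weighted average of the class numbers and that the map runs Stub=0 through FA=5; the message only confirms the 0 and 5 endpoints, so the intermediate values and the example probabilities are illustrative, not taken from the dataset.

```python
# Assumed class-to-number map; only 0 (stub) and 5 (featured) are stated above.
WP10_LEVELS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def weighted_sum(probabilities):
    """Probability-weighted average of class numbers, between 0 and 5."""
    return sum(WP10_LEVELS[cls] * p for cls, p in probabilities.items())


def prediction(probabilities):
    """Number of the single most probable class."""
    best_class = max(probabilities, key=probabilities.get)
    return WP10_LEVELS[best_class]


# Made-up class probabilities for one revision (they sum to 1).
example = {"Stub": 0.05, "Start": 0.10, "C": 0.20, "B": 0.40, "GA": 0.15, "FA": 0.10}

print(round(weighted_sum(example), 2))  # 2.8
print(prediction(example))              # 3, i.e. "B" under the assumed map
```

Under these assumptions the weighted sum moves smoothly as the probabilities shift, while the prediction only changes when a different class becomes the most likely one, which is why the former is the more natural candidate for a ranking signal.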
I leave more use cases to your imagination :)
I'm looking for a more permanent place to put these data; please tell me if
they are useful to you.
 ORES is not an anti-vandalism tool, it's an infrastructure to use AI in
 (117 MBs)
Wikitech-l mailing list