Yurik created this task.
Yurik added projects: Wikidata-Query-Service, Analytics.
Herald added a subscriber: Aklapper.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION

At this point, the only way to rank Wikidata results is to order them by sitelink count. This is a fairly good indicator of how many different languages/cultures are interested in a topic, but it is not very accurate, especially when a topic is mostly relevant to a single language.

I propose introducing a new type of entry to WDQS:

sparql
  # Naming is TBD
  <https://en.wikipedia.org/wiki/Albert_Einstein>   prefix:total_page_views   [integer] .
  <https://en.wikipedia.org/wiki/Albert_Einstein>   prefix:last_24h_page_views   [integer] .
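As a rough illustration of what the loader might emit, here is a minimal Python sketch that serializes one count as an N-Triples line and wraps a batch in a single SPARQL INSERT DATA request. The namespace `http://example.org/pageviews#` and the function names are hypothetical placeholders, since the predicate naming is still open. Typing the value as `xsd:integer` is an assumption.

```python
# Hypothetical namespace: predicate naming is still TBD in this task.
PV = "http://example.org/pageviews#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def pageview_triple(article_url: str, predicate: str, count: int) -> str:
    """Serialize one page-view count as an N-Triples line."""
    if count < 0:
        raise ValueError("page-view counts cannot be negative")
    return f'<{article_url}> <{PV}{predicate}> "{count}"^^<{XSD_INTEGER}> .'


def insert_data(triples) -> str:
    """Wrap a batch of triple lines in one SPARQL INSERT DATA request."""
    return "INSERT DATA {\n" + "\n".join(triples) + "\n}"


if __name__ == "__main__":
    line = pageview_triple(
        "https://en.wikipedia.org/wiki/Albert_Einstein", "total_page_views", 12345
    )
    print(insert_data([line]))
```

Note that INSERT DATA only adds triples; replacing a previously stored count would also require deleting the old value (for example with a DELETE/INSERT request).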

A script would download the hourly page-view files from the dumps and increment the counters once an hour. The updates should happen in bulk. Each file contains about 5 million entries (<40 MB gzipped).
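A minimal sketch of the per-hour parsing step, assuming the hourly files use the space-separated format `domain_code page_title count_views total_response_size`. Also assumed here: a bare domain code such as `en` means desktop en.wikipedia.org, and titles can be dropped into the article URL as-is. That may not match the percent-encoding of the sitelink IRIs in WDQS exactly. All function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Optional


def wikipedia_lang(domain_code: str) -> Optional[str]:
    """Return the language for a desktop-Wikipedia domain code, else None.

    Assumption: a bare code such as "en" means en.wikipedia.org; codes with
    a suffix (mobile or other projects) are skipped in this sketch.
    """
    if domain_code and "." not in domain_code:
        return domain_code
    return None


def parse_pageviews(lines: Iterable[str]) -> Counter:
    """Sum views per article URL from the lines of one hourly file.

    Assumed line format:
    "domain_code page_title count_views total_response_size".
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip malformed lines
        domain_code, title, views, _size = parts
        lang = wikipedia_lang(domain_code)
        if lang is None:
            continue
        counts[f"https://{lang}.wikipedia.org/wiki/{title}"] += int(views)
    return counts


def read_hourly_file(path: str) -> Counter:
    """Parse one gzipped hourly file, decompressing it line by line."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return parse_pageviews(f)
```

Mobile traffic is ignored here; counting it would need an explicit mapping from the dump's suffixed domain codes to Wikipedia languages.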

Additionally, we may want to keep the total for the last 24 hours. This is a bit trickier but still very doable, e.g. by keeping the totals from the last 24 files in memory and uploading only the deltas.
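The 24-hour total can be kept with a fixed-size window of hourly counters. This is a minimal Python sketch of that idea: when a new hour arrives, its counts are added and the counts of the hour that falls out of the window are subtracted. Only the non-zero differences are returned as the deltas to upload. The class and method names are hypothetical.

```python
from collections import Counter, deque


class RollingWindow:
    """Keep per-hour counts for the last `hours` files and report deltas."""

    def __init__(self, hours: int = 24):
        self.hours = hours
        self.window: deque = deque()  # one Counter per hourly file
        self.total: Counter = Counter()  # current sum over the window

    def push(self, hourly: Counter) -> dict:
        """Add one hour of counts; return {page: change in the window total}."""
        delta = Counter(hourly)
        self.window.append(hourly)
        if len(self.window) > self.hours:
            # The oldest hour leaves the window, so subtract its counts.
            delta.subtract(self.window.popleft())
        changes = {page: d for page, d in delta.items() if d != 0}
        for page, d in changes.items():
            self.total[page] += d
            if self.total[page] == 0:
                del self.total[page]  # page had no views in the window
        return changes
```

Each returned entry says how far a page's 24-hour total moved, so only pages whose total actually changed need an update.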

P.S. I am hacking on this at the moment (in Python) and need naming suggestions for the predicate.


TASK DETAIL
https://phabricator.wikimedia.org/T174981

To: Yurik
Cc: Smalyshev, Aklapper, Yurik, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, debt, Gehel, Jonas, FloNight, Xmlizer, Izno, JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs