abian created this task.
abian added projects: Wikidata, Wikidata-Query-Service.
Herald added a subscriber: Aklapper.
Herald added a project: Discovery.

TASK DESCRIPTION

Wikidata contains so many entities that processing all of them is too expensive for certain purposes. For statistical purposes, however (for example, estimating a proportion of completeness or consistency), it's not necessary to retrieve and process them all; a small subset can be enough, provided it is representative (random).
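
As a rough illustration of how small such a subset can be, here is a minimal sketch using the standard normal-approximation formula for estimating a proportion from a simple random sample. The function name and its defaults (95% confidence, worst-case proportion of 0.5) are assumptions made for the example, not part of this task.

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, expected_proportion=0.5):
    """Approximate number of random entities needed to estimate a proportion.

    Uses n = z^2 * p * (1 - p) / e^2, which assumes a simple random sample
    drawn from a population much larger than the sample itself.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    p = expected_proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


if __name__ == "__main__":
    # Margin of error of one percentage point at roughly 95% confidence.
    print(sample_size_for_proportion(0.01))
```

Under these assumptions, a margin of error of one percentage point needs on the order of 9,600 random entities, regardless of how many entities Wikidata holds in total.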

Currently, it's hard to retrieve a random data set from Wikidata because:

  • the Wikidata Query Service doesn't retrieve entities randomly;
  • Special:Random requires two requests for every retrieved entity (first an HTTP GET to Special:Random, then an HTTP GET to the suggested item), doesn't support filters, and offers no significant advantage over directly generating random integers and sending HTTP requests to the corresponding URIs.
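
For reference, the "random integers" workaround mentioned in the last point could look roughly like the sketch below. It is only an illustration: the helper names, the User-Agent string, and the upper bound on numeric IDs are assumptions chosen for the example, and deleted IDs are assumed to answer with HTTP 404. It also shows the limitation described above, since it costs one request per entity and cannot apply any filter.

```python
import json
import random
import urllib.error
import urllib.request

# Special:EntityData serves one entity per request as JSON.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def random_item_ids(count, max_numeric_id, seed=None):
    """Draw `count` distinct item IDs uniformly from Q1..Q<max_numeric_id>.

    `max_numeric_id` is an assumed upper bound supplied by the caller; IDs of
    deleted items are still drawn, so some of them will not resolve.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(1, max_numeric_id + 1), count)
    return [f"Q{number}" for number in numbers]


def entity_data_url(qid):
    """Build the Special:EntityData URL for one item ID."""
    return ENTITY_DATA_URL.format(qid=qid)


def fetch_entity(qid, timeout=10.0):
    """Fetch one entity as a dict, or None if the server answers 404."""
    request = urllib.request.Request(
        entity_data_url(qid),
        headers={"User-Agent": "random-sample-sketch/0.1 (example only)"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None  # assumed to mean the ID is deleted or unassigned
        raise


if __name__ == "__main__":
    # One HTTP request per sampled ID; the bound of 1000 is arbitrary.
    for qid in random_item_ids(3, max_numeric_id=1000, seed=0):
        entity = fetch_entity(qid)
        print(qid, "missing" if entity is None else "ok")
```

Because unresolved IDs are simply dropped, the number of entities actually retrieved can be smaller than the number of IDs drawn.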

It would be useful to have either:

  • the possibility of randomly retrieving data through the Wikidata Query Service (best option), or
  • a new tool to download an arbitrary number of random entities from Wikidata as a single file on demand.

TASK DETAIL
https://phabricator.wikimedia.org/T194884

To: abian
Cc: abian, Aklapper, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, Jonas, FloNight, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs