MichaelSchoenitzer added a comment.

So far, all the examples of timing-out queries are queries that produce massive amounts of data or are very complex. But there is a simple pattern of timing-out query that I see very often:

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q;
        wikibase:sitelinks ?sitelinks .
  MINUS { ?item ^schema:about/schema:isPartOf <https://de.wikipedia.org/> }
} ORDER BY DESC(?sitelinks) LIMIT 10

This gets the items of one area with the most sitelinks, optionally restricted to items that lack a specific sitelink. It is a widely used pattern for finding out which topics of an area are the most "important", often to find white spots in a Wikipedia's coverage or to generate worklists. These queries time out as soon as the number of items in the area is too big, which is quite often the case and becomes more common the more Wikidata grows. The LIMIT makes no difference.
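To make the last point concrete, here is a minimal Python sketch of what the query computes, assuming the relevant data were available in memory. The record layout, the item ids and the function name `top_missing` are hypothetical, and this is not how the query service actually evaluates the query. It only shows that, in this naive form, a top-k over a sort key has to look at every candidate of the class before it knows which 10 are the best, whatever k is.

```python
import heapq
from typing import List, Set, Tuple

# Hypothetical record: (item id, P31 classes, sitelink count, wikis with an article)
Item = Tuple[str, Set[str], int, Set[str]]

DEWIKI = "https://de.wikipedia.org/"
ENWIKI = "https://en.wikipedia.org/"
FRWIKI = "https://fr.wikipedia.org/"


def top_missing(items: List[Item], cls: str, wiki: str, k: int) -> List[Tuple[str, int]]:
    """Top-k items of class `cls` without an article on `wiki`, by sitelink count."""
    candidates = (
        (item_id, sitelinks)
        for item_id, classes, sitelinks, wikis in items
        if cls in classes        # ?item wdt:P31 wd:Q...
        and wiki not in wikis    # MINUS { ?item ^schema:about/schema:isPartOf <wiki> }
    )
    # nlargest consumes every candidate: the k-th best item is only known
    # once all items of the class have been seen, so k does not shrink the work.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Hypothetical sample data
SAMPLE: List[Item] = [
    ("A", {"Q5"}, 120, {ENWIKI, DEWIKI}),  # excluded: has a dewiki article
    ("B", {"Q5"}, 95, {ENWIKI}),
    ("C", {"Q5"}, 60, {FRWIKI}),
    ("D", {"other"}, 200, {ENWIKI}),       # excluded: wrong class
    ("E", {"Q5"}, 12, set()),
]

print(top_missing(SAMPLE, "Q5", DEWIKI, 2))  # [('B', 95), ('C', 60)]
```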

The query can be tuned a bit if you can guess the number of sitelinks needed to get into the top 10. But that is hard and gives only a small improvement, so it is often not sufficient. For example, this query sometimes runs, but most of the time it times out:
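The threshold trick can be sketched the same way. This sketch assumes the candidates have already been restricted to the class and filtered by the MINUS clause. The hypothetical helper `top_k_above` applies the `?sitelinks > threshold` pre-filter and also reports whether the guess was low enough. The filtered top k is only guaranteed to be correct if at least k items survive the filter.

```python
import heapq
from typing import List, Tuple


def top_k_above(
    pairs: List[Tuple[str, int]], k: int, threshold: int
) -> Tuple[List[Tuple[str, int]], bool]:
    """Top-k (item, sitelinks) pairs after the pre-filter sitelinks > threshold.

    Returns the result and whether it is guaranteed to equal the unfiltered top-k.
    """
    kept = [(item, n) for item, n in pairs if n > threshold]  # FILTER(?sitelinks > threshold)
    result = heapq.nlargest(k, kept, key=lambda pair: pair[1])
    # Every dropped item has <= threshold sitelinks, i.e. fewer than any kept one,
    # so the answer is exact as soon as at least k items survive the filter.
    return result, len(kept) >= k


# Hypothetical candidates (already filtered by class and MINUS)
PAIRS = [("A", 95), ("B", 60), ("C", 41), ("D", 12)]

print(top_k_above(PAIRS, 3, 40))  # ([('A', 95), ('B', 60), ('C', 41)], True)
print(top_k_above(PAIRS, 3, 50))  # ([('A', 95), ('B', 60)], False)
```

In this sketch, a `False` flag means the guessed threshold was too high and the query would have to be rerun with a lower one. This is part of why the trick is hard to apply reliably.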

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q5;
        wikibase:sitelinks ?sitelinks .
  FILTER(?sitelinks > 40)
  MINUS { ?item ^schema:about/schema:isPartOf <https://de.wikipedia.org/> }
} ORDER BY DESC(?sitelinks) LIMIT 10

As Wikidata grows, it will soon time out every time. Do you see any other tricks to improve this type of query?

Wikidata-bugs mailing list