MichaelSchoenitzer added a comment.

That's the reason why queries time out. There's no magic to it - queries that time out are those that require the engine to process a huge amount of data.

Well, that's obvious. But what I was referring to is that while the above examples produce huge amounts of data in their output and can therefore never be significantly faster by definition, the example I gave does not produce a large amount of data as output.
It is slow because the SPARQL query first generates a list of all humans – which is huge – then sorts that list and only then applies the LIMIT. The filter, too, is only applied after the list has been generated and thus does not improve the query time much. There is currently no way to avoid generating the huge interim list in the first place – but there could be!
So, in contrast to the queries by Magnus and others, this type of query could be made faster.
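To make the pattern concrete, here is a sketch of the kind of query I mean (using the standard WDQS prefixes; the engine has to materialize and sort every human before the LIMIT can cut the result down):

```sparql
# "The ten humans with the most sitelinks."
# The engine first binds ?item to every instance of human (millions of
# bindings), sorts all of them by ?linkcount, and only then takes the top 10.
SELECT ?item ?linkcount WHERE {
  ?item wdt:P31 wd:Q5 ;              # instance of human
        wikibase:sitelinks ?linkcount .
}
ORDER BY DESC(?linkcount)
LIMIT 10
```

The output is tiny, yet the interim result is enormous – which is exactly why this query times out while not being "big" in any meaningful sense.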
A (not very nice, but working) way to do so would be to add a binned or qualitative piece of information to the RDF database containing an estimate of the number of sitelinks, so that one could add something like:

?item wikibase:popularity wikibase:verypopular.

to restrict the list to items with more than some constant number of sitelinks. Not very pretty, but something like that could be implemented to improve this very common and still very inefficient pattern.
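Put together, the query from above would then look roughly like this (to be clear, `wikibase:popularity` and `wikibase:verypopular` are hypothetical names that do not exist in the current RDF export; this only sketches the idea):

```sparql
# Hypothetical: a precomputed "popularity bin" lets the engine discard the
# vast majority of humans before sorting anything.
SELECT ?item ?linkcount WHERE {
  ?item wikibase:popularity wikibase:verypopular .  # hypothetical cheap pre-filter
  ?item wdt:P31 wd:Q5 ;
        wikibase:sitelinks ?linkcount .
}
ORDER BY DESC(?linkcount)
LIMIT 10
```

The interim list would then only contain the small "very popular" bin instead of all humans, which is where the speedup would come from.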


