Smalyshev added a subscriber: Smalyshev.
Smalyshev added a comment.

Yeah, 13-minute queries are not really a good idea, I'm afraid. Also, `?wds a 
wikibase:Statement` should not have worked on query.wikidata.org, since WDQS 
strips those triples (see 
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences),
though it can of course be run against a raw dump. In fact, you do not need the 
"a statement" part, since only statements ever appear on the right of p: 
predicates anyway.

I also don't think DISTINCT is needed in the last query, since having the same 
reference twice is pretty rare, I think. The OPTIONAL part may be omitted too: 
if you enumerate all statements and then remove the ones with non-zero counts, 
you get the ones with zero counts (e.g. the MINUS operator could do it).
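
To illustrate the MINUS idea, here is a minimal, untested sketch, assuming the 
same P227 property and prov:wasDerivedFrom reference link used elsewhere in 
this comment. It lists only the statements that have no reference at all:

  prefix prov: <http://www.w3.org/ns/prov#>
  prefix p: <http://www.wikidata.org/prop/>
  
  # All P227 statements, minus those with at least one reference
  SELECT ?wds WHERE {
    ?s p:P227 ?wds .
    MINUS { ?wds prov:wasDerivedFrom ?o . }
  }

FILTER NOT EXISTS { ?wds prov:wasDerivedFrom ?o } should give the same result 
here, since ?wds is bound on both sides.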

With these modifications, a query like:

  prefix wikibase: <http://wikiba.se/ontology#>
  prefix wdt: <http://www.wikidata.org/prop/direct/>
  prefix prov: <http://www.w3.org/ns/prov#>
  prefix wd: <http://www.wikidata.org/entity/>
  prefix p: <http://www.wikidata.org/prop/>
  
  SELECT ?wds (count(?o) AS ?ocount) WHERE {
    ?s p:P227 ?wds .
    ?wds prov:wasDerivedFrom ?o .
  } GROUP BY ?wds

runs for me in 26 s. Of course, I may be missing something here.

In general, the query service is not well suited to queries that have to touch 
the whole database or a significant part of it; they will be slow. Going over 
300K+ entities one by one has to take some time.


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

To: Smalyshev
Cc: Smalyshev, Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, 
StudiesWorld, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331


