VladimirAlexiev added a comment.
The full dump is 15B triples (you can see this at https://query.wikidata.org/bigdata/ldf). WDtruthy is 6.5B triples (we have it in GraphDB, continuously updating). Adding the counts will add 320M, or about 5% of WDtruthy.

Actually I was wrong about site links: they are present (or have been added since).

This query times out on WD and takes 5 min on GraphDB:

  select ?type (count(*) as ?c) { ?x schema:about ?y; a ?type } group by ?type

It returns:
- schema:Dataset 29,045,390: that's only a third of what I expected
- schema:Article 79,001,363 (site links): that's also a bit small.
  - https://stats.wikimedia.org/#/all-wikipedia-projects says "198M pages to date"
  - but that probably includes non-content pages
  - on the other hand, there are many other sites, e.g. Commons categories...
  - I tried to check with this query on WD but it also timed out:

      select (sum(?links) as ?total) { ?x wikibase:sitelinks ?links }

TASK DETAIL
  https://phabricator.wikimedia.org/T270764
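
One possible way to see where the schema:Article count comes from is to break it down per site. This is only a sketch and has not been run against WD or GraphDB, so it may well time out just like the queries above. It assumes the standard Wikidata RDF layout, in which each site link page carries schema:isPartOf pointing at its site; the variable names are illustrative.

```sparql
PREFIX schema: <http://schema.org/>

# Count site link pages per site (e.g. https://en.wikipedia.org/),
# largest sites first. Assumes every site link page has schema:isPartOf.
SELECT ?site (COUNT(*) AS ?c)
WHERE {
  ?page a schema:Article ;
        schema:about ?item ;
        schema:isPartOf ?site .
}
GROUP BY ?site
ORDER BY DESC(?c)
```

If this completes, summing the Wikipedia rows would give a figure more directly comparable to the "198M pages to date" statistic than the overall 79,001,363.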
