VladimirAlexiev added a comment.

  The full dump is 15B triples (you can see this at https://query.wikidata.org/bigdata/ldf).
  WDtruthy is 6.5B triples (we have it in GraphDB, continuously updated).
  Adding the counts would add 320M triples, or about 5%.
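
  The 5% figure can be checked against the rounded numbers quoted above. A trivial Python sketch (the variable names are illustrative only):

```python
# Rounded figures quoted in this comment (illustrative variable names).
full_triples = 15e9      # full dump, per the LDF endpoint
truthy_triples = 6.5e9   # WDtruthy, as loaded in GraphDB
extra_triples = 320e6    # triples added by the counts

share_of_truthy = extra_triples / truthy_triples
share_of_full = extra_triples / full_triples

print(f"{share_of_truthy:.1%} of WDtruthy")      # 4.9% of WDtruthy
print(f"{share_of_full:.1%} of the full dump")   # 2.1% of the full dump
```

  So the ~5% is relative to WDtruthy; relative to the full 15B-triple dump the increase would be about 2%.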
  
  Actually, I was wrong about site links: they are present (or have been added since).
  This query times out on WD, but takes 5 min on GraphDB:
  
    PREFIX schema: <http://schema.org/>
    select ?type (count(*) as ?c) {
        ?x schema:about ?y; a ?type
    } group by ?type
  
  and returns:
  
  - schema:Dataset 29,045,390: that's only a third of what I expected
  - schema:Article 79,001,363 (site links): that's also a bit small.
    - https://stats.wikimedia.org/#/all-wikipedia-projects says "198M pages to date"
      - but that probably includes non-content pages
      - on the other hand, there are many other sites, e.g. Commons categories...
    - I tried to check with this query on WD, but it also timed out:
  
    PREFIX wikibase: <http://wikiba.se/ontology#>
    select (sum(?links) as ?total) {
      ?x wikibase:sitelinks ?links
    }
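
  Since the SUM query times out, the same number could in principle be obtained with a single streaming pass over an N-Triples file instead of a SPARQL endpoint. A minimal Python sketch, assuming a gzipped N-Triples file (one triple per line, full IRIs) that actually contains `wikibase:sitelinks` triples; whether a given dump includes them is an assumption to verify. `count_sitelinks` and the file name are hypothetical. In the same pass it counts `rdf:type schema:Article` triples, for comparison with the 79,001,363 figure above.

```python
import gzip
import re
import sys
from typing import Iterable, Tuple

# Full IRIs behind the prefixed names used in the queries above.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SCHEMA_ARTICLE = "<http://schema.org/Article>"
WIKIBASE_SITELINKS = "<http://wikiba.se/ontology#sitelinks>"

# N-Triples: subject, predicate, object (a literal may contain spaces), then " ."
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")
# Leading digits of a typed integer literal such as "12"^^xsd:integer.
INT_LITERAL = re.compile(r'^"(\d+)"')


def count_sitelinks(lines: Iterable[str]) -> Tuple[int, int]:
    """Hypothetical helper: one pass over N-Triples lines, returning
    (sum of wikibase:sitelinks values, number of rdf:type schema:Article triples)."""
    total_links = 0
    articles = 0
    for line in lines:
        m = TRIPLE.match(line)
        if m is None:  # blank or unparsable line: skip it
            continue
        _subject, predicate, obj = m.groups()
        if predicate == WIKIBASE_SITELINKS:
            lit = INT_LITERAL.match(obj)
            if lit is not None:
                total_links += int(lit.group(1))
        elif predicate == RDF_TYPE and obj == SCHEMA_ARTICLE:
            articles += 1
    return total_links, articles


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python count_sitelinks.py truthy.nt.gz
    with gzip.open(sys.argv[1], "rt", encoding="utf-8") as f:
        links, articles = count_sitelinks(f)
    print(f"sum(wikibase:sitelinks) = {links:,}")
    print(f"schema:Article triples  = {articles:,}")
```

  A per-line regex suffices here only because N-Triples is line-based; a Turtle dump would need a real RDF parser instead.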

TASK DETAIL
  https://phabricator.wikimedia.org/T270764

To: VladimirAlexiev
Cc: Lydia_Pintscher, mkroetzsch, Nicksinch, Aklapper, VladimirAlexiev, 
Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list