VladimirAlexiev added a comment.

  I used queries like this to compare counts between Wikidata and our wdtruthy service.
  
  - For the first 2 queries we use `count(distinct ?x)`: they use a property path but have a small result population
  - For the other queries we use `count(*)` because it's much faster
  
    PREFIX wikibase: <http://wikiba.se/ontology#>
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    select (count(distinct ?x) as ?c) { 
      ?x wdt:P31/wdt:P279* wd:Q783794
    }
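
  The two counting styles above can be sketched as a small query builder. This is only an illustration in Python: the `count_query` helper and its parameters are hypothetical names, not part of our tooling, and the `owl:` prefix is added on the assumption that the sameAs pattern needs it declared.

```python
# Hypothetical helper (not from the original comment): builds the count
# queries described above so both counting styles can be compared.
PREFIXES = """\
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
"""


def count_query(pattern: str, distinct: bool = False) -> str:
    """Return a SPARQL count query for the given graph pattern.

    distinct=True  uses count(distinct ?x), which deduplicates ?x; with a
                   path like wdt:P31/wdt:P279* an item that has several
                   matching types may otherwise be counted more than once.
    distinct=False uses count(*), which counts solutions as-is.
    """
    counter = "count(distinct ?x)" if distinct else "count(*)"
    return f"{PREFIXES}select ({counter} as ?c) {{\n  {pattern}\n}}"


# First kind of query: property path, small result population.
companies = count_query("?x wdt:P31/wdt:P279* wd:Q783794", distinct=True)
# Second kind of query: very large population, plain count(*).
all_items = count_query("?x wikibase:statements []")
print(companies)
```

  The resulting strings can be pasted into either query service; only the counting expression differs between the two kinds of query.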
  
  
  
  | count of | pattern | wikidata | wdtruthy | wdtruthy note |
  | -------- | ------- | -------- | -------- | ------------- |
  | companies | `?x wdt:P31/wdt:P279* wd:Q783794` | 283,010 | 287,330 | Has 4320 (1.5%) more?? |
  | companies with sameAs (i.e. merged TO) | `?x wdt:P31/wdt:P279* wd:Q783794. ?y owl:sameAs ?x` | 26,805 | 26,345 | Missed 460 (1.8%) of merges |
  | items with statements (i.e. all items) | `?x wikibase:statements []` | 107,281,020 | 79,341,571 | Items that were incrementally updated |
  | items with type (i.e. all items) | `?x a wikibase:Item` | 0 | 25,681,023 | Items that were initially loaded from dump |
  | total items | | 107,281,020 | 105,022,594 | Missing 2,258,426 (2.1%) of all items |
  
  The "missing" notes in the last column pertain to our "wdtruthy" service.
  But the two "items with" rows confirm the observation above:
  
  - incremental updates include `wikibase:statements` but not `wikibase:Item`
  - the dump misses `wikibase:statements` but has `wikibase:Item`
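
  As a quick sanity check, the "total items" row can be recomputed from the two "items with" rows. The sketch below assumes, as the table's total row implies, that the two wdtruthy populations are disjoint; the variable names are only illustrative.

```python
# Counts copied from the table above.
wikidata_total = 107_281_020       # wikidata: ?x wikibase:statements []
wdtruthy_statements = 79_341_571   # wdtruthy: ?x wikibase:statements []
wdtruthy_typed = 25_681_023        # wdtruthy: ?x a wikibase:Item

# Assumption: an item is either incrementally updated or dump-loaded,
# never both, so the two wdtruthy counts can simply be added.
wdtruthy_total = wdtruthy_statements + wdtruthy_typed
missing = wikidata_total - wdtruthy_total
missing_pct = 100 * missing / wikidata_total

print(f"{wdtruthy_total:,} {missing:,} {missing_pct:.1f}%")
# prints: 105,022,594 2,258,426 2.1%
```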

TASK DETAIL
  https://phabricator.wikimedia.org/T270764
