VladimirAlexiev added a comment.
I used queries like the following to compare counts between Wikidata and our
wdtruthy service.
- For the first 2 queries we use `count(distinct ?x)`: they have a property
path but a small result population
- For the other queries we use `count(*)` because it's much faster
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select (count(distinct ?x) as ?c) {
?x wdt:P31/wdt:P279* wd:Q783794
}
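For the rows counted with `count(*)`, the query keeps the same shape with a plain triple pattern. A minimal sketch for the "items with statements" row, using the pattern listed in the table; the exact text of the query that was run is an assumption:

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>

# Assumed form of the count(*) variant: no property path and no DISTINCT,
# so the engine only has to count the matching triples.
select (count(*) as ?c) {
  ?x wikibase:statements []
}
```

Swapping the pattern for `?x a wikibase:Item` would give the "items with type" row in the same way.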
| count of | pattern | wikidata | wdtruthy | wdtruthy note |
| --- | --- | --- | --- | --- |
| companies | `?x wdt:P31/wdt:P279* wd:Q783794` | 283,010 | 287,330 | Has 4320 (1.5%) more?? |
| companies with sameAs (i.e. merged TO) | `?x wdt:P31/wdt:P279* wd:Q783794. ?y owl:sameAs ?x` | 26,805 | 26,345 | Missed 460 (1.8%) of merges |
| items with statements (i.e. all items) | `?x wikibase:statements []` | 107,281,020 | 79,341,571 | Items that were incrementally updated |
| items with type (i.e. all items) | `?x a wikibase:Item` | 0 | 25,681,023 | Items that were initially loaded from dump |
| total items | | 107,281,020 | 105,022,594 | Missing 2,258,426 (2.1%) of all items |
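The "companies with sameAs" pattern uses the `owl:` prefix, which the prefix list in the example query does not declare. A sketch of what the full query might look like, assuming the standard OWL namespace and the same `count(distinct ?x)` form as the first query (the actual query text is not given in this comment):

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Companies (instances of Q783794 or any subclass) that are the target
# of at least one owl:sameAs link, i.e. some other item was merged into them.
select (count(distinct ?x) as ?c) {
  ?x wdt:P31/wdt:P279* wd:Q783794.
  ?y owl:sameAs ?x
}
```

`distinct` matters here because a company that received several merges would otherwise be counted once per `?y`.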
The "Missed"/"Missing" notes in the last column pertain to our "wdtruthy" service.
But the two "items with" rows confirm the observation above:
- incremental updates include `wikibase:statements` but not `wikibase:Item`
- the dump misses `wikibase:statements` but has `wikibase:Item`
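
The wdtruthy total in the table equals the sum of the two "items with" rows, which only holds if no item carries both markers. A hedged sketch of a query that could check this on the wdtruthy service; it is not one of the queries reported in this comment, and a result of 0 would be consistent with the two populations being disjoint:

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>

# Items that have BOTH the dump-style type triple and the
# incremental-update-style statement count.
select (count(distinct ?x) as ?c) {
  ?x a wikibase:Item ;
     wikibase:statements []
}
```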
TASK DETAIL
https://phabricator.wikimedia.org/T270764