Lucas_Werkmeister_WMDE added a comment.
> Do you have a picture of how many violations, and violations split by type
> / kind exist on Wikidata at any given time?
Looks like we’re averaging about 1⅔ constraint violations per item (excluding
redirects, if I’m not mistaken):
$ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' |
    jq -r '.query.random | .[] | .title' |
    while IFS= read -r id; do
      curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" |
        jq -c '.. | .results? | select(length > 0) | .[] | 1'
    done | wc -l
853
$ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' |
    jq -r '.query.random | .[] | .title' |
    while IFS= read -r id; do
      curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" |
        jq -c '.. | .results? | select(length > 0) | .[] | 1'
    done | wc -l
806
$ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' |
    jq -r '.query.random | .[] | .title' |
    while IFS= read -r id; do
      curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" |
        jq -c '.. | .results? | select(length > 0) | .[] | 1'
    done | wc -l
811
$ units -t '(853 + 806 + 811) / 1500'
1.6466667
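To see how much the three item samples differ from each other, the per-sample averages can be computed separately. This is only a small sketch of the arithmetic: per_item_average is a made-up helper name, and the sample size of 500 is taken from the rnlimit=500 parameter in the requests above.

```shell
# Hypothetical helper: average number of results per entity in one sample.
# Usage: per_item_average RESULT_COUNT SAMPLE_SIZE
per_item_average() {
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v c="$1" -v n="$2" 'BEGIN { printf "%.3f\n", c / n }'
}

# The three item samples from the runs above, 500 random items each.
for count in 853 806 811; do
  per_item_average "$count" 500
done
```

The three values land between roughly 1.6 and 1.7, so the samples agree reasonably well with the pooled average.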
Properties have substantially more:
$ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=120&rnlimit=500&format=json&formatversion=2' |
    jq -r '.query.random | .[] | .title' |
    while IFS= read -r title; do
      curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=${title#*:}&format=json&formatversion=2" |
        jq -c '.. | .results? | select(length > 0) | .[] | 1'
    done | wc -l
2987
$ units -t '2987 / 500'
5.974
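The quoted question also asks for a split by type or kind. The following is a minimal sketch of one way the same responses could be tallied by status instead of just counted. It assumes that each entry of a results array carries a status field (for example violation or warning); tally_statuses is a hypothetical helper name, not part of the API.

```shell
# Hypothetical helper: tally constraint check results by status.
# Reads one or more wbcheckconstraints JSON responses on stdin.
# Assumption: every entry in a "results" array has a "status" field;
# entries without one are counted as "unknown".
tally_statuses() {
  jq -r '.. | .results? | select(length > 0) | .[] | .status? // "unknown"' |
    sort | uniq -c | sort -rn
}

# Example against the live API (left commented out because it needs network access):
# curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=Q42&format=json&formatversion=2" | tally_statuses
```

In the sampling loops above, this could replace the final jq call, with the tally applied to the combined output of the loop rather than wc -l.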
Lexemes have fewer:
$ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=146&rnlimit=500&format=json&formatversion=2' |
    jq -r '.query.random | .[] | .title' |
    while IFS= read -r title; do
      curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=${title#*:}&format=json&formatversion=2" |
        jq -c '.. | .results? | select(length > 0) | .[] | 1'
    done | wc -l
187
$ units -t '187 / 500'
0.374
That would suggest nearly 156 million constraint violations in total (almost
all of them on items):
$ units -t '((853 + 806 + 811) * 94504342 / 1500) + (2987 * 9166 / 500) + (187 * 578263 / 500)' 'million'
155.88818
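To show where the total comes from, the same extrapolation can be split into its three terms. This is a sketch only: estimate_millions is a made-up helper name, and reading 94504342, 9166 and 578263 as the total numbers of items, properties and lexemes is an assumption based on how they appear in the formula above.

```shell
# Hypothetical helper: scale a sample's result count up to all entities.
# Usage: estimate_millions RESULT_COUNT SAMPLE_SIZE TOTAL_ENTITIES
estimate_millions() {
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v v="$1" -v n="$2" -v t="$3" \
    'BEGIN { printf "%.3f\n", v / n * t / 1e6 }'
}

estimate_millions 2470 1500 94504342   # items (853 + 806 + 811 over 1500 sampled): 155.617
estimate_millions 2987 500 9166        # properties: 0.055
estimate_millions 187 500 578263       # lexemes: 0.216
```

The item term alone accounts for about 155.6 of the roughly 155.9 million, which is why properties and lexemes barely affect the total despite their different per-entity averages.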
TASK DETAIL
https://phabricator.wikimedia.org/T201150