Lucas_Werkmeister_WMDE added a comment.

  > Do you have a picture of how many violations, and violations split by
  > type / kind exist on Wikidata at any given time?
  
  Looks like we’re averaging about 1⅔ constraint violations per item, based on
  three samples of 500 random items each (excluding redirects, if I’m not
  mistaken):
  
    $ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' \
        | jq -r '.query.random | .[] | .title' \
        | while IFS= read -r id; do
            curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" \
              | jq -c '.. | .results? | select(length > 0) | .[] | 1'
          done \
        | wc -l
    853
    $ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' \
        | jq -r '.query.random | .[] | .title' \
        | while IFS= read -r id; do
            curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" \
              | jq -c '.. | .results? | select(length > 0) | .[] | 1'
          done \
        | wc -l
    806
    $ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=500&format=json&formatversion=2' \
        | jq -r '.query.random | .[] | .title' \
        | while IFS= read -r id; do
            curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=$id&format=json&formatversion=2" \
              | jq -c '.. | .results? | select(length > 0) | .[] | 1'
          done \
        | wc -l
    811
    $ units -t '(853 + 806 + 811) / 1500'
    1.6466667
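  For anyone without `units` installed, the same average can be computed with
  plain `awk`. This is only a sketch: the helper name `mean_per_entity` is made
  up for illustration, and the counts are the three sample results shown above.

```shell
# Average violations per sampled entity, without the `units` tool.
# Usage: mean_per_entity SAMPLE_SIZE COUNT...
mean_per_entity() {
    sample_size=$1
    shift
    # Sum the per-run counts, then divide by (number of runs * sample size).
    printf '%s\n' "$@" | awk -v n="$sample_size" '
        { total += $1 }
        END { printf "%.4f\n", total / (NR * n) }'
}

mean_per_entity 500 853 806 811   # prints 1.6467
```

  Each run is a fresh random sample, so the counts differ from run to run;
  averaging over several runs smooths that out a little.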
  
  Properties have substantially more:
  
    $ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=120&rnlimit=500&format=json&formatversion=2' \
        | jq -r '.query.random | .[] | .title' \
        | while IFS= read -r title; do
            curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=${title#*:}&format=json&formatversion=2" \
              | jq -c '.. | .results? | select(length > 0) | .[] | 1'
          done \
        | wc -l
    2987
    $ units -t '2987 / 500'
    5.974
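  The `${title#*:}` expansion is what distinguishes this command from the item
  version: page titles outside the main namespace carry a namespace prefix
  (such as `Property:P31`), while `wbcheckconstraints` wants the bare entity
  ID. A small sketch of what the expansion does, with a hypothetical helper
  name `entity_id`:

```shell
# Strip everything up to and including the first colon from a page title.
# Titles without a colon (items in the main namespace) come back unchanged.
entity_id() {
    printf '%s\n' "${1#*:}"
}

entity_id 'Property:P31'   # prints P31
entity_id 'Lexeme:L1'      # prints L1
entity_id 'Q42'            # prints Q42
```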
  
  Lexemes have fewer:
  
    $ curl -s 'https://www.wikidata.org/w/api.php?action=query&list=random&rnnamespace=146&rnlimit=500&format=json&formatversion=2' \
        | jq -r '.query.random | .[] | .title' \
        | while IFS= read -r title; do
            curl -s "https://www.wikidata.org/w/api.php?action=wbcheckconstraints&id=${title#*:}&format=json&formatversion=2" \
              | jq -c '.. | .results? | select(length > 0) | .[] | 1'
          done \
        | wc -l
    187
    $ units -t '187 / 500'
    0.374
  
  Scaling each sample up to the total number of entities of that type would
  suggest some 156 million constraint violations in total (almost all of them
  on items):
  
    $ units -t '((853 + 806 + 811) * 94504342 / 1500) + (2987 * 9166 / 500) + (187 * 578263 / 500)' 'million'
    155.88818
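  To see how the total breaks down by entity type, the same extrapolation can
  be done term by term. This is a rough sketch, assuming the three large
  numbers in the formula above are the entity totals for items, properties and
  lexemes; the helper name `extrapolate` is hypothetical.

```shell
# Extrapolate a sample count to a whole entity type:
# violations seen in the sample / sample size * total number of entities.
# Usage: extrapolate VIOLATIONS SAMPLE_SIZE TOTAL_ENTITIES
extrapolate() {
    awk -v v="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.0f\n", v / n * t }'
}

# 2470 = 853 + 806 + 811 violations across 1500 sampled items.
items=$(extrapolate 2470 1500 94504342)
properties=$(extrapolate 2987 500 9166)
lexemes=$(extrapolate 187 500 578263)

echo "items:      $items"        # 155617150
echo "properties: $properties"   # 54758
echo "lexemes:    $lexemes"      # 216270
awk -v a="$items" -v b="$properties" -v c="$lexemes" \
    'BEGIN { printf "total:      %.1f million\n", (a + b + c) / 1e6 }'
```

  Items account for nearly all of the estimate, so its accuracy depends almost
  entirely on how representative the 1500 sampled items are.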

TASK DETAIL
  https://phabricator.wikimedia.org/T201150
