Lucas_Werkmeister_WMDE added a comment.

  A closer look at the statement usages on Swedish Wikipedia (probably the 
closest approximation of Cebuano Wikipedia, since both have many 
Lsjbot-generated articles):
  
    MariaDB [svwiki]> SELECT COUNT(*), AVG(count)
                      FROM (SELECT eu_entity_id, eu_page_id, COUNT(*) AS count
                            FROM wbc_entity_usage
                            WHERE eu_aspect LIKE 'C.%'
                            GROUP BY eu_entity_id, eu_page_id) AS counts;
    +----------+------------+
    | COUNT(*) | AVG(count) |
    +----------+------------+
    |  4162100 |     3.7754 |
    +----------+------------+
    1 row in set (4 min 37.571 sec)
    
    MariaDB [svwiki]> SELECT COUNT(*) FROM wbc_entity_usage WHERE eu_aspect = 'C';
    +----------+
    | COUNT(*) |
    +----------+
    |   238389 |
    +----------+
    1 row in set (6.527 sec)
  
  There are only some 240k “merged” statement usages, compared to 4.2M 
page-entity pairs with fewer than 33 statement usages (i.e. their “C” usages 
were not merged); among those pairs, the average is just under 4 statement 
usages per pair. If we assume that each “O” row gets turned into four new rows 
(three point something “C” rows plus a few remaining actual “O” rows), we 
arrive at a new estimate of 28181823 cebwiki rows, just a 2× increase.
  
    MariaDB [cebwiki]> SELECT SUM(IF(eu_aspect = 'O', 4, 1)) AS estimate,
                              COUNT(*) AS current,
                              SUM(IF(eu_aspect = 'O', 4, 1)) / COUNT(*) AS factor
                       FROM wbc_entity_usage;
    +----------+----------+--------+
    | estimate | current  | factor |
    +----------+----------+--------+
    | 28181823 | 14499273 | 1.9437 |
    +----------+----------+--------+
    1 row in set (6.906 sec)
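  
  As a sanity check on the arithmetic only (not a new measurement), the 
estimate query counts 4 for every “O” row and 1 for every other row, so the 
number of “O” rows can be recovered from its output. A minimal Python sketch, 
using only the figures printed above; the variable names are mine and the 
factor of 4 is the assumption stated in the text:

```python
# Figures copied from the cebwiki query output above.
current_rows = 14_499_273
estimated_rows = 28_181_823

# Assumption from the text: each "O" row is replaced by four rows.
rows_per_o = 4

# estimate = current + (rows_per_o - 1) * o_rows, solved for o_rows.
extra_rows = estimated_rows - current_rows
o_rows, remainder = divmod(extra_rows, rows_per_o - 1)

# Growth factor, as reported in the "factor" column.
factor = estimated_rows / current_rows

print(o_rows, remainder, round(factor, 4))
```

  Under that assumption the output implies about 4.56M “O” rows on cebwiki, 
and the estimate scales linearly with whatever per-“O” multiplier is chosen.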
  
  For Wikimedia Commons, I suspect the average number of statement usages per 
page-entity pair would be rather higher, but I’m not sure if we can get a 
decent estimate for it.

TASK DETAIL
  https://phabricator.wikimedia.org/T188730
