Lucas_Werkmeister_WMDE added a comment.
A closer look at the statement usages on Swedish Wikipedia (probably the
closest approximation of Cebuano Wikipedia, with both having many
Lsjbot-generated articles?):
MariaDB [svwiki]> SELECT COUNT(*), AVG(count)
    -> FROM (SELECT eu_entity_id, eu_page_id, COUNT(*) AS count
    ->       FROM wbc_entity_usage
    ->       WHERE eu_aspect LIKE 'C.%'
    ->       GROUP BY eu_entity_id, eu_page_id) AS counts;
+----------+------------+
| COUNT(*) | AVG(count) |
+----------+------------+
|  4162100 |     3.7754 |
+----------+------------+
1 row in set (4 min 37.571 sec)
MariaDB [svwiki]> SELECT COUNT(*) FROM wbc_entity_usage WHERE eu_aspect = 'C';
+----------+
| COUNT(*) |
+----------+
|   238389 |
+----------+
1 row in set (6.527 sec)
There are only some 240k “merged” statement usages, compared to 4.2M
page-entity pairs with fewer than 33 statement usages (i.e. their “C” usages
didn’t get merged); among those pairs, the average is just under 4 statement
usages per pair (3.7754). If we assume that each “O” row gets turned into
four new rows (about 3.8 “C” rows plus the occasional remaining actual “O”
row), we arrive at a new estimate of 28181823 cebwiki rows, just under a 2×
increase.
MariaDB [cebwiki]> SELECT SUM(IF(eu_aspect = 'O', 4, 1)) AS estimate,
    ->        COUNT(*) AS current,
    ->        SUM(IF(eu_aspect = 'O', 4, 1)) / COUNT(*) AS factor
    -> FROM wbc_entity_usage;
+----------+----------+--------+
| estimate | current  | factor |
+----------+----------+--------+
| 28181823 | 14499273 | 1.9437 |
+----------+----------+--------+
1 row in set (6.906 sec)
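To make the arithmetic behind that estimate explicit, here is a minimal Python
sketch. It only reuses the two totals from the cebwiki query output; the number
of “O” rows is not reported directly, so it is backed out from those totals
under the same assumption of four rows per “O” row. The function and constant
names are hypothetical and exist only for illustration.

```python
# Figures taken from the cebwiki query output.
CURRENT_ROWS = 14_499_273   # COUNT(*) on wbc_entity_usage
ESTIMATE_ROWS = 28_181_823  # SUM(IF(eu_aspect = 'O', 4, 1))
ROWS_PER_O = 4              # assumption: each "O" row becomes four rows


def implied_o_rows(current, estimate, rows_per_o):
    """Back out the "O" row count from
    estimate = current + (rows_per_o - 1) * o_rows."""
    o_rows, remainder = divmod(estimate - current, rows_per_o - 1)
    if remainder:
        raise ValueError("totals do not imply a whole number of O rows")
    return o_rows


def estimate_rows(current, o_rows, rows_per_o):
    """Each "O" row counts rows_per_o times, every other row once."""
    return current + (rows_per_o - 1) * o_rows


# Derived, not measured: the O-row count implied by the two totals.
o_rows = implied_o_rows(CURRENT_ROWS, ESTIMATE_ROWS, ROWS_PER_O)
new_total = estimate_rows(CURRENT_ROWS, o_rows, ROWS_PER_O)
factor = new_total / CURRENT_ROWS
print(o_rows, new_total, round(factor, 4))
```

Calling estimate_rows() again with a different rows_per_o shows how sensitive
the total is to that assumption, which is the open question for Wikimedia
Commons.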
For Wikimedia Commons, I suspect the average number of statement usages per
page-entity pair would be rather higher, but I’m not sure if we can get a
decent estimate for it.
TASK DETAIL
https://phabricator.wikimedia.org/T188730