GoranSMilovanovic added a comment.
@abian @Lydia_Pintscher We have the results.
Method
- The power law was estimated from 27,394,027 WD items that are currently used across the Wikimedia websites;
- that is approximately 50% of the items now present in WD (54,195,898 as of today);
- the statistic from which the power law was estimated is the number of pages that use a particular item;
- the estimation procedures from the {poweRlaw} R package were used.
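For readers unfamiliar with the estimation step: once xmin is fixed, the scaling parameter of a discrete power law can be approximated in closed form. The sketch below uses the approximation alpha ≈ 1 + n / sum(ln(x_i / (xmin - 0.5))) from Clauset, Shalizi and Newman (2009). This approximation is an assumption made for illustration. It is not the exact numerical maximum-likelihood fit that {poweRlaw} performs, and `estimate_alpha` and `page_counts` are hypothetical names.

```python
import math


def estimate_alpha(counts, xmin):
    """Approximate MLE of the scaling exponent of a discrete power law.

    Closed-form approximation (Clauset, Shalizi & Newman, 2009):
        alpha ~= 1 + n / sum(ln(x_i / (xmin - 0.5)))  over all x_i >= xmin.
    NOTE: illustrative only; this is not the exact numerical MLE that
    poweRlaw computes for its discrete power-law model.
    """
    tail = [x for x in counts if x >= xmin]  # only the power-law tail is fitted
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Calling `estimate_alpha(page_counts, xmin=9)` on per-item page counts would give a rough figure to compare with the {poweRlaw} estimate. The approximation is considered reasonable only for xmin of roughly 6 or more.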
Results
- Power-law behavior cannot be excluded,
- with a scaling parameter (alpha) of 2.050451 (implying infinite distribution variance), and
- an xmin parameter of 9 (in effect, this means that the usage distribution of all items used on 9 or more pages is consistent with power-law behavior).
- The following is the log(Rank) vs log(Pages) plot for all WD items with usage frequency >= 9 across the pages in our projects:
F28030400: logRank-logPages.png
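The xmin value is chosen as the cut-off that minimises the Kolmogorov-Smirnov (KS) distance between the empirical tail and the fitted model, which is what {poweRlaw}'s estimate_xmin does by default. The following is a simplified sketch of that idea only. For brevity it fits a continuous power law with the closed-form MLE rather than the discrete model appropriate for integer page counts, so its numbers would not match the reported alpha and xmin exactly. The function names are hypothetical.

```python
import bisect
import math


def fit_tail(counts, xmin):
    """Fit a continuous power law to values >= xmin.

    Returns (alpha, ks_distance), or None if the tail is too small to fit.
    Simplification: continuous model, not poweRlaw's discrete one.
    """
    tail = sorted(x for x in counts if x >= xmin)
    n = len(tail)
    if n < 2:
        return None
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        return None  # all tail values equal xmin; alpha is undefined
    alpha = 1.0 + n / log_sum  # continuous closed-form MLE
    ks = 0.0
    for x in set(tail):
        model = (x / xmin) ** (1.0 - alpha)                     # fitted P(X >= x)
        at_or_above = (n - bisect.bisect_left(tail, x)) / n     # empirical P(X >= x)
        above = (n - bisect.bisect_right(tail, x)) / n          # empirical P(X > x)
        ks = max(ks, abs(at_or_above - model), abs(above - model))
    return alpha, ks


def choose_xmin(counts):
    """Return (xmin, alpha, ks) for the candidate xmin with the smallest KS distance."""
    best = None
    for xmin in sorted(x for x in set(counts) if x > 0):
        fit = fit_tail(counts, xmin)
        if fit is None:
            continue
        alpha, ks = fit
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best
```

Scanning every distinct value costs O(k·n) for k candidates, so on tens of millions of items one would restrict the candidate range. On the variance remark: a power law with 1 < alpha ≤ 3 has a divergent second moment. An alpha of about 2.05 therefore implies infinite variance, while the mean, which requires alpha > 2, stays finite.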
Recommendation
- Protect all items that are used on 9 or more pages across the Wikimedia websites.
- There are 1,656,137 such items, which is only 3.06% of all items in WD and only 6.05% of the WD items currently in use.
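The recommendation amounts to a simple threshold on per-item usage counts. Below is a minimal sketch that assumes the counts are available as a mapping from item ID to number of pages. The mapping, the example IDs, and the function names are hypothetical. The sketch also reproduces the two percentages quoted above:

```python
def protection_candidates(usage, threshold=9):
    """Item IDs used on at least `threshold` pages, most-used first."""
    selected = [(item, pages) for item, pages in usage.items() if pages >= threshold]
    selected.sort(key=lambda pair: (-pair[1], pair[0]))  # by usage, then by ID
    return [item for item, _ in selected]


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Figures quoted in the recommendation.
print(percent(1_656_137, 54_195_898))  # share of all WD items -> 3.06
print(percent(1_656_137, 27_394_027))  # share of WD items in use -> 6.05
```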
Discussion
- If you can automate this, protecting 1,656,137 items should not be a problem, I guess.
- Currently, the list of items recommended for protection contains only item IDs and the number of pages that use each of them;
- the list will be shared with @Lydia_Pintscher;
- it would take some time/engineering to add the English labels, and
- a procedure to regenerate this list daily would take approx. 3-4 hours per run, but
- it cannot be set up on our infrastructure until R is upgraded; see my request in T214598.
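The list currently holds only item IDs and page counts, so the export step could look like the sketch below. The tab-separated layout, the column names, and `write_protection_list` are assumptions for illustration. They are not the format of the list that will actually be shared.

```python
import csv


def write_protection_list(usage, path, threshold=9):
    """Write items used on >= threshold pages as a TSV of (item ID, page count).

    Hypothetical format: a header row, then one row per item, most-used first.
    Returns the number of items written.
    """
    rows = sorted(
        ((item, pages) for item, pages in usage.items() if pages >= threshold),
        key=lambda row: (-row[1], row[0]),
    )
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["item", "pages"])
        writer.writerows(rows)
    return len(rows)
```

Adding English labels would mean joining a label column onto these rows, which is the extra engineering step mentioned above.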
So, until we have R upgraded on our systems, I recommend you ask for an updated list whenever you need one.
To: GoranSMilovanovic
Cc: AfroThundr3007730, GoranSMilovanovic, Lydia_Pintscher, abian, Aklapper, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
