GoranSMilovanovic added a comment.
@Manuel - A new dataset has been produced, encompassing the following fields:

- **class**: a Wikidata class
- **num_items**: the number of items in the class (via instanceOf, subclassOf, or partOf)
- **avg_score**: the average ORES score in this class (A = 5, B = 4, C = 3, D = 2, E = 1)
- **med_score**: the median ORES score in this class
- **sum_reuse**: the sum of WDCM re-use statistics for all items in this class (the class "total reuse")
- **avg_reuse**: the average of WDCM re-use statistics for all items in this class (the class "mean reuse")
- **med_reuse**: the median of WDCM re-use statistics for all items in this class (the class "median reuse")
- **num_reused**: the number of re-used items in this class
- **last_revision**: the timestamp of the latest revision made on any item in this class
- **human_edits**: the total number of human edits made on the items in this class
- **bot_edits**: the total number of bot edits made on the items in this class

All statistics are based on the latest available Wikidata JSON dump snapshots in HDFS and the latest snapshot of the `wmf.mediawiki_history` table.

The previously encountered discrepancy between the number of classes present in the data quality (ORES) dataset and in the Human vs Bot edits dataset has been resolved.

The dataset `.csv` is large and will be shared via Google Drive.

TASK DETAIL
  https://phabricator.wikimedia.org/T285458

To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
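
For readers who want to check how the per-class fields in the comment above relate to item-level data, here is a minimal sketch of the aggregation. It is not the production pipeline: the function `summarize_classes`, the item-level keys (`ores`, `reuse`, and so on), and the in-memory list of dicts are all hypothetical. It also assumes that the ORES scale runs in unit steps from A = 5 to E = 1 (so C = 3), and that an item counts as "re-used" when its re-use statistic is greater than zero.

```python
from collections import defaultdict
from statistics import mean, median

# Assumed numeric coding of ORES quality grades: unit steps, A best.
ORES_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def summarize_classes(items):
    """Aggregate item-level rows into one summary row per class.

    Each item is a dict with hypothetical keys: 'class', 'ores' (a letter
    A-E), 'reuse' (int), 'last_revision' (ISO 8601 string, all in the same
    format and time zone), 'human_edits' (int), and 'bot_edits' (int).
    """
    by_class = defaultdict(list)
    for item in items:
        by_class[item["class"]].append(item)

    summary = {}
    for cls, rows in by_class.items():
        scores = [ORES_SCORE[r["ores"]] for r in rows]
        reuse = [r["reuse"] for r in rows]
        summary[cls] = {
            "num_items": len(rows),
            "avg_score": mean(scores),
            "med_score": median(scores),
            "sum_reuse": sum(reuse),
            "avg_reuse": mean(reuse),
            "med_reuse": median(reuse),
            # Assumption: "re-used" means a re-use statistic above zero.
            "num_reused": sum(1 for x in reuse if x > 0),
            # Uniformly formatted ISO 8601 strings sort chronologically.
            "last_revision": max(r["last_revision"] for r in rows),
            "human_edits": sum(r["human_edits"] for r in rows),
            "bot_edits": sum(r["bot_edits"] for r in rows),
        }
    return summary
```

An item that belongs to several classes would simply appear once per class in the input list, so it contributes to each of those classes' statistics.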
