GoranSMilovanovic added a comment.

  @Manuel
  
  - A new dataset has been produced, with the following fields:
  
  - **class**: a Wikidata class
  - **num_items**: the number of items in the class (via instanceOf, subclassOf, or partOf)
  - **avg_score**: the average ORES score in this class (A = 5, B = 4, C = 3, D = 2, E = 1)
  - **med_score**: the median ORES score in this class
  - **sum_reuse**: the sum of WDCM re-use statistics for all items in this class (the class "total reuse")
  - **avg_reuse**: the average of WDCM re-use statistics for all items in this class (the class "mean reuse")
  - **med_reuse**: the median of WDCM re-use statistics for all items in this class (the class "median reuse")
  - **num_reused**: the number of re-used items in this class
  - **last_revision**: the timestamp of the latest revision made on any item in this class
  - **human_edits**: the total number of human edits made on the items in this class
  - **bot_edits**: the total number of bot edits made on the items in this class
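  To make the field definitions concrete, here is a minimal sketch of how one row of such a dataset could be aggregated from item-level records. It is not the actual pipeline, which runs over the Wikidata JSON dumps in HDFS. The `Item` record, its field names, and `class_stats` are hypothetical. The sketch assumes a linear grade scale from A = 5 to E = 1, with C = 3.

```python
from dataclasses import dataclass
from statistics import mean, median

# Assumed linear mapping of ORES item-quality grades to numeric scores.
GRADE_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


@dataclass
class Item:
    # Hypothetical item-level record; field names are illustrative only.
    qid: str
    grade: str          # ORES quality grade, "A".."E"
    reuse: int          # WDCM re-use count (0 if the item is not re-used)
    last_revision: str  # ISO 8601 timestamp of the item's latest revision
    human_edits: int
    bot_edits: int


def class_stats(cls: str, items: list[Item]) -> dict:
    """Aggregate the item-level records of one (non-empty) class into one row."""
    scores = [GRADE_SCORE[i.grade] for i in items]
    reuse = [i.reuse for i in items]
    return {
        "class": cls,
        "num_items": len(items),
        "avg_score": mean(scores),
        "med_score": median(scores),
        "sum_reuse": sum(reuse),
        "avg_reuse": mean(reuse),
        "med_reuse": median(reuse),
        "num_reused": sum(1 for r in reuse if r > 0),
        # ISO 8601 strings in one uniform format sort chronologically.
        "last_revision": max(i.last_revision for i in items),
        "human_edits": sum(i.human_edits for i in items),
        "bot_edits": sum(i.bot_edits for i in items),
    }
```

  Here `num_reused` counts items with a non-zero re-use count. The `statistics` functions raise an error on an empty list, so the sketch assumes every class has at least one item.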
  
  All statistics are based on the latest available Wikidata JSON dump snapshot in HDFS and the latest snapshot of the `wmf.mediawiki_history` table.
  
  The previously encountered discrepancy in the number of classes between the data quality (ORES) dataset and the human vs. bot edits dataset has been resolved.
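  A class-count discrepancy like this can be located by comparing the sets of class identifiers in the two per-class datasets. The sketch below shows such a check. The function name `class_discrepancy` and its inputs are hypothetical and do not describe how the issue was actually resolved.

```python
def class_discrepancy(ores_classes: set[str], edits_classes: set[str]) -> dict:
    """Report classes that appear in only one of two per-class datasets."""
    return {
        # Classes with ORES quality data but no human/bot edit counts.
        "only_in_ores": sorted(ores_classes - edits_classes),
        # Classes with edit counts but no ORES quality data.
        "only_in_edits": sorted(edits_classes - ores_classes),
        "consistent": ores_classes == edits_classes,
    }
```

  If `consistent` is true, both datasets cover exactly the same classes, so their per-class rows can be joined one-to-one on **class**.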
  
  The dataset (`.csv`) is large and will be shared via Google Drive.

TASK DETAIL
  https://phabricator.wikimedia.org/T285458


To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331