Jan_Dittrich created this task.
Jan_Dittrich added a project: Wikidata.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  **User Story:** As a PM/UX person, I want to use an inequality score as part 
of assessing community health.
  
  Based on input from @GoranSMilovanovic, @guergana.tzatchkova, and myself, I 
suggest and document the following:
  
  Data retrieval
  --------------
  
  To calculate the inequality of account edit contributions, we need to count 
edits for accounts. This means we need a way to select the accounts and a way 
to count edits.
  
  Account selection: All accounts that have been active in a certain timeframe. 
I suggest two levels of granularity: Month and Year.
  
  Counting edits: I suggest two ways to count the edits: edits done within that 
timeframe, and total edits ever done, measured at the end of that timeframe. 
This is analogous to measuring income and wealth, respectively, in economics.
  
  This gives us 4 tables:
  
  |                 | Month | Year  |
  | --------------- | ----- | ----- |
  | Edits-over-time | edits | edits |
  | Total-at-time   | edits | edits |
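  
  The two ways of counting can be sketched as follows. This is a minimal 
Python sketch, not the actual retrieval query: the input format (a list of 
account/date pairs), the function names, and the reading of "active" as "made 
at least one edit in the timeframe" are all assumptions for illustration.

```python
from collections import Counter
from datetime import date


def edits_over_time(edits, start, end):
    """Edits per account made within [start, end) (the "income" view)."""
    return Counter(acc for acc, day in edits if start <= day < end)


def total_at_time(edits, start, end):
    """Lifetime edits up to `end` (the "wealth" view), restricted to
    accounts that were active within [start, end).

    Assumption: "active" means at least one edit in the timeframe.
    """
    active = {acc for acc, day in edits if start <= day < end}
    return Counter(
        acc for acc, day in edits if day < end and acc in active
    )


# Hypothetical toy data: (account, date of edit)
edits = [
    ("A", date(2020, 11, 3)),
    ("A", date(2020, 12, 1)),
    ("A", date(2020, 12, 15)),
    ("B", date(2020, 12, 20)),
    ("C", date(2020, 10, 5)),
]

# Timeframe: December 2020 (end is exclusive)
start, end = date(2020, 12, 1), date(2021, 1, 1)
monthly_edits = edits_over_time(edits, start, end)
monthly_totals = total_at_time(edits, start, end)
```

  In this toy data, account C has no edit in December and is therefore left 
out of both tables, while account A counts with 2 edits in the first table and 
3 in the second.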
  
  These tables could either be a plain list of edit counts, one per account 
("long form"), or be aggregated into a two-column table that shows how often 
each edit count occurs in the set: X accounts have an edit count of Y 
("aggregated form").
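  
  Converting the long form into the aggregated form is a simple frequency 
count. A minimal Python sketch, with a hypothetical function name and made-up 
example counts:

```python
from collections import Counter


def aggregate(edit_counts):
    """Turn a long-form list of per-account edit counts into
    (edit_count, number_of_accounts) rows, sorted by edit count."""
    return sorted(Counter(edit_counts).items())


# Long form: one entry per account (illustrative values only)
long_form = [1, 1, 2, 5, 1]
aggregated = aggregate(long_form)
```

  Here three accounts have 1 edit, one has 2 and one has 5, so the aggregated 
form has three rows instead of five.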
  
  Calculate Hoover score
  ----------------------
  
  The score could be calculated with existing R packages. If they are not fast 
enough, we might need to consider optimization. The formula for the Hoover 
index is quite simple and, as far as I can tell, vectorizable, so I guess it 
will run just fine. Let's keep in mind that we only need to run the scoring 
monthly and yearly, so it is not a continuous load.
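  
  For reference, the Hoover index is half the sum of the absolute deviations of 
each account's edit count from the mean, divided by the total number of edits. 
It is 0 when all accounts have the same edit count and approaches 1 when a 
single account has made all edits. The following is a minimal Python sketch of 
that formula for both the long and the aggregated form. It is not the R package 
implementation, and the function names are hypothetical.

```python
def hoover_long(edit_counts):
    """Hoover index from a long-form list of per-account edit counts."""
    n = len(edit_counts)
    total = sum(edit_counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one account and one edit")
    mean = total / n
    return sum(abs(c - mean) for c in edit_counts) / (2 * total)


def hoover_aggregated(rows):
    """Hoover index from (edit_count, number_of_accounts) rows."""
    n = sum(accounts for _, accounts in rows)
    total = sum(count * accounts for count, accounts in rows)
    if n == 0 or total == 0:
        raise ValueError("need at least one account and one edit")
    mean = total / n
    deviation = sum(accounts * abs(count - mean) for count, accounts in rows)
    return deviation / (2 * total)


equal = hoover_long([3, 3, 3, 3])               # everyone edits equally
skewed = hoover_long([0, 0, 0, 4])              # one account does everything
same_skewed = hoover_aggregated([(0, 3), (4, 1)])
```

  Both functions are a single pass over the data, which supports the guess that 
performance should not be a problem for a monthly or yearly run.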
  
  The resulting Hoover scores should be appended to tables of Hoover scores. 
Again, these will be 4 tables:
  
  |                 | Monthly | Yearly |
  | --------------- | ------- | ------ |
  | Edits-over-time | Hoover  | Hoover |
  | Total-at-time   | Hoover  | Hoover |
  
  Each table has two columns: one for the point in time and one for the 
corresponding score at that moment. Which way of measurement was used needs to 
be stated in the table's name, in some metadata, or in an extra "type" cell for 
each row (which would be quite redundant).
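  
  One way to picture such a score table is a CSV file per measurement type, 
with the type encoded in the file name. This is only a sketch under that 
assumption: the storage format, the file name, the column names, and the score 
values below are all hypothetical placeholders.

```python
import csv
import os
import tempfile


def append_score(path, period, score):
    """Append one (period, score) row; write the header for a new file."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["period", "hoover"])
        writer.writerow([period, f"{score:.4f}"])


def read_scores(path):
    """Read the table back as a list of dicts, one per row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Demo in a temporary directory. The measurement type lives in the
# (hypothetical) file name; the scores are placeholders, not real data.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "hoover_edits_over_time_monthly.csv")
append_score(demo_path, "2020-11", 0.8123)
append_score(demo_path, "2020-12", 0.8051)
rows = read_scores(demo_path)
```

  Putting the type into the file name keeps each table at two columns and 
avoids the redundant per-row type cell.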
  
  Presenting the scores
  ---------------------
  
  In the simplest case, we only provide the data and can see how it changes 
over time by importing it into Excel or the like. Even better would be a tool 
that visualizes the data:
  
  F33945562: image.png <https://phabricator.wikimedia.org/F33945562>
  
  (The wireframe has no timeframe selection, as the data will be small enough 
to just scroll back.)

TASK DETAIL
  https://phabricator.wikimedia.org/T270109

To: Jan_Dittrich
Cc: WMDE-leszek, Jan_Dittrich, GoranSMilovanovic, Aklapper, 
guergana.tzatchkova, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs