Jan_Dittrich created this task.
Jan_Dittrich added a project: Wikidata.
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
**User Story:** As a PM/UX person, I want to use an inequality score as part
of assessing community health.
Based on the input of @GoranSMilovanovic, @guergana.tzatchkova and me, I
suggest and document the following:
Data retrieval
--------------
To calculate the inequality of account edit contributions, we need to count
edits for accounts. This means we need a way to select the accounts and a way
to count edits.
Account selection: All accounts that have been active in a certain timeframe.
I suggest two levels of granularity: month and year.
Counting edits: I suggest two ways to count the edits: edits done within that
timeframe, and total edits ever done, measured at the end of this timeframe.
This is equivalent to measuring income and wealth, respectively, in economics.
This gives us 4 tables:
| | Month | Year |
| --------------- | ----- | ----- |
| Edits-over-time | edits | edits |
| Total-at-time | edits | edits |
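The two counting methods can be sketched as follows. This is a minimal illustration, not the actual query: the input format (a list of `(account, date)` pairs) and the function name `count_edits` are assumptions, and the real data would come from the Wikidata revision tables.

```python
from collections import Counter
from datetime import date

def count_edits(edits, start, end):
    """Return (edits_over_time, total_at_time) for accounts active in [start, end).

    edits: iterable of (account, date) pairs; a hypothetical log format.
    """
    in_window = Counter()   # "income": edits made inside the timeframe
    up_to_end = Counter()   # "wealth": all edits made before the timeframe ends
    for account, day in edits:
        if day < end:
            up_to_end[account] += 1
            if day >= start:
                in_window[account] += 1
    # Only accounts active in the timeframe are selected for either table.
    total_at_time = {acc: up_to_end[acc] for acc in in_window}
    return dict(in_window), total_at_time

edits = [
    ("A", date(2020, 11, 3)),
    ("A", date(2020, 12, 1)),
    ("A", date(2020, 12, 9)),
    ("B", date(2020, 12, 5)),
    ("C", date(2020, 10, 20)),  # not active in December, so not selected
]
over_time, total = count_edits(edits, date(2020, 12, 1), date(2021, 1, 1))
# over_time == {"A": 2, "B": 1}; total == {"A": 3, "B": 1}
```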
These tables could either be a plain list of edit counts, one per account
("long form"), or be aggregated into a two-column table that shows how often
each edit count occurs in the set: we counted X accounts with an edit count of
Y ("aggregated form").
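As a small illustration of the difference between the two forms, the following sketch converts a long-form list into the aggregated form. The helper name `aggregate` and the sample numbers are made up for the example.

```python
from collections import Counter

def aggregate(edit_counts):
    """Turn per-account edit counts (long form) into sorted
    (edit_count, number_of_accounts) rows (aggregated form)."""
    return sorted(Counter(edit_counts).items())

long_form = [1, 1, 1, 2, 5, 5, 40]   # one entry per account
aggregated = aggregate(long_form)
# [(1, 3), (2, 1), (5, 2), (40, 1)]
```

The aggregated form is much smaller when many accounts share the same low edit count, which is likely for this kind of data.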
Calculate Hoover score
----------------------
The score could be calculated with existing R packages. If they are not fast
enough, we might need to consider optimization. The formula for the Hoover
index is quite simple and, as far as I can tell, vectorizable, so I guess it
might run just fine. Let's keep in mind that we only need to run the scoring
monthly and yearly, so it is not a continuous load.
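For reference, the Hoover index is half the sum of absolute deviations from the mean, divided by the total: H = (1/2) * sum(|x_i - mean|) / sum(x_i). A minimal sketch in plain Python follows; it is shown in Python rather than R only for illustration, and the function name `hoover` and the optional `weights` argument (for the aggregated form) are assumptions of this example.

```python
def hoover(values, weights=None):
    """Hoover index of edit counts.

    values:  edit counts (long form), or distinct edit counts (aggregated form)
    weights: number of accounts per value; defaults to one account per value
    """
    if weights is None:
        weights = [1] * len(values)
    n = sum(weights)                                      # number of accounts
    total = sum(v * w for v, w in zip(values, weights))   # total number of edits
    if n == 0 or total == 0:
        raise ValueError("need at least one account with a non-zero edit count")
    mean = total / n
    deviation = sum(w * abs(v - mean) for v, w in zip(values, weights))
    return deviation / (2 * total)

hoover([1, 1, 1, 1])    # 0.0: everyone contributes equally
hoover([0, 0, 0, 10])   # 0.75: one account makes all the edits

# Long form and aggregated form describe the same data and give the same score.
h_long = hoover([1, 1, 1, 2, 5, 5, 40])
h_agg = hoover([1, 2, 5, 40], weights=[3, 1, 2, 1])
```

The computation is a single pass over the data with element-wise operations, so it should translate directly into vectorized R code.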
The resulting Hoover scores should be appended to tables of Hoover scores.
Again, these will be 4 tables:
| | Monthly | Yearly |
| --------------- | ------- | ------ |
| Edits-over-time | hoover | hoover |
| Total-at-time | hoover | hoover |
Each table has two columns: one for the point in time and one for the
corresponding score at that time. Which method of measurement was used needs
to be stated in the table’s name, in some metadata, or in an extra "type" cell
for each row (which would be quite redundant).
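One possible way to keep such a table is a CSV file per measurement type, with the type encoded in the file name. The sketch below assumes this layout. The file name, the column names and the `append_score` helper are hypothetical, and the scores are dummy placeholder values, not real results.

```python
import csv
import os
import tempfile

def append_score(path, period, score):
    """Append one (period, hoover) row, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["period", "hoover"])
        writer.writerow([period, f"{score:.4f}"])

# Demo in a temporary directory; the scores are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hoover_monthly_edits_over_time.csv")
    append_score(path, "2020-11", 0.8123)
    append_score(path, "2020-12", 0.8207)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
```

Encoding the measurement type in the file name avoids the redundant per-row type cell while keeping each table self-describing.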
Presenting the scores
---------------------
In the simplest case, we only provide the data and can see how it changes
over time by importing it into Excel or the like. Even better would be a tool
that visualizes the data:
F33945562: image.png <https://phabricator.wikimedia.org/F33945562>
(The wireframe has no timeframe selection, as the data will be small enough
to just scroll back.)
TASK DETAIL
https://phabricator.wikimedia.org/T270109
To: Jan_Dittrich
Cc: WMDE-leszek, Jan_Dittrich, GoranSMilovanovic, Aklapper,
guergana.tzatchkova, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer,
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331