GoranSMilovanovic added a comment.

  @Jan_Dittrich @awight
  
  In reference to T282563#7186386 
<https://phabricator.wikimedia.org/T282563#7186386> and T282563#7226336 
<https://phabricator.wikimedia.org/T282563#7226336>:
  
  - I have used a fresh dataset, relying on the `2021-06` snapshot of the `wmf.mediawiki_history` table;
  - the results are fully replicated (in the qualitative sense, of course);
  - I have also filtered out all editors with fewer than six (6) months of presence in Wikidata, simply because they never really had a chance to leave (where "left Wikidata" is defined as five (5) months of inactivity).
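  A minimal sketch of the two rules above (five months of inactivity means "left Wikidata"; editors with fewer than six months of presence are dropped). It assumes months are encoded as integer indices, that the snapshot month itself counts as one of the inactive months, and that presence is counted inclusively from the registration month. `Editor`, `has_left`, `eligible` and the other names are hypothetical and are not taken from the actual analysis code.

```python
from dataclasses import dataclass


def month_index(year: int, month: int) -> int:
    # Encode a calendar month as an integer so that differences are month counts.
    return year * 12 + (month - 1)


SNAPSHOT = month_index(2021, 6)  # the 2021-06 snapshot
INACTIVE_MONTHS_TO_LEAVE = 5     # "left Wikidata" = five months of inactivity
MIN_PRESENCE_MONTHS = 6          # editors present for fewer months are dropped


@dataclass
class Editor:
    registration: int        # month index of account registration (hypothetical field)
    active_months: set[int]  # month indices with at least one revision (non-empty)


def presence_months(e: Editor, snapshot: int = SNAPSHOT) -> int:
    # Months from registration through the snapshot month, counted inclusively
    # (an assumption about how "presence" is measured).
    return snapshot - e.registration + 1


def has_left(e: Editor, snapshot: int = SNAPSHOT) -> bool:
    # Inactive months after the last active month, up to and including the
    # snapshot month (assumption).
    return snapshot - max(e.active_months) >= INACTIVE_MONTHS_TO_LEAVE


def eligible(editors: list[Editor], snapshot: int = SNAPSHOT) -> list[Editor]:
    # Keep only editors who had at least six months in which they could have left.
    return [e for e in editors if presence_months(e, snapshot) >= MIN_PRESENCE_MONTHS]


a = Editor(month_index(2020, 1), {month_index(2020, 1), month_index(2021, 1)})
b = Editor(month_index(2021, 2), {month_index(2021, 2)})
print(has_left(a), has_left(b))               # True False
print([presence_months(e) for e in (a, b)])   # [18, 5]
```

  Editor `b` registered only four months before the snapshot, so it is excluded by the presence filter before the "left" rule is even applied.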
  
  **The Lindy Effect**
  
  I have used several different operational definitions of the "length of past 
activity" to illustrate the Lindy Effect in Wikidata editing.
  
  **A. The total number of active months in editor's revision history**
  
  An editor can be active and inactive now and then; this measure of "length of past activity" is defined as the count of months in which an editor was active over the whole course of their presence in Wikidata since registration.
  The vertical axis represents the probability of leaving Wikidata given the count of active months.
  
  F34570577: 01_LindyA.png <https://phabricator.wikimedia.org/F34570577>
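  The quantity plotted in A can be sketched as the share of editors who left, per count of active months. This assumes each editor has already been reduced to a hypothetical `(active_month_count, left)` pair; it is an illustration of the measure, not the code behind the figure.

```python
from collections import defaultdict


def leave_probability_by_active_months(editors):
    """Share of editors who left Wikidata, per count of active months.

    `editors` is an iterable of (active_month_count, left) pairs, where
    `left` is a bool from the five-month inactivity rule.
    """
    totals = defaultdict(int)   # editors per active-month count
    leavers = defaultdict(int)  # of those, how many left
    for n_active, left in editors:
        totals[n_active] += 1
        leavers[n_active] += int(left)
    return {n: leavers[n] / totals[n] for n in sorted(totals)}


print(leave_probability_by_active_months([(1, True), (1, True), (2, False), (2, True)]))
# {1: 1.0, 2: 0.5}
```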
  
  **B. The probability of an active month**
  
  The previous measure could be criticized on the grounds that (a) having ten active months while registered a year ago is not the same as (b) having ten active months while registered three years ago. I have therefore turned the absolute counts of active months per editor into proportions of their total stay in Wikidata since registration (effectively calculating the probability that any given month in the editor's revision history is an active month).
  The horizontal axis is the probability of an active month over the course of one's revision history, binned into 100 intervals. The vertical axis represents the probability of leaving Wikidata given the (binned) proportion of active months.
  
  F34570592: 02_LindyA.png <https://phabricator.wikimedia.org/F34570592>
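  A sketch of measure B, assuming each editor is reduced to a hypothetical `(n_active, n_months, left)` tuple, where `n_months` is the length of their stay since registration. The 100 bins are assumed to be equal-width intervals over [0, 1]; integer arithmetic keeps the bin assignment free of floating-point rounding at the bin edges.

```python
from collections import defaultdict

N_BINS = 100


def share_bin(n_active: int, n_months: int, n_bins: int = N_BINS) -> int:
    # Equal-width bin of the proportion n_active / n_months over [0, 1].
    # Integer floor division avoids float rounding at bin edges; a proportion
    # of exactly 1 is placed in the last bin.
    return min(n_active * n_bins // n_months, n_bins - 1)


def leave_probability_by_share(editors, n_bins: int = N_BINS):
    """editors: iterable of (n_active, n_months, left) tuples."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for n_active, n_months, left in editors:
        b = share_bin(n_active, n_months, n_bins)
        totals[b] += 1
        leavers[b] += int(left)
    # Key b stands for [b / n_bins, (b + 1) / n_bins); the last bin also holds 1.0.
    return {b: leavers[b] / totals[b] for b in sorted(totals)}


# Ten active months out of 12 vs. ten out of 36 land in very different bins.
print(share_bin(10, 12), share_bin(10, 36))  # 83 27
```

  This is exactly the distinction between cases (a) and (b) above: the same absolute count maps to very different proportions.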
  
  **C. The age of the account**
  This is simple yet probably inconclusive with respect to the Lindy Effect itself: how old is the editor's account vs what is the probability that they have left Wikidata (i.e. have been inactive for at least five months)?
  
  F34570590: 03_LindyA.png <https://phabricator.wikimedia.org/F34570590>
  
  **The distribution of the number of revisions vs left or did not leave Wikidata**
  The horizontal axis represents the log of the number of revisions, while the vertical axis is probability density. As expected, the editors who are still active are those who have made more edits so far.
  
  F34570594: 04_RevisionsVSLeftWikidata.png 
<https://phabricator.wikimedia.org/F34570594>
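  A sketch of how such a density might be computed, using a normalized histogram as a stand-in for whatever density estimator the figure uses. The base of the logarithm is not stated, so base 10 is an assumption here, as are the bin count and the [0, 8) range (which covers the largest revision count in the table below, about 3.1 × 10^7). `log_revision_density` is a hypothetical helper.

```python
import math


def log_revision_density(revisions, n_bins=20, lo=0.0, hi=8.0):
    """Histogram estimate of the density of log10(revision count).

    Returns (left_edges, densities). Densities are counts divided by
    (n * bin_width), so they integrate to 1 when every value lies in [lo, hi).
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for r in revisions:  # every editor has at least one revision, so log10 is defined
        i = int((math.log10(r) - lo) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    n = len(revisions)
    edges = [lo + k * width for k in range(n_bins)]
    return edges, [c / (n * width) for c in counts]
```

  Running it once per group (left vs active) and overlaying the two curves gives a plot of the same kind as the figure above.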
  
  Here are the descriptive statistics on revisions:
  
  **Left Wikidata:**
  
    Min. 1st Qu. Median Mean 3rd Qu.    Max.
       1       1      2  203       7 5891740
  
  **Active on Wikidata:**
  
    Min. 1st Qu. Median  Mean 3rd Qu.     Max.
       2      19    108 15268     720 31003903
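  The tables above follow the layout of R's `summary()`. For anyone reproducing them outside R, a rough Python equivalent is sketched below; it relies on `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation between order statistics as R's default quantile type 7. `r_summary` is a hypothetical name, and on Python versions before 3.13 `statistics.quantiles` needs at least two values.

```python
import statistics


def r_summary(values):
    """Six-number summary laid out like R's summary() for a numeric vector.

    Quartiles use linear interpolation between order statistics
    (R's default quantile type 7), i.e. method="inclusive".
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


print(r_summary([1, 2, 3, 4]))
# {'Min.': 1, '1st Qu.': 1.75, 'Median': 2.5, 'Mean': 2.5, '3rd Qu.': 3.25, 'Max.': 4}
```

  The large gap between Mean and Median in both tables is what this summary makes visible: a few extremely prolific accounts pull the mean far above the typical editor.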
  
  **The distribution of the length of inactivity periods vs left or did not leave Wikidata**
  A single editor can have several periods of inactivity of varying length (in months). I have analyzed the distribution of both the mean and the median length of inactivity periods per user, grouped according to whether they are still editing or not.
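  One way to extract those periods, sketched under the assumption that each editor's history is a list of monthly activity flags running from registration to the snapshot. Whether the final run of inactivity (the one that defines "left Wikidata") is included is also an assumption; here it is. Editors with no inactive month get `None`, matching the `NA's` in the summaries below. Both function names are hypothetical.

```python
import statistics


def inactivity_periods(activity: list[bool]) -> list[int]:
    """Lengths of maximal runs of consecutive inactive months.

    `activity` holds one flag per calendar month from registration to the
    snapshot, True when the editor made at least one revision that month.
    """
    runs, current = [], 0
    for active in activity:
        if active:
            if current:
                runs.append(current)  # an inactive run just ended
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)  # trailing run up to the snapshot
    return runs


def mean_and_median_inactivity(activity: list[bool]):
    runs = inactivity_periods(activity)
    if not runs:
        return None, None  # no inactive month at all: reported as NA
    return statistics.fmean(runs), statistics.median(runs)


print(inactivity_periods([True, False, False, True, False, True, False, False, False]))
# [2, 1, 3]
```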
  
  Mean length of inactivity periods first:
  
  F34570597: 05_MeanLengthInactiveVSLeftWikidata.png 
<https://phabricator.wikimedia.org/F34570597>
  
  Obviously, the editors who are still active typically have much shorter sequences of inactive months.
  
  The descriptive statistics on mean length of inactivity periods:
  
  **Left Wikidata:**
  
     Min. 1st Qu. Median   Mean 3rd Qu.    Max.
    1.429  14.500 30.000 37.185  56.000 105.000
  
  **Active on Wikidata:**
  
     Min. 1st Qu. Median  Mean 3rd Qu.   Max. NA's
    1.000   1.875  3.000 4.942   5.600 77.000   88
  
  **N.B.** `NA's` represent those editors who did not have a single inactive 
month in their revision history.
  
  And now for the median length of inactivity periods:
  
  F34570602: 06_MedianLengthInactiveVSLeftWikidata.png 
<https://phabricator.wikimedia.org/F34570602>
  
  The descriptive statistics:
  
  **Left Wikidata:**
  
    Min. 1st Qu. Median  Mean 3rd Qu.   Max.
    1.00   13.00  30.00 36.52   56.00 105.00
  
  **Active on Wikidata:**
  
     Min. 1st Qu. Median  Mean 3rd Qu.   Max. NA's
    1.000   1.000  2.000 3.609   4.000 77.000   88
  
  **N.B.** `NA's` represent those editors who did not have a single inactive 
month in their revision history.
  
  My present conclusions:
  
  - The Lindy Effect holds in Wikidata editing: the longer the past editing activity, the higher the chance that it will persist;
  - As expected, currently active Wikidata editors have made more revisions in the past than those who are now inactive;
  - Currently active Wikidata editors have shorter periods of inactivity on average (measured in months) than those who are now inactive.
  
  **What is missing from this analysis?**
  
  This is missing from T282563#7186386 
<https://phabricator.wikimedia.org/T282563#7186386>:
  
  > ... user behavior on talk pages
  
  because it takes another ETL run through the `wmf.mediawiki_history` table; I will try to produce that dataset tonight, join it with the existing data, and report on it. Frankly, I do not expect any finding to emerge other than that active editors make more revisions on talk pages.
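  A sketch of the talk-page count that ETL run would need, relying on the MediaWiki convention that every talk namespace has an odd, positive ID (1 Talk, 3 User talk, 5 Project talk, and so on). It assumes revisions have already been extracted as hypothetical `(user_id, page_namespace)` pairs; how those map onto the actual `wmf.mediawiki_history` columns is not shown here.

```python
from collections import Counter


def is_talk_namespace(ns: int) -> bool:
    # Talk namespaces have odd, positive IDs. The explicit ns > 0 check matters
    # because -1 % 2 == 1 in Python, and -1 is the virtual Special namespace.
    return ns > 0 and ns % 2 == 1


def talk_revisions_per_editor(revisions) -> Counter:
    """revisions: iterable of (user_id, page_namespace) pairs."""
    counts = Counter()
    for user_id, ns in revisions:
        if is_talk_namespace(ns):
            counts[user_id] += 1
    return counts
```

  The resulting per-editor counts could then be joined with the existing per-editor table on the user identifier.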
  
  @Jan_Dittrich I have not yet found enough time to focus on all the papers that you have shared (and for which I am thankful). I will focus on them tonight, as much as I can (there are other tickets calling for my attention too), and then get in touch about our idea to publish this finding. Thank you for the very inspirational question that you have raised here in relation to the Lindy Effect!

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic
Cc: Pablo, Mohammed_Sadat_WMDE, Tobi_WMDE_SW, MGerlach, awight, WMDE-leszek, 
Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list