GoranSMilovanovic added a comment.

  @Jan_Dittrich
  
  Please find the analytics dataset attached.
  
  Columns:
  
  - **userId**: the anonymized Wikidata user Id
  - **registrationYM**: the `YYYY-MM` timestamp of user registration on Wikidata
  - **revisionYM**: the `YYYY-MM` month in which the user made revisions on Wikidata
  - **revisions**: the count of revisions the user made in the **revisionYM** month
  
  **Next steps:**
  
  - We will proceed to test the Lindy effect for Wikidata by (a) calculating the number of months between each user's registration and each month in which they made revisions, and (b) searching for pauses in editing behavior.
  - All hypotheses/research questions will be addressed using these derived time lags between user registrations and revisions.
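
As a rough illustration of steps (a) and (b), here is a minimal Python sketch. It assumes rows shaped like the columns listed above; the function names, the sample rows, and the `min_gap` threshold of two silent months are all hypothetical choices for illustration, not the definitions the actual analysis will use.

```python
from collections import defaultdict


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def months_since_registration(registration_ym: str, revision_ym: str) -> int:
    """(a) Whole months between registration and an active month."""
    return month_index(revision_ym) - month_index(registration_ym)


def editing_pauses(active_months, min_gap=2):
    """(b) Return (last_active, next_active, silent_months) for every gap
    of at least `min_gap` silent months between consecutive active months.
    The threshold is an assumption, not part of the original plan."""
    ordered = sorted(set(active_months), key=month_index)
    pauses = []
    for previous, current in zip(ordered, ordered[1:]):
        silent = month_index(current) - month_index(previous) - 1
        if silent >= min_gap:
            pauses.append((previous, current, silent))
    return pauses


# Hypothetical rows: (userId, registrationYM, revisionYM, revisions)
rows = [
    ("u1", "2019-11", "2019-11", 4),
    ("u1", "2019-11", "2020-01", 10),
    ("u1", "2019-11", "2020-06", 2),
]

# (a) lag in months for every (user, active month) row
lags = [months_since_registration(reg, rev) for _, reg, rev, _ in rows]

# (b) pauses per user, derived from that user's active months
active = defaultdict(list)
for user, _, rev, _ in rows:
    active[user].append(rev)
pauses = {user: editing_pauses(months) for user, months in active.items()}
```

Converting each `YYYY-MM` value to a running month count keeps both steps as plain integer arithmetic, so no date library is needed at monthly resolution.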
  
  F34458261: WD_UserRetention.csv <https://phabricator.wikimedia.org/F34458261>
  
  **Notes.**
  
  - Bot and anonymous revisions were filtered out.
  - Only item (0), property (120), and lexeme (146) namespaces are taken into account.
  
  The ETL was performed in HiveQL from the current snapshot (`2021-04`) of wmf.mediawiki_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history>. Here's the query:
  
    USE wmf;
    -- Monthly revision counts per registered, non-bot user on Wikidata
    SELECT
      event_user_id,
      event_user_registration_timestamp,
      substring(event_timestamp, 1, 4) AS year,
      substring(event_timestamp, 6, 2) AS month,
      COUNT(*) AS revisions
    FROM mediawiki_history
    WHERE (
      event_entity = 'revision' AND
      event_type = 'create' AND
      wiki_db = 'wikidatawiki' AND
      -- exclude anonymous and bot revisions
      event_user_is_anonymous = FALSE AND
      NOT ARRAY_CONTAINS(event_user_is_bot_by, 'name') AND
      NOT ARRAY_CONTAINS(event_user_is_bot_by, 'group') AND
      NOT ARRAY_CONTAINS(event_user_is_bot_by_historical, 'name') AND
      NOT ARRAY_CONTAINS(event_user_is_bot_by_historical, 'group') AND
      NOT ARRAY_CONTAINS(event_user_groups, 'bot') AND
      NOT ARRAY_CONTAINS(event_user_groups_historical, 'bot') AND
      event_user_id != 0 AND
      page_is_redirect = FALSE AND
      revision_is_deleted_by_page_deletion = FALSE AND
      -- item (0), property (120), and lexeme (146) namespaces
      page_namespace IN (0, 120, 146) AND
      snapshot = '2021-04'
    )
    GROUP BY
      event_user_id,
      event_user_registration_timestamp,
      substring(event_timestamp, 1, 4),
      substring(event_timestamp, 6, 2);
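
The query returns raw user ids, full registration timestamps, and separate year and month fields, while the attached dataset has `userId`, `registrationYM`, `revisionYM`, and `revisions`. The sketch below shows one way that reshaping could look in Python. It is an assumption throughout: the actual anonymization procedure is not described here, so the first-seen numbering is purely hypothetical, as are the function names and the sample rows (which assume timestamps that start with `YYYY-MM-DD`).

```python
def to_year_month(timestamp):
    """Keep the 'YYYY-MM' prefix of a 'YYYY-MM-DD ...' timestamp string.
    Returns None when no timestamp is available."""
    return timestamp[:7] if timestamp else None


def reshape(query_rows):
    """Turn (event_user_id, registration_timestamp, year, month, revisions)
    tuples into (userId, registrationYM, revisionYM, revisions) tuples."""
    pseudonyms = {}  # hypothetical anonymization: number users in first-seen order
    out = []
    for user_id, registered, year, month, revisions in query_rows:
        anon = pseudonyms.setdefault(user_id, len(pseudonyms) + 1)
        out.append(
            (anon, to_year_month(registered), f"{year}-{month}", int(revisions))
        )
    return out


# Hypothetical query output rows, for illustration only
sample = [
    (4711, "2019-11-03 10:15:00.0", "2020", "01", 10),
    (4711, "2019-11-03 10:15:00.0", "2020", "06", 2),
    (9001, "2021-02-20 08:00:00.0", "2021", "03", 1),
]
dataset = reshape(sample)
```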

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

To: GoranSMilovanovic
Cc: Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331