GoranSMilovanovic added a comment.
@Jan_Dittrich Please find the analytics dataset attached.

Columns:
- **userId**: the anonymized Wikidata user ID
- **registrationYM**: the `YYYY-MM` month of the user's registration on Wikidata
- **revisionYM**: the `YYYY-MM` month in which the user made revisions on Wikidata
- **revisions**: the count of revisions made in the **revisionYM** month.

**Next steps:**
- We will proceed to test the Lindy effect for Wikidata by (a) calculating the difference in months between user registration and each month with revisions, and (b) searching for pauses in editing behavior.
- All hypotheses/research questions will be addressed from the derived time lags between user revisions and registrations.

F34458261: WD_UserRetention.csv <https://phabricator.wikimedia.org/F34458261>

**Notes.**
- Bot and anonymous revisions were filtered out.
- Only the item (0), property (120), and lexeme (146) namespaces are taken into account.

The ETL was performed in HiveQL from the current snapshot (`2021-04`) of wmf.mediawiki_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history>.
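The lag and pause calculations listed under Next steps could be sketched roughly as follows. This is only an illustration, not the analysis code: it assumes the CSV has a header row with exactly the four column names described above, and the helper names (`month_index`, `active_lags`, `pauses`) and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def month_index(ym):
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def active_lags(rows):
    """Map each userId to the sorted lags (in months since registration) of its active months."""
    lags = defaultdict(set)
    for row in rows:
        lag = month_index(row["revisionYM"]) - month_index(row["registrationYM"])
        lags[row["userId"]].add(lag)
    return {user: sorted(values) for user, values in lags.items()}


def pauses(sorted_lags):
    """Return (last_active_lag, idle_months) for every gap of at least one inactive month."""
    return [(a, b - a - 1) for a, b in zip(sorted_lags, sorted_lags[1:]) if b - a > 1]


# Made-up sample rows in the assumed column layout (not real data).
sample = io.StringIO(
    "userId,registrationYM,revisionYM,revisions\n"
    "u1,2019-11,2019-11,5\n"
    "u1,2019-11,2020-01,2\n"
    "u1,2019-11,2020-02,7\n"
    "u2,2020-03,2020-03,1\n"
)
lags = active_lags(csv.DictReader(sample))
user_pauses = {user: pauses(values) for user, values in lags.items()}
```

For the real file, the `io.StringIO` sample would be replaced by an open handle on WD_UserRetention.csv. Rows with a missing registration month would need separate handling, which this sketch does not attempt.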
Here's the ETL query:

  USE wmf;
  SELECT
    event_user_id,
    event_user_registration_timestamp,
    substring(event_timestamp, 1, 4) AS year,
    substring(event_timestamp, 6, 2) AS month,
    COUNT(*) AS revisions
  FROM mediawiki_history
  WHERE (
    event_entity = 'revision' AND
    event_type = 'create' AND
    wiki_db = 'wikidatawiki' AND
    event_user_is_anonymous = FALSE AND
    NOT ARRAY_CONTAINS(event_user_is_bot_by, 'name') AND
    NOT ARRAY_CONTAINS(event_user_is_bot_by, 'group') AND
    NOT ARRAY_CONTAINS(event_user_is_bot_by_historical, 'name') AND
    NOT ARRAY_CONTAINS(event_user_is_bot_by_historical, 'group') AND
    NOT ARRAY_CONTAINS(event_user_groups, 'bot') AND
    NOT ARRAY_CONTAINS(event_user_groups_historical, 'bot') AND
    event_user_id != 0 AND
    page_is_redirect = FALSE AND
    revision_is_deleted_by_page_deletion = FALSE AND
    -- item (0), property (120), and lexeme (146) namespaces
    page_namespace IN (0, 120, 146) AND
    snapshot = '2021-04'
  )
  GROUP BY
    event_user_id,
    event_user_registration_timestamp,
    substring(event_timestamp, 1, 4),
    substring(event_timestamp, 6, 2);

TASK DETAIL
  https://phabricator.wikimedia.org/T282563

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Manuel, Lydia_Pintscher, Aklapper, Jan_Dittrich, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
