[MediaWiki-commits] [Gerrit] analytics/limn-edit-data[master]: Report session funnel weekly instead of daily
Milimetric has submitted this change and it was merged. Change subject: Report session funnel weekly instead of daily .. Report session funnel weekly instead of daily To save space, we aggregate the existing output weekly and remove any unique sequence of actions that represent less than 0.08% of the total for that week (because these would be hidden anyway in the most granular view of the visualization since they would create slices that are too small to occupy pixels). Going forward, we query data weekly. Sadly, I couldn't find an efficient way to remove the "less than < 0.08%" outliers and so we'd have to trim them later with the squish script if we wanted. Hopefully moving to Hadoop will make this query perform better. NOTE: before merging this, make sure to run the included python script on the existing sessions.sql output. Bug: T147492 Change-Id: Id59e97dc650213f34207cbaacf538642815db658 --- M edit/config.yaml M edit/sessions.sql A scripts/aggregate-and-filter-sessions.py D static/adhoc.html D static/funnel-prototype.html D static/funnel-prototype.js D static/stacked-bar-prototype.html D static/stacked-bar-prototype.js D static/util.js 9 files changed, 138 insertions(+), 1,271 deletions(-) Approvals: Milimetric: Verified; Looks good to me, approved diff --git a/edit/config.yaml b/edit/config.yaml index 7932197..8675c5d 100644 --- a/edit/config.yaml +++ b/edit/config.yaml @@ -11,7 +11,7 @@ reports: sessions: -granularity: days +granularity: weeks funnel: true starts: 2015-04-01 explode_by: diff --git a/edit/sessions.sql b/edit/sessions.sql index f90aeb5..130f090 100644 --- a/edit/sessions.sql +++ b/edit/sessions.sql @@ -1,70 +1,55 @@ - select day, + select date('{from_timestamp}') as day, actions, count(*) as repeated - from ( - select - date(timestamp) as day, - group_concat(event_action order by timestamp, action_order.ord separator '-') as actions - from ( - select -timestamp, -event_action, -clientIp, -event_editingSessionId - from Edit_11448630 - where -event_editor = '{editor}' and -('{wiki}' = 'all' or wiki = '{wiki}') and -timestamp >= '{from_timestamp}' and -timestamp < '{to_timestamp}' + from (select group_concat(event_action order by timestamp, action_order.ord separator '-') as actions + from (select timestamp, +event_action, +clientIp, +event_editingSessionId - # Add in events using the new schema. To be removed once everything - # is switched over. - union all + from Edit_13457736 +# NOTE: used to union all with Edit_11448630 +# That schema is still used apparently, but very little - select -timestamp, -event_action, -clientIp, -event_editingSessionId - from Edit_13457736 - where -event_editor = '{editor}' and -('{wiki}' = 'all' or wiki = '{wiki}') and -timestamp >= '{from_timestamp}' and -timestamp < '{to_timestamp}' - ) raw_events + where event_editor = '{editor}' +and ('{wiki}' = 'all' or wiki = '{wiki}') +and timestamp >= '{from_timestamp}' +and timestamp < '{to_timestamp}' +) raw_events - inner join -(select 'init' as action, 0 as ord -union all - select 'ready' as action, 1 as ord -union all - select 'saveIntent' as action, 2 as ord -union all - select 'saveAttempt' as action, 3 as ord -union all - select 'saveFailure' as action, 4 as ord -union all - select 'saveSuccess' as action, 5 as ord -union all - select 'abort' as action, 6 as ord -) action_order - on action = event_action +inner join + +(select 'init' as action, 0 as ord +union all + select 'ready' as action, 1 as ord +union all + select 'saveIntent' as action, 2 as ord +union all + select 'saveAttempt' as action, 3 as ord +union all + select 'saveFailure' as action, 4 as ord +union all + select 'saveSuccess' as action, 5 as ord +union all + select 'abort' as action, 6 as ord + +) action_order on action = event_action # client side, the ip is set but
[MediaWiki-commits] [Gerrit] analytics/limn-edit-data[master]: Report session funnel weekly instead of daily
Milimetric has uploaded a new change for review. https://gerrit.wikimedia.org/r/315829 Change subject: Report session funnel weekly instead of daily .. Report session funnel weekly instead of daily To save space, we aggregate the existing output weekly and remove any unique sequence of actions that represent less than 0.08% of the total for that week (because these would be hidden anyway in the most granular view of the visualization since they would create slices that are too small to occupy pixels). Going forward, we query data weekly. Sadly, I couldn't find an efficient way to remove the "less than < 0.08%" outliers and so we'd have to trim them later with the squish script if we wanted. Hopefully moving to Hadoop will make this query perform better. NOTE: before merging this, make sure to run the included python script on the existing sessions.sql output. Bug: T147492 Change-Id: Id59e97dc650213f34207cbaacf538642815db658 --- M edit/config.yaml M edit/sessions.sql A scripts/aggregate-and-filter-sessions.py D static/adhoc.html D static/funnel-prototype.html D static/funnel-prototype.js D static/stacked-bar-prototype.html D static/stacked-bar-prototype.js D static/util.js 9 files changed, 123 insertions(+), 1,271 deletions(-) git pull ssh://gerrit.wikimedia.org:29418/analytics/limn-edit-data refs/changes/29/315829/1 diff --git a/edit/config.yaml b/edit/config.yaml index 7932197..8675c5d 100644 --- a/edit/config.yaml +++ b/edit/config.yaml @@ -11,7 +11,7 @@ reports: sessions: -granularity: days +granularity: weeks funnel: true starts: 2015-04-01 explode_by: diff --git a/edit/sessions.sql b/edit/sessions.sql index f90aeb5..130f090 100644 --- a/edit/sessions.sql +++ b/edit/sessions.sql @@ -1,70 +1,55 @@ - select day, + select date('{from_timestamp}') as day, actions, count(*) as repeated - from ( - select - date(timestamp) as day, - group_concat(event_action order by timestamp, action_order.ord separator '-') as actions - from ( - select -timestamp, -event_action, -clientIp, -event_editingSessionId - from Edit_11448630 - where -event_editor = '{editor}' and -('{wiki}' = 'all' or wiki = '{wiki}') and -timestamp >= '{from_timestamp}' and -timestamp < '{to_timestamp}' + from (select group_concat(event_action order by timestamp, action_order.ord separator '-') as actions + from (select timestamp, +event_action, +clientIp, +event_editingSessionId - # Add in events using the new schema. To be removed once everything - # is switched over. - union all + from Edit_13457736 +# NOTE: used to union all with Edit_11448630 +# That schema is still used apparently, but very little - select -timestamp, -event_action, -clientIp, -event_editingSessionId - from Edit_13457736 - where -event_editor = '{editor}' and -('{wiki}' = 'all' or wiki = '{wiki}') and -timestamp >= '{from_timestamp}' and -timestamp < '{to_timestamp}' - ) raw_events + where event_editor = '{editor}' +and ('{wiki}' = 'all' or wiki = '{wiki}') +and timestamp >= '{from_timestamp}' +and timestamp < '{to_timestamp}' +) raw_events - inner join -(select 'init' as action, 0 as ord -union all - select 'ready' as action, 1 as ord -union all - select 'saveIntent' as action, 2 as ord -union all - select 'saveAttempt' as action, 3 as ord -union all - select 'saveFailure' as action, 4 as ord -union all - select 'saveSuccess' as action, 5 as ord -union all - select 'abort' as action, 6 as ord -) action_order - on action = event_action +inner join + +(select 'init' as action, 0 as ord +union all + select 'ready' as action, 1 as ord +union all + select 'saveIntent' as action, 2 as ord +union all + select 'saveAttempt' as action, 3 as ord +union all + select 'saveFailure' as action, 4 as ord +union all + select 'saveSuccess' as action, 5 as ord +union all + select 'abort' as action, 6 as ord + +) action_order on