[MediaWiki-commits] [Gerrit] analytics/limn-edit-data[master]: Report session funnel weekly instead of daily

2016-10-14 Thread Milimetric (Code Review)
Milimetric has submitted this change and it was merged.

Change subject: Report session funnel weekly instead of daily
..


Report session funnel weekly instead of daily

To save space, we aggregate the existing output weekly and remove any
unique sequence of actions that represent less than 0.08% of the total
for that week (because these would be hidden anyway in the most granular
view of the visualization since they would create slices that are too
small to occupy pixels).  Going forward, we query data weekly.  Sadly,
I couldn't find an efficient way to remove the "less than < 0.08%"
outliers and so we'd have to trim them later with the squish script if
we wanted.  Hopefully moving to Hadoop will make this query perform
better.

NOTE: before merging this, make sure to run the included python script
on the existing sessions.sql output.

Bug: T147492
Change-Id: Id59e97dc650213f34207cbaacf538642815db658
---
M edit/config.yaml
M edit/sessions.sql
A scripts/aggregate-and-filter-sessions.py
D static/adhoc.html
D static/funnel-prototype.html
D static/funnel-prototype.js
D static/stacked-bar-prototype.html
D static/stacked-bar-prototype.js
D static/util.js
9 files changed, 138 insertions(+), 1,271 deletions(-)

Approvals:
  Milimetric: Verified; Looks good to me, approved



diff --git a/edit/config.yaml b/edit/config.yaml
index 7932197..8675c5d 100644
--- a/edit/config.yaml
+++ b/edit/config.yaml
@@ -11,7 +11,7 @@
 reports:
 
 sessions:
-granularity: days
+granularity: weeks
 funnel: true
 starts: 2015-04-01
 explode_by:
diff --git a/edit/sessions.sql b/edit/sessions.sql
index f90aeb5..130f090 100644
--- a/edit/sessions.sql
+++ b/edit/sessions.sql
@@ -1,70 +1,55 @@
- select day,
+ select date('{from_timestamp}') as day,
 actions,
 count(*) as repeated
-   from (
-  select
- date(timestamp) as day,
- group_concat(event_action order by timestamp, action_order.ord 
separator '-') as actions
-  from (
- select
-timestamp,
-event_action,
-clientIp,
-event_editingSessionId
- from Edit_11448630
- where
-event_editor = '{editor}' and
-('{wiki}' = 'all' or wiki = '{wiki}') and
-timestamp >= '{from_timestamp}' and
-timestamp < '{to_timestamp}'
+   from (select group_concat(event_action order by timestamp, action_order.ord 
separator '-') as actions
+   from (select timestamp,
+event_action,
+clientIp,
+event_editingSessionId
 
- # Add in events using the new schema. To be removed once everything
- # is switched over.
- union all
+   from Edit_13457736
+# NOTE: used to union all with Edit_11448630
+# That schema is still used apparently, but very little
 
- select
-timestamp,
-event_action,
-clientIp,
-event_editingSessionId
- from Edit_13457736
- where
-event_editor = '{editor}' and
-('{wiki}' = 'all' or wiki = '{wiki}') and
-timestamp >= '{from_timestamp}' and
-timestamp < '{to_timestamp}'
- ) raw_events
+  where event_editor = '{editor}'
+and ('{wiki}' = 'all' or wiki = '{wiki}')
+and timestamp >= '{from_timestamp}'
+and timestamp < '{to_timestamp}'
+) raw_events
 
- inner join
-(select 'init' as action, 0 as ord
-union all
- select 'ready' as action, 1 as ord
-union all
- select 'saveIntent' as action, 2 as ord
-union all
- select 'saveAttempt' as action, 3 as ord
-union all
- select 'saveFailure' as action, 4 as ord
-union all
- select 'saveSuccess' as action, 5 as ord
-union all
- select 'abort' as action, 6 as ord
-) action_order
- on action = event_action
+inner join
+
+(select 'init' as action, 0 as ord
+union all
+ select 'ready' as action, 1 as ord
+union all
+ select 'saveIntent' as action, 2 as ord
+union all
+ select 'saveAttempt' as action, 3 as ord
+union all
+ select 'saveFailure' as action, 4 as ord
+union all
+ select 'saveSuccess' as action, 5 as ord
+union all
+ select 'abort' as action, 6 as ord
+
+) action_order  on action = event_action
 
  # client side, the ip is set but 

[MediaWiki-commits] [Gerrit] analytics/limn-edit-data[master]: Report session funnel weekly instead of daily

2016-10-13 Thread Milimetric (Code Review)
Milimetric has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/315829

Change subject: Report session funnel weekly instead of daily
..

Report session funnel weekly instead of daily

To save space, we aggregate the existing output weekly and remove any
unique sequence of actions that represent less than 0.08% of the total
for that week (because these would be hidden anyway in the most granular
view of the visualization since they would create slices that are too
small to occupy pixels).  Going forward, we query data weekly.  Sadly,
I couldn't find an efficient way to remove the "less than < 0.08%"
outliers and so we'd have to trim them later with the squish script if
we wanted.  Hopefully moving to Hadoop will make this query perform
better.

NOTE: before merging this, make sure to run the included python script
on the existing sessions.sql output.

Bug: T147492
Change-Id: Id59e97dc650213f34207cbaacf538642815db658
---
M edit/config.yaml
M edit/sessions.sql
A scripts/aggregate-and-filter-sessions.py
D static/adhoc.html
D static/funnel-prototype.html
D static/funnel-prototype.js
D static/stacked-bar-prototype.html
D static/stacked-bar-prototype.js
D static/util.js
9 files changed, 123 insertions(+), 1,271 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/analytics/limn-edit-data 
refs/changes/29/315829/1

diff --git a/edit/config.yaml b/edit/config.yaml
index 7932197..8675c5d 100644
--- a/edit/config.yaml
+++ b/edit/config.yaml
@@ -11,7 +11,7 @@
 reports:
 
 sessions:
-granularity: days
+granularity: weeks
 funnel: true
 starts: 2015-04-01
 explode_by:
diff --git a/edit/sessions.sql b/edit/sessions.sql
index f90aeb5..130f090 100644
--- a/edit/sessions.sql
+++ b/edit/sessions.sql
@@ -1,70 +1,55 @@
- select day,
+ select date('{from_timestamp}') as day,
 actions,
 count(*) as repeated
-   from (
-  select
- date(timestamp) as day,
- group_concat(event_action order by timestamp, action_order.ord 
separator '-') as actions
-  from (
- select
-timestamp,
-event_action,
-clientIp,
-event_editingSessionId
- from Edit_11448630
- where
-event_editor = '{editor}' and
-('{wiki}' = 'all' or wiki = '{wiki}') and
-timestamp >= '{from_timestamp}' and
-timestamp < '{to_timestamp}'
+   from (select group_concat(event_action order by timestamp, action_order.ord 
separator '-') as actions
+   from (select timestamp,
+event_action,
+clientIp,
+event_editingSessionId
 
- # Add in events using the new schema. To be removed once everything
- # is switched over.
- union all
+   from Edit_13457736
+# NOTE: used to union all with Edit_11448630
+# That schema is still used apparently, but very little
 
- select
-timestamp,
-event_action,
-clientIp,
-event_editingSessionId
- from Edit_13457736
- where
-event_editor = '{editor}' and
-('{wiki}' = 'all' or wiki = '{wiki}') and
-timestamp >= '{from_timestamp}' and
-timestamp < '{to_timestamp}'
- ) raw_events
+  where event_editor = '{editor}'
+and ('{wiki}' = 'all' or wiki = '{wiki}')
+and timestamp >= '{from_timestamp}'
+and timestamp < '{to_timestamp}'
+) raw_events
 
- inner join
-(select 'init' as action, 0 as ord
-union all
- select 'ready' as action, 1 as ord
-union all
- select 'saveIntent' as action, 2 as ord
-union all
- select 'saveAttempt' as action, 3 as ord
-union all
- select 'saveFailure' as action, 4 as ord
-union all
- select 'saveSuccess' as action, 5 as ord
-union all
- select 'abort' as action, 6 as ord
-) action_order
- on action = event_action
+inner join
+
+(select 'init' as action, 0 as ord
+union all
+ select 'ready' as action, 1 as ord
+union all
+ select 'saveIntent' as action, 2 as ord
+union all
+ select 'saveAttempt' as action, 3 as ord
+union all
+ select 'saveFailure' as action, 4 as ord
+union all
+ select 'saveSuccess' as action, 5 as ord
+union all
+ select 'abort' as action, 6 as ord
+
+) action_order  on