Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-20 Thread Andrew Otto
Oh awesome!  Glad y'all found it!


On Sep 19, 2013, at 5:01 PM, Adam Baso ab...@wikimedia.org wrote:

 +Analytics
 
 
 On Thu, Sep 19, 2013 at 1:57 PM, Adam Baso ab...@wikimedia.org wrote:
 A run on yesterday's valid Wikipedia Zero hits showed that the share of user 
 agents NOT supporting HTML (i.e., only supporting WAP) is only 0.098 - 0.108 
 *percent*.
 
 Assuming a bunch of complaints don't come in (e.g., "I'm getting tag soup!", 
 as Max might say), I think we could make a reasonable case, through the formal 
 channels (blog, mailing list(s), etc.), to stop supporting WAP.
 
 -Adam
 
 
 On Tue, Sep 17, 2013 at 1:11 PM, Arthur Richards aricha...@wikimedia.org 
 wrote:
 That's awesome - thanks Max and Adam; it's great to see the last vestiges of 
 X-Device finally disappear!
 
 
 On Tue, Sep 17, 2013 at 1:07 PM, Max Semenik maxsem.w...@gmail.com wrote:
 After looking at the Varnish VCL with Adam, we discovered a bug in a regex that 
 caused many phones to be detected as WAP when they shouldn't be. Since the 
 older change[1] simplifying detection had also fixed this bug, Brandon Black 
 deployed it, and as of today the usage share of WAP should drop seriously. We 
 will be monitoring the situation and will revisit the issue of WAP popularity 
 once we have enough data.
 
 [1] https://gerrit.wikimedia.org/r/83919
 
 On Tue, Sep 10, 2013 at 4:39 PM, Adam Baso ab...@wikimedia.org wrote:
 Thanks. 7-9% of responses on Wikipedia Zero being WAP is pretty substantial.
 
 
 On Tue, Sep 10, 2013 at 2:01 PM, Andrew Otto o...@wikimedia.org wrote:
  These zero.tsv.log* files to which I refer seem to be, basically, Varnish log
  lines that correspond to Wikipedia Zero-targeted traffic.
 Yup!  Correct.  zero.tsv.log* files are captured unsampled, selected by the 
 presence of a zero= tag in the X-Analytics header:
 
 http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/templates%2Fudp2log%2Ffilters.oxygen.erb#L10
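
A minimal sketch of that kind of selection, for illustration only. It assumes 
the X-Analytics header value is a semicolon-separated list of key=value pairs; 
the function names and the sample values are hypothetical, and this is not the 
actual udp2log filter linked above.

```python
def parse_x_analytics(header_value):
    """Parse an X-Analytics header value into a dict.

    Assumption: the value is a semicolon-separated list of key=value
    pairs, e.g. "zero=250-99;https=1" (sample value made up here).
    """
    pairs = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that are not key=value
            pairs[key.strip()] = value.strip()
    return pairs


def is_zero_request(header_value):
    """Return True if the header carries a zero= tag."""
    return "zero" in parse_x_analytics(header_value)
```

Matching on the parsed key rather than on the raw substring "zero=" avoids 
false positives from keys that merely end in "zero".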
 
  Do I understand correctly that field as Content-Type?
 Yup again!  The varnishncsa format string that is currently being beamed at 
 udp2log is here:
 
 http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default
 
 
 -- 
 Best regards,
 Max Semenik ([[User:MaxSem]])
 
 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l
 
 
 
 
 -- 
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
 
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-05 Thread Max Semenik
On 05.09.2013, 4:04 Diederik wrote:

 Heya,
 I would suggest running it for at least a 7-day period so you
 capture at least the weekly time trends; increasing the sample size
 would also be advisable. We can help set up a udp-filter for
 this purpose as long as the data can be extracted from the user-agent string.

Unfortunately, the Accept header is no less important here.
So, to enumerate our requirements as a result of this thread:
* Sampling rate the same as wikistats (1/1000).
* No less than a week's worth of data.
* User-Agent: header.
* Accept: header.
* Country from GeoIP to determine the share of developing countries.
* Wiki to determine if some wikis are more dependent on WAP than other
  ones.
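
The requirements above could be sketched roughly as follows. This is only an 
illustration: the request dict layout, the field names and both function names 
are hypothetical, and the country is assumed to have been resolved from GeoIP 
already, before the request reaches this code.

```python
import itertools

SAMPLING_RATE = 1000  # keep 1 request out of every 1000 (same as wikistats)

# Hypothetical field names for the four items listed above.
FIELDS = ("user_agent", "accept", "country", "wiki")


def sample_requests(requests, rate=SAMPLING_RATE):
    """Return an iterator over every `rate`-th request of an iterable."""
    # islice with a step keeps the items at positions rate-1, 2*rate-1, ...
    return itertools.islice(requests, rate - 1, None, rate)


def to_tsv_line(request):
    """Format one sampled request as a tab-separated line."""
    values = []
    for name in FIELDS:
        value = str(request.get(name) or "-")
        values.append(value.replace("\t", " "))  # keep the TSV columns intact
    return "\t".join(values)
```

A counter-based 1-in-N sampler like this is deterministic; random sampling 
with probability 1/N would serve equally well for estimating shares.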

Anything else?

-- 
Best regards,
  Max Semenik ([[User:MaxSem]])



Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-05 Thread Erik Zachte
For a breakdown per country, the higher the sampling rate the better, as the 
data will become reliable even for smaller countries with a not so great 
adoption rate of Wikipedia.
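
To illustrate why a higher sampling rate helps for small countries, here is a 
rough sketch using the Wilson score interval for a proportion. The choice of 
interval and the function name are assumptions made for this example, not 
something the analytics pipeline is known to use.

```python
import math


def wilson_interval(hits, total, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Same observed 1% WAP share, but from 100 versus 10000 sampled requests.
small_sample = wilson_interval(1, 100)
large_sample = wilson_interval(100, 10000)
```

The interval from the larger sample is much narrower, which is the sense in 
which per-country numbers become reliable only with enough sampled requests.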

-----Original Message-----
From: analytics-boun...@lists.wikimedia.org 
[mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Max Semenik
Sent: Thursday, September 05, 2013 12:28 PM
To: Diederik van Liere
Cc: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.; mobile-l; Wikimedia developers
Subject: Re: [Analytics] [WikimediaMobile] Mobile stats

On 05.09.2013, 4:04 Diederik wrote:

 Heya,
 I would suggest running it for at least a 7-day period so you capture 
 at least the weekly time trends; increasing the sample size would 
 also be advisable. We can help set up a udp-filter for this purpose 
 as long as the data can be extracted from the user-agent string.

Unfortunately, the Accept header is no less important here.
So, to enumerate our requirements as a result of this thread:
* Sampling rate the same as wikistats (1/1000).
* No less than a week's worth of data.
* User-Agent: header.
* Accept: header.
* Country from GeoIP to determine the share of developing countries.
* Wiki to determine if some wikis are more dependent on WAP than other
  ones.

Anything else?

--
Best regards,
  Max Semenik ([[User:MaxSem]])



Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats

2013-09-04 Thread Diederik van Liere
Heya,
I would suggest running it for at least a 7-day period so you capture at
least the weekly time trends; increasing the sample size would also be
advisable. We can help set up a udp-filter for this purpose as long as
the data can be extracted from the user-agent string.

D
On Wed, Sep 4, 2013 at 1:50 PM, Arthur Richards aricha...@wikimedia.org wrote:

 Thanks Max for digging into this :)

 I'm no analytics guy, but I am a little concerned about the sample size
 and duration of the internal logging that we've done - sampling 1/10000 for
 only a few days, for something whose usage we generally know to already be
 low, seems to me like it might make it difficult to get accurate numbers. Can
 someone from the analytics team chime in and let us know if the approach is
 sound and if we should trust the data Max has come up with? This has big
 implications as it will play a role in determining whether or not we continue
 supporting WAP devices and providing WAP access to the sites.

 Thanks everyone!


 On Tue, Sep 3, 2013 at 10:40 AM, Erik Zachte ezac...@wikimedia.org wrote:

 Sadly you need to take squid log based reports with a grain of salt.
 Several incomplete maintenance jobs have taken their toll.

 Each report starts with a long list of unsolved bugs.
 Among those https://bugzilla.wikimedia.org/show_bug.cgi?id=46273

 So yeah better trust your own data.

 Erik


 -----Original Message-----
 From: analytics-boun...@lists.wikimedia.org [mailto:
 analytics-boun...@lists.wikimedia.org] On Behalf Of Max Semenik
 Sent: Tuesday, September 03, 2013 5:33 PM
 To: analyt...@lists.wikimedia.org; Wikimedia developers; mobile-l
 Subject: [Analytics] Mobile stats

 Hi, I have a few questions regarding mobile stats.

 I need to determine the real percentage of WAP browsers. At first glance,
 [1] looks interesting: ratio of text/vnd.wap.wml to text/html is 92M /
 3987M = 2.3% on m.wikipedia.org. However, this contradicts the stats at
 [2] which have different numbers and a different ratio.

 I did my own research: because during browser detection in Varnish
 WAPness is detected mostly by looking at the Accept header, and because our
 current analytics infrastructure doesn't log it, I quickly whipped up some
 code that recorded the user-agent and Accept of every 10,000th request for
 mobile page views hitting the apaches.

 According to several days' worth of data, out of 14917 logged requests
 1445 contained vnd.wap.wml in Accept: headers in any form. That's more
 than what is logged for frontend responses; however, this is expected, as WAP
 should have a worse cache hit rate and thus should hit the apaches more often.
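
The two shares quoted so far can be recomputed directly from the counts given 
in this message. The helper name below is made up for the example.

```python
def share_percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Frontend responses on m.wikipedia.org: 92M text/vnd.wap.wml vs 3987M text/html.
frontend_wml_share = share_percent(92, 3987)

# Sampled apache requests: 1445 of 14917 mention vnd.wap.wml in Accept:.
accept_wml_share = share_percent(1445, 14917)

print(round(frontend_wml_share, 1), round(accept_wml_share, 1))
```

The second share is roughly four times the first, consistent with the remark 
that WAP requests reach the apaches more often than the frontend ratio implies.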

 Next, our WAP detection code is very simple: user-agent is checked
 against a few major browser IDs (all of them are HTML-capable and this
 check is not actually needed anymore and will go away soon) and if still
 not known, we consider every device that sends vnd.wap.wml (but not
 application/vnd.wap.xhtml+xml) in its Accept: header to be
 WAP-only. If we apply these rules, we get only 68 entries that qualify as
 WAP which is 0.05% of all mobile requests.
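
The rules just described can be sketched as below. This is an illustrative 
approximation, not the Varnish VCL itself: the list of browser markers is a 
hypothetical stand-in for the "few major browser IDs", which the message does 
not name, and the function name is made up.

```python
# Hypothetical stand-ins for the HTML-capable browser IDs mentioned above.
HTML_CAPABLE_UA_MARKERS = ("iphone", "android", "opera mini")


def is_wap_only(user_agent, accept):
    """Classify a request as WAP-only from its User-Agent and Accept headers."""
    ua = (user_agent or "").lower()
    # Step 1: known HTML-capable browsers are never treated as WAP.
    if any(marker in ua for marker in HTML_CAPABLE_UA_MARKERS):
        return False
    # Step 2: WAP-only means WML is accepted but XHTML Mobile Profile is not.
    accept = (accept or "").lower()
    return ("vnd.wap.wml" in accept
            and "application/vnd.wap.xhtml+xml" not in accept)
```

As a cross-check on the figures as quoted, 68 out of 14917 works out to 
roughly 0.46%, so the 0.05% above may rest on a different denominator.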

 The question is, what's wrong: my research or stats.wikimedia.org?

 And if it's indeed just 0.05%, we should probably^W definitely kill WAP
 support on our mobile site as it's virtually unmaintained.

 -
 [1] http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm
 [2] http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm



 --
 Best regards,
   Max Semenik ([[User:MaxSem]])






 --
 Arthur Richards
 Software Engineer, Mobile
 [[User:Awjrichards]]
 IRC: awjr
 +1-415-839-6885 x6687
