Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-15 Thread Asher Feldman
Just to tie this thread up - the issue of how to count Ajax-driven
pageviews loaded from the API, and of how to differentiate those requests
from secondary API page requests, has been resolved without the need for
code or logging changes.

Tagging of the mobile beta site will be accomplished via a new generic
MediaWiki HTTP response header, dedicated to logging, that contains
key-value pairs.
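
A minimal sketch of how a log consumer might read such a header. The thread does not specify the header's name or wire format, so the "key=value; key=value" layout and the sample keys below are assumptions for illustration only.

```python
def parse_logging_header(value):
    """Parse 'k1=v1; k2=v2' into a dict.

    The ';' and '=' delimiters are assumed, not taken from the thread.
    """
    pairs = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition("=")
        # Skip malformed items that have no '=' or an empty key.
        if sep and key.strip():
            pairs[key.strip()] = val.strip()
    return pairs


# Hypothetical header value; the keys are made up for this example.
tags = parse_logging_header("site=beta; pageview=1")
```

A generic key-value header like this lets new tags be added later without changing the log format again.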

-Asher


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-15 Thread Diederik van Liere
Thanks Asher for tying this up! I was about to write a similar email :)
One final question, just to make sure we are all on the same page: is the
X-CS field becoming a generic key/value pair for tracking purposes?

D



Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-12 Thread Diederik van Liere
 It does still seem to me that the data to determine secondary api requests
 should already be present in the existing log line. If the value of the
 page param in an action=mobileview api request matches the page in the
 referrer (perhaps with normalization), it's a secondary request as per case
 1 below.  Otherwise, it's a pageview as per case 2.  Difficult or expensive
 to reconcile?  Not when you're doing distributed log analysis via hadoop.

I did look into this before writing the RFC, and the issue is that a
lot of API referrers don't contain the query string. I don't know what
triggers this, but if we can fix it then we can definitely derive the
secondary pageview request from the referrer field.
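
A quick way to size this problem, as a sketch: it assumes the referrer field has already been pulled out of each log line as a plain string (the log format itself is not shown in this thread), and the function name is made up for illustration.

```python
from urllib.parse import urlsplit


def share_without_query(referrers):
    """Return the fraction of non-empty referrers that carry no query string."""
    present = [r for r in referrers if r]
    if not present:
        return 0.0
    missing = sum(1 for r in present if not urlsplit(r).query)
    return missing / len(present)
```

Running this over a sample of API requests would show how often the referrer is too short to be useful for the matching approach.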
D




Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-12 Thread Asher Feldman
On Tuesday, February 12, 2013, Diederik van Liere wrote:

  It does still seem to me that the data to determine secondary api requests
  should already be present in the existing log line. If the value of the
  page param in an action=mobileview api request matches the page in the
  referrer (perhaps with normalization), it's a secondary request as per case
  1 below.  Otherwise, it's a pageview as per case 2.  Difficult or expensive
  to reconcile?  Not when you're doing distributed log analysis via hadoop.
 
 So I did look into this prior to writing the RFC and the issue is that a
 lot of API referrers don't contain the querystring. I don't know what
 triggers this so if we can fix this then we can definitely derive the
 secondary pageview request from the referrer field.
 D


If you can point me to some examples, I'll see if I can find any insights
into the behavior.




Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-11 Thread Max Semenik
On 11.02.2013, 22:11 Asher wrote:

 And then I'd wonder about the server side implementation. How will frontend
 cache invalidation work? Are we going to need to purge every individual
 article section relative to /w/api.php on edit?

Since the API doesn't require pretty URLs, we could simply append the
current revision ID to the mobileview URLs.
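
A minimal sketch of that idea: if the revision ID is part of the URL, every edit produces a new URL, so stale cached copies are simply never requested again and need no purge. The parameter name "revid" is hypothetical here; the thread does not say what the parameter would be called.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.m.wikipedia.org/w/api.php"


def mobileview_url(page, revid=None):
    """Build a mobileview API URL, optionally tagged with a revision ID."""
    params = {
        "format": "json",
        "action": "mobileview",
        "page": page,
        "sections": "all",
    }
    if revid is not None:
        # "revid" is an assumed parameter name, used only to change the URL
        # (and therefore the frontend cache key) whenever the page is edited.
        params["revid"] = revid
    return API_ENDPOINT + "?" + urlencode(params)
```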

 Article HTML in memcached
 (parser cache), mobile processed HTML in memcached. Now individual
 sections in memcached? If so, should we calculate memcached space needs for
 article text as 3x the current parser cache utilization? More memcached
 usage is great, not asking to dissuade its use but because it's better to
 capacity plan than to react.

action=mobileview caches pages only in full and serves
only sections requested, so no changes in request patterns will result
in increased memcached usage.
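
The behaviour described above can be sketched as follows. This is an illustration only, not the MobileFrontend code: the class, its methods and the in-process dict standing in for memcached are all assumptions.

```python
class MobileViewCache:
    """Cache whole pages once; slice out requested sections at serve time."""

    def __init__(self, render_page):
        # render_page: callable mapping a title to a list of section HTML strings
        self._render_page = render_page
        self._store = {}  # stands in for memcached in this sketch

    def get_sections(self, title, wanted):
        """Return {index: html} for the requested section indexes."""
        if title not in self._store:
            # One cache entry per page, regardless of which sections are asked for.
            self._store[title] = self._render_page(title)
        sections = self._store[title]
        return {i: sections[i] for i in wanted if 0 <= i < len(sections)}

    def entries(self):
        """Number of cached objects (one per page)."""
        return len(self._store)
```

Because the cache key is the page and not the section list, asking for different section subsets does not create additional cache entries.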

-- 
Best regards,
  Max Semenik ([[User:MaxSem]])



Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-11 Thread Asher Feldman
Max - good answers re: caching concerns.  That leaves studying if the bytes
transferred on average mobile article view increases or decreases with lazy
section loading.  If it increases, I'd say this isn't a positive direction
to go in and stop there.  If it decreases, then we should look at the
effect on total latency, number of requests required per pageview, and the
impact on backend apache utilization which I'd expect to be  0.
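
A rough sketch of how two of these measures could be computed. It assumes each logged request can already be attributed to a pageview (the pageview_id field is hypothetical, and producing it is exactly the problem this thread is about) and that the response size is logged per request.

```python
from collections import defaultdict


def per_pageview_stats(requests):
    """Average bytes and request count per pageview.

    requests: iterable of (pageview_id, bytes_sent) tuples, one per HTTP request.
    """
    totals = defaultdict(lambda: [0, 0])  # pageview_id -> [bytes, request count]
    for pageview_id, bytes_sent in requests:
        totals[pageview_id][0] += bytes_sent
        totals[pageview_id][1] += 1
    n = len(totals)
    if n == 0:
        return {"avg_bytes": 0.0, "avg_requests": 0.0}
    return {
        "avg_bytes": sum(t[0] for t in totals.values()) / n,
        "avg_requests": sum(t[1] for t in totals.values()) / n,
    }
```

Comparing the output for traffic with and without lazy section loading would give the increase-or-decrease answer asked for above.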

Does the mobile team have specific goals that this project aims to
accomplish?  If so, we can use those as the measure against which to
compare an impact analysis.


Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-11 Thread Jon Robson
I'm a bit worried that now we are asking why pages are lazy loaded
rather than focusing on the fact that they currently __are doing
this___ and how we can log these (if we want to discuss this further
let's start another thread as I'm getting extremely confused doing so
on this one).

Lazy loading sections

The motivation behind moving MobileFrontend in the direction of lazy
loading section content and subsequent pages can be found here [1]; I
just gave it a refresh as it was a little out of date.

In summary the reasons are to
1) make the app feel more responsive by simply loading content rather
than reloading the entire interface
2) reduce the payload sent to a device.

Session Tracking


Going back to the discussion of tracking mobile page views, it sounds
like a header stating whether a page is being viewed in alpha, beta or
stable works fine for standard page views.

As for the situations where an entire page is loaded via the api it
makes no difference to us whether we
1) send the same header (set via javascript) or
2) add a query string parameter.

The only advantage I can see of using a header is that an initial page
load of the article San Francisco currently uses the same api url as a
page load of the article San Francisco via javascript (e.g. I click a
link to 'San Francisco' on the California article).

In this new method they would use different urls (as the data sent is
different). I'm not sure how that would affect caching.
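
The caching question can be made concrete with a toy model, offered as a sketch and not as a description of the production cache setup. It assumes a cache that keys objects by URL plus any request headers it is configured to vary on; the "pageview" parameter and "X-Pageview" header names are hypothetical.

```python
def cache_key(url, headers=None, vary=()):
    """Key a cached object by URL plus the request headers the cache varies on."""
    headers = headers or {}
    varied = tuple((name, headers.get(name, "")) for name in sorted(vary))
    return (url, varied)


base = "http://en.m.wikipedia.org/w/api.php?action=mobileview&page=San+Francisco"

# A header the cache does not vary on leaves the key unchanged: one cached object.
same_object = cache_key(base) == cache_key(base, {"X-Pageview": "1"})

# An extra query string parameter changes the URL: two cached objects.
split_object = cache_key(base) != cache_key(base + "&pageview=1")
```

In this model a query string parameter would store the same article content twice, while a header would not, unless the cache were told to vary on that header.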

Let us know which method is preferred. From my perspective
implementation of either is easy.

[1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections




-- 
Jon Robson
http://jonrobson.me.uk
@rakugojon


Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-11 Thread Arthur Richards
Thanks, Jon. To try and clarify a bit more about the API requests... they
are not made on a per-section basis. As I mentioned earlier, there are two
cases in which article content gets loaded by the API:

1) Going directly to a page (e.g. clicking a link from a Google search) will
result in the backend serving a page with ONLY summary section content and
section headers. The rest of the page is lazily loaded via API request once
the JS for the page gets loaded. The idea is to increase responsiveness by
reducing the delay for an article to load (further details in the article
Jon previously linked to). The API request looks like:
http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all

2) Loading an article entirely via JavaScript - like when a link is clicked
in an article to another article, or an article is loaded via search. This
will make ONE call to the API to load article content. The API request looks
like:
http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all

These API requests are identical, but only #2 should be counted as a
'pageview' - #1 is a secondary API request and should not be counted as a
'pageview'. You could make the argument that we just count all of these API
requests as pageviews, but there are cases when we can't load article
content from the API (like devices that do not support JS), so we need to
be able to count the traditional page request as a pageview - thus we need
a way to differentiate the types of API requests being made when they
otherwise share the same URL.




Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-11 Thread Asher Feldman
Thanks for the clarification Arthur, that clears up some misconceptions I
had.  I saw a demo around the allstaff where individual sections were lazy
loaded, so I think I had that in my head.

It does still seem to me that the data to determine secondary api requests
should already be present in the existing log line. If the value of the
page param in an action=mobileview api request matches the page in the
referrer (perhaps with normalization), it's a secondary request as per case
1 below.  Otherwise, it's a pageview as per case 2.  Difficult or expensive
to reconcile?  Not when you're doing distributed log analysis via hadoop.
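
A minimal sketch of this heuristic for a single log line. It assumes the request URL and referrer are available as strings and that article referrers use a /wiki/Title path; the normalization shown (underscores to spaces, percent-decoding) is only the simplest case, and all function names are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def normalize_title(title):
    """Make URL-style and API-style spellings of a title compare equal."""
    return title.replace("_", " ").strip()


def referrer_page(referrer):
    """Extract the page title from a /wiki/Title referrer (assumed URL layout)."""
    path = urlsplit(referrer).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return None
    return normalize_title(unquote(path[len(prefix):]))


def classify(request_url, referrer):
    """Return 'secondary', 'pageview', or 'other' for one logged request."""
    query = parse_qs(urlsplit(request_url).query)
    if query.get("action") != ["mobileview"] or "page" not in query:
        return "other"
    requested = normalize_title(query["page"][0])
    if referrer_page(referrer) == requested:
        return "secondary"  # case 1: lazy load for the page already served
    return "pageview"       # case 2: article loaded entirely via JavaScript
```

Note that a referrer with no recognizable title falls through to "pageview" here, so missing or truncated referrers would inflate the count.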

On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote:

 Thanks, Jon. To try and clarify a bit more about the API requests... they
 are not made on a per-section basis. As I mentioned earlier, there are two
 cases in which article content gets loaded by the API:

 1) Going directly to a page (eg clicking a link from a Google search) will
 result in the backend serving a page with ONLY summary section content and
 section headers. The rest of the page is lazily loaded via API request once
 the JS for the page gets loaded. The idea is to increase responsiveness by
 reducing the delay for an article to load (further details in the article
 Jon previously linked to). The API request looks like:

 http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all

 2) Loading an article entirely via Javascript - like when a link is clicked
 in an article to another article, or an article is loaded via search. This
 will make ONE call to the API to load article content. API request looks
 like:

 http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all

 These API requests are identical, but only #2 should be counted as a
 'pageview' - #1 is a secondary API request and should not be counted as a
 'pageview'. You could make the argument that we just count all of these API
 requests as pageviews, but there are cases when we can't load article
 content from the API (like devices that do not support JS), so we need to
 be able to count the traditional page request as a pageview - thus we need
 a way to differentiate the types of API requests being made when they
 otherwise share the same URL.



 On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote:

  I'm a bit worried that now we are asking why pages are lazy loaded
  rather than focusing on the fact that they currently __are doing
  this___ and how we can log these (if we want to discuss this further
  let's start another thread as I'm getting extremely confused doing so
  on this one).
 
  Lazy loading sections
  
  The motivation behind moving MobileFrontend in the direction of lazy
  loading section content and subsequent pages can be found here [1]; I
  just gave it a refresh as it was a little out of date.
 
  In summary the reason is to
  1) make the app feel more responsive by simply loading content rather
  than reloading the entire interface
  2) reduce the payload sent to a device.
 
  Session Tracking
  
 
  Going back to the discussion of tracking mobile page views, it sounds
  like a header stating whether a page is being viewed in alpha, beta or
  stable works fine for standard page views.
 
  As for the situations where an entire page is loaded via the api, it
  makes no difference to us whether we
  1) send the same header (set via javascript) or
  2) add a query string parameter.
 
  The only advantage I can see of using a header is that an initial page
  load of the article San Francisco currently uses the same api url as a
  page load of the article San Francisco via javascript (e.g. I click a
  link to 'San Francisco' on the California article).
 
  In this new method they would use different urls (as the data sent is
  different). I'm not sure how that would affect caching.
 
  Let us know which method is preferred. From my perspective
  implementation of either is easy.
 
  [1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections
 
  On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman afeld...@wikimedia.org
  wrote:
   Max - good answers re: caching concerns.  That leaves studying if the
  bytes
   transferred on average mobile article view increases or decreases with
  lazy
   section loading.  If it increases, I'd say this isn't a positive
  direction
   to go in and stop there.  If it decreases, then we should look at the
   effect on total latency, number of requests required per pageview, and
   the impact on backend apache utilization which I'd expect to be > 0.
  
   Does the mobile team have specific goals that this project aims to
   

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-09 Thread Asher Feldman
On Thu, Feb 7, 2013 at 4:32 AM, Mark Bergsma m...@wikimedia.org wrote:

  - Since we're repurposing X-CS, should we perhaps rename it to something
  more apt to address concerns about cryptic non-standard headers flying
  about?

 I'd like to propose to define *one* request header to be used for all
 analytics purposes. It can be key/value pairs, and be set client side where
 applicable.


There's been some confusion in this thread between headers used by
mediawiki in determining content generation or for cache variance, and
those intended only for logging.  The zero carrier header is used by the
zero extension to return specific content banners and set different default
behaviors (i.e. hide all images) as negotiated with individual mobile
carriers.  A reader familiar with this might note that there are separate
X-CS and X-Carrier headers but X-Carrier is supposed to go away now.

Agreed that there should be a single header for content that's strictly for
analytics purposes.  All changes to the udplog format in the last year or
so could likely be reverted except for the delimiter change, with a
multipurpose analytics key/value field added for all else.


 I think the question of using a URL param vs a request header should
 mainly take into account whether the response varies on the value of the
 parameter. If the responses are otherwise identical, and the value is only
 used for analytics purposes, I would prefer to put that into the above
 header instead, as it will impair cacheability / cache size otherwise (even
 if those requests are currently not cacheable for other reasons). If the
 responses are actually different based on this parameter, I would prefer to
 have it in the URL where possible.


For this particular case, the API requests are for getting specific
sections of an article, as opposed to either the whole thing or the first
section as part of an initial pageview.  I might not have grokked the
original RFC email well, but I don't understand why this was being
discussed as a logging challenge or necessitating a request header.  A
mobile api request to just get section 3 of the article on otters should
already utilize a query param denoting that section 3 is being fetched, and
is already clearly not a primary request.

Whether or not it makes sense for mobile to move in the direction of
splitting up article views into many api requests is something I'd love to
see backed up by data.  I'm skeptical for multiple reasons.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-07 Thread Mark Bergsma

On Feb 6, 2013, at 9:32 PM, David Schoonover d...@wikimedia.org wrote:

 Just want to summarize and make sure I've got the right conclusions, as
 this thread has wandered a bit.
 
 *1. X-MF-Mode: Alpha/Beta Site Usage*

 We'll roll this into the X-CS header, which will now be KV-pairs (using
 normal URL encoding), and set by Varnish. This will avoid an explosion of
 cryptic headers for analytic purposes.
 
 Questions:
 - It seems there's some confusion around bypassing Varnish. If I
 understand correctly, it's not that Varnish is ever bypassed, just that the
 upstream response is not cached if cookies are present. Is that right?

Yes

 - Since we're repurposing X-CS, should we perhaps rename it to something
 more apt to address concerns about cryptic non-standard headers flying
 about?

I'd like to propose to define *one* request header to be used for all analytics 
purposes. It can be key/value pairs, and be set client side where applicable. 
Varnish can append to it where needed, later keys overriding earlier ones. Then 
we can log that one header across all HTTP/caching clusters without having to 
change the log stream all the time, and without wasting much space, and caching 
edge configuration changes are kept to a minimum as well.

And we might as well be transparent in its naming. header name 
Log-Parameters:?
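
To illustrate the append-and-override behaviour, here is a minimal Python sketch. It assumes the header value is a URL-encoded key/value string, as in the quoted summary above; the key names `mf` and `zero` and both function names are hypothetical and only stand in for whatever would actually be logged.

```python
from urllib.parse import parse_qsl, urlencode


def append_params(header_value, extra):
    """Append key/value pairs to a header value without rewriting it."""
    appended = urlencode(extra)
    if not header_value:
        return appended
    return header_value + "&" + appended


def effective_params(header_value):
    """Collapse the header to a dict; a later key overrides an earlier one."""
    return dict(parse_qsl(header_value, keep_blank_values=True))
```

Appending rather than rewriting keeps the edge logic trivial; the override rule is applied only once, when the logged value is parsed.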

 *2. X-MF-Req: Primary vs Secondary API Requests*
 
 This header will be replaced with a query parameter set by the client-side
 JS code making the request. Analytics will parse it out at processing time
 and Do The Right Thing.


I think the question of using a URL param vs a request header should mainly 
take into account whether the response varies on the value of the parameter. If 
the responses are otherwise identical, and the value is only used for analytics 
purposes, I would prefer to put that into the above header instead, as it will 
impair cacheability / cache size otherwise (even if those requests are 
currently not cacheable for other reasons). If the responses are actually 
different based on this parameter, I would prefer to have it in the URL where 
possible.

-- 
Mark Bergsma m...@wikimedia.org
Lead Operations Architect
Wikimedia Foundation






Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-07 Thread David Schoonover

 I'd like to propose to define *one* request header to be used for all
 analytics purposes. It can be key/value pairs, and be set client side where
 applicable. Varnish can append to it where needed, later keys overriding
 earlier ones. Then we can log that one header across all HTTP/caching
 clusters without having to change the log stream all the time, and without
 wasting much space, and caching edge configuration changes are kept to a
 minimum as well.


Agreed. Instrumentation should ideally never get in the way of production
performance, so if we can cut or optimize header use for logging without
being too onerous, we'll happily do so. afaik, the reasons that custom HTTP
headers are used at all are:
- They're accessible from varnishncsa without code modifications;
- Varnish and/or other parties in the request chain can munge the values
prior to logging to save bytes (examples being X-CS, which replaces the
semantic carrier name with a [vastly shorter] numeric code, and the
proposed X-MF-Mode header, which prevents the need to log the whole cookies
header for post-processing).

Ideally, none of this should need to make a trip to the client. I don't
recall seeing anything in the Varnish docs providing a way to send values
exclusively to the loggers, but if there is, that's an easy win, and it
wouldn't require any changes to our parsing pipeline.

If that's not possible, it makes sense to collapse various headers into a
KV field; that would require changes on our side, including all downstream
consumers of the log stream (which is surprisingly large), so it's not a
trivial move.

--
David Schoonover
d...@wikimedia.org

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-06 Thread David Schoonover
Just want to summarize and make sure I've got the right conclusions, as
this thread has wandered a bit.

*1. X-MF-Mode: Alpha/Beta Site Usage*

We'll roll this into the X-CS header, which will now be KV-pairs (using
normal URL encoding), and set by Varnish. This will avoid an explosion of
cryptic headers for analytic purposes.

Questions:
- It seems there's some confusion around bypassing Varnish. If I
understand correctly, it's not that Varnish is ever bypassed, just that the
upstream response is not cached if cookies are present. Is that right?
- Since we're repurposing X-CS, should we perhaps rename it to something
more apt to address concerns about cryptic non-standard headers flying
about?


*2. X-MF-Req: Primary vs Secondary API Requests*

This header will be replaced with a query parameter set by the client-side
JS code making the request. Analytics will parse it out at processing time
and Do The Right Thing.


Kindly correct me if I've gotten anything wrong.


--
David Schoonover
d...@wikimedia.org


On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere
dvanli...@wikimedia.org wrote:

  Analytics folks, is this workable from your perspective?
 
  Yes, this works fine for us and it's also no problem to set multiple
 key/value pairs in the http header that we are now using for the X-CS
 header.
 Diederik

[Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-06 Thread Asher Feldman
On Wednesday, February 6, 2013, David Schoonover wrote:

 Just want to summarize and make sure I've got the right conclusions, as
 this thread has wandered a bit.

 *1. X-MF-Mode: Alpha/Beta Site Usage*

 We'll roll this into the X-CS header, which will now be KV-pairs (using
 normal URL encoding), and set by Varnish.


Nope. There will be a header denoting non-standard MobileFrontend views if
the mobile team wants to leave the caching situation as is. It will be a
response header set by mediawiki, not varnish. The header will have a
unique name, it will not share the name of the zero carrier header. The
udplog field that currently only ever contains carrier information on zero
requests will become a key value field. Udplog fields are not named, they
are positional.
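
As a rough illustration of what a key/value udplog field might look like to a consumer, the Python sketch below parses the `zc:orn;v:b1` form suggested elsewhere in this thread. The `;` and `:` separators come from that example; treating `-` as the placeholder for an empty field is an assumption, and `parse_kv_field` is a hypothetical helper name.

```python
def parse_kv_field(field):
    """Parse a field such as 'zc:orn;v:b1' into a dict of keys to values."""
    if field in ("", "-"):  # assumed placeholder for "no value logged"
        return {}
    pairs = {}
    for item in field.split(";"):
        key, sep, value = item.partition(":")
        if sep and key:  # skip empty or malformed items
            pairs[key] = value
    return pairs
```

Because udplog fields are positional, a consumer would first split the line on the delimiter and then pass the single field at the agreed position to a parser like this one.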


  This will avoid an explosion of
 cryptic headers for analytic purposes.

 Questions:
 - It seems there's some confusion around bypassing Varnish. If I
 understand correctly, it's not that Varnish is ever bypassed, just that the
 upstream response is not cached if cookies are present. Is that right?


Bypasses varnish caching != bypassing varnish.  I don't see any use of
the latter in this thread, but if there has been confusion, know that all
m.wikipedia.org requests are served via varnish.


 - Since we're repurposing X-CS, should we perhaps rename it to something
 more apt to address concerns about cryptic non-standard headers flying
 about?


Nope.. We're repurposing the fixed position udplog field, not the zero
carrier code header.




 *2. X-MF-Req: Primary vs Secondary API Requests*

 This header will be replaced with a query parameter set by the client-side
 JS code making the request. Analytics will parse it out at processing time
 and Do The Right Thing.


 Kindly correct me if I've gotten anything wrong.


 --
 David Schoonover
 d...@wikimedia.org


 On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere
 dvanli...@wikimedia.org wrote:

   Analytics folks, is this workable from your perspective?
  
   Yes, this works fine for us and it's also no problem to set multiple
  key/value pairs in the http header that we are now using for the X-CS
  header.
  Diederik

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-06 Thread David Schoonover
That all sounds fine to me so long as we're all agreed.

--
David Schoonover
d...@wikimedia.org


On Wed, Feb 6, 2013 at 12:59 PM, Asher Feldman afeld...@wikimedia.org wrote:

 On Wednesday, February 6, 2013, David Schoonover wrote:

  Just want to summarize and make sure I've got the right conclusions, as
  this thread has wandered a bit.
 
  *1. X-MF-Mode: Alpha/Beta Site Usage*

  We'll roll this into the X-CS header, which will now be KV-pairs (using
  normal URL encoding), and set by Varnish.


 Nope. There will be a header denoting non-standard MobileFrontend views if
 the mobile team wants to leave the caching situation as is. It will be a
 response header set by mediawiki, not varnish. The header will have a
 unique name, it will not share the name of the zero carrier header. The
 udplog field that currently only ever contains carrier information on zero
 requests will become a key value field. Udplog fields are not named, they
 are positional.


   This will avoid an explosion of
  cryptic headers for analytic purposes.
 
  Questions:
  - It seems there's some confusion around bypassing Varnish. If I
  understand correctly, it's not that Varnish is ever bypassed, just that
  the upstream response is not cached if cookies are present. Is that
  right?


 Bypasses varnish caching != bypassing varnish.  I don't see any use of
 the latter in this thread, but if there has been confusion, know that all
 m.wikipedia.org requests are served via varnish.


  - Since we're repurposing X-CS, should we perhaps rename it to something
  more apt to address concerns about cryptic non-standard headers flying
  about?


 Nope.. We're repurposing the fixed position udplog field, not the zero
 carrier code header.


 
 
  *2. X-MF-Req: Primary vs Secondary API Requests*
 
  This header will be replaced with a query parameter set by the
  client-side JS code making the request. Analytics will parse it out at
  processing time and Do The Right Thing.
 
 
  Kindly correct me if I've gotten anything wrong.
 
 
  --
  David Schoonover
  d...@wikimedia.org
 
 
  On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere
  dvanli...@wikimedia.org wrote:
 
    Analytics folks, is this workable from your perspective?

    Yes, this works fine for us and it's also no problem to set multiple
   key/value pairs in the http header that we are now using for the X-CS
   header.
   Diederik

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-06 Thread Asher Feldman
On Wednesday, February 6, 2013, David Schoonover wrote:

 That all sounds fine to me so long as we're all agreed.


Lol. RFC closed.


 --
 David Schoonover
 d...@wikimedia.org


 On Wed, Feb 6, 2013 at 12:59 PM, Asher Feldman afeld...@wikimedia.org
 wrote:

  On Wednesday, February 6, 2013, David Schoonover wrote:
 
   Just want to summarize and make sure I've got the right conclusions, as
   this thread has wandered a bit.
  
   *1. X-MF-Mode: Alpha/Beta Site Usage*

   We'll roll this into the X-CS header, which will now be KV-pairs (using
   normal URL encoding), and set by Varnish.
 
 
  Nope. There will be a header denoting non-standard MobileFrontend views
  if the mobile team wants to leave the caching situation as is. It will
  be a response header set by mediawiki, not varnish. The header will have
  a unique name, it will not share the name of the zero carrier header.
  The udplog field that currently only ever contains carrier information
  on zero requests will become a key value field. Udplog fields are not
  named, they are positional.
 
 
    This will avoid an explosion of
   cryptic headers for analytic purposes.
  
   Questions:
   - It seems there's some confusion around bypassing Varnish. If I
   understand correctly, it's not that Varnish is ever bypassed, just that
   the upstream response is not cached if cookies are present. Is that
   right?
 
 
  Bypasses varnish caching != bypassing varnish.  I don't see any use of
  the latter in this thread, but if there has been confusion, know that all
  m.wikipedia.org requests are served via varnish.
 
 
   - Since we're repurposing X-CS, should we perhaps rename it to
   something more apt to address concerns about cryptic non-standard
   headers flying about?
 
 
  Nope.. We're repurposing the fixed position udplog field, not the zero
  carrier code header.
 
 
  
  
   *2. X-MF-Req: Primary vs Secondary API Requests*
  
   This header will be replaced with a query parameter set by the
   client-side JS code making the request. Analytics will parse it out at
   processing time and Do The Right Thing.
  
  
   Kindly correct me if I've gotten anything wrong.
  
  
   --
   David Schoonover
   d...@wikimedia.org
  
  
   On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere
   dvanli...@wikimedia.org wrote:
  
     Analytics folks, is this workable from your perspective?

     Yes, this works fine for us and it's also no problem to set multiple
    key/value pairs in the http header that we are now using for the X-CS
    header.
    Diederik

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-05 Thread Arthur Richards
On Mon, Feb 4, 2013 at 7:12 PM, Asher Feldman afeld...@wikimedia.org wrote:

 On Mon, Feb 4, 2013 at 5:21 PM, Asher Feldman afeld...@wikimedia.org
 wrote:

  On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.org
 wrote:
 
  In the case of the cookie, the header would actually get set by the
  backend
  response (from Apache) and I believe Dave cooked up or was planning on
  cooking some magic to somehow make that information discernable when
  results are cached.
 
 
  Opting into the mobile beta as it is currently implemented bypasses
  varnish caching for all future mobile pageviews for the life of the
  cookie.  So this probably isn't quite right (at least the "when results
  are cached" part.)
 

 Thinking about this further.. So long as all beta optins bypass all caching
 and always have to hit an apache, it would be fine for mf to set a response
 header reflecting the version of the site the optin cookie triggers (but
 only if there's an optin, avoid setting on standard.)  I'd just prefer this
 to be logged without adding a field to the entire udplog stream that will
 generally just be wasted space.  Mobile already has one dedicated udplog
 field currently intended for zero carriers, wasted log space for nearly
 every request.  Make it a key/value field that can contain multiple keys,
 i.e. zc:orn;v:b1 (zero carrier = orange whatever, version = beta1)

 If by some chance mobile beta gets implemented in a way that doesn't kill
 frontend caching for its users (maybe solely via different js behavior
 based on the presence of the optin cookie?) the above won't be applicable
 anymore, so using the event log facility / pixel service to note beta usage
 becomes more appropriate.  If beta usage is going to be driven upwards, I
 hope this approach is seriously considered.  Mobile currently only has
 around a 58% edge cache hitrate as it is and it sounds like upcoming
 features will place significant new demands on the apaches and for
 memcached space.  If a non cache busting beta site is doable, go for the
 logging method now that will later be compatible with it to avoid having to
 change processing methods.


OK - this is all making a lot more sense to me now, thanks for your
clarifications and suggestions, Asher.

So, from the mobile team's perspective a straightforward implementation to
get us to our goal might be to:
1) add a query parameter to identify 'secondary' API hits (eg an API
request for page content made after an initial request for that page was
made, all other requests stay the same)
2) use the header solution to identify beta/alpha cookies (HTTP header set
by backend response when user is opted in).
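
Step 1 could be sketched as below, in Python rather than the client-side JS that would really issue the request. The parameter name `mfreq` and its value `secondary` are placeholders invented for this example (no name was agreed in the thread), and only a few of the mobileview parameters are shown.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.m.wikipedia.org/w/api.php"


def mobileview_url(page, secondary=False):
    """Build a mobileview API URL, tagging secondary requests for the logs."""
    params = [
        ("format", "json"),
        ("action", "mobileview"),
        ("page", page),
        ("sections", "all"),
    ]
    if secondary:
        # Hypothetical no-op parameter that only exists to be logged.
        params.append(("mfreq", "secondary"))
    return API_ENDPOINT + "?" + urlencode(params)
```

Primary loads keep exactly the URL they have today, so only the lazy-load request after an initial page hit changes.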

One thing I'd like to double check though is that 'Opting into the mobile
beta as it is currently implemented bypasses varnish caching for all future
mobile pageviews for the life of the cookie' - I thought the Varnish cache
was just varied by the optin cookies, not totally bypassed. I've looked at
headers from some sample requests I've made with the beta opt-in and I'm
not seeing any cache hits, so I gather you are correct. Can you please
confirm this?

Analytics folks, is this workable from your perspective?

-- 
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-05 Thread Diederik van Liere
 Analytics folks, is this workable from your perspective?

 Yes, this works fine for us and it's also no problem to set multiple
key/value pairs in the http header that we are now using for the X-CS
header.
Diederik

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Arthur Richards
On Sun, Feb 3, 2013 at 2:35 AM, Asher Feldman afeld...@wikimedia.org wrote:

 Regarding varnish cacheability of mobile API requests with a logging query
 param - it would probably be worth making frontend varnishes strip out all
 occurrences of that query param and its value from their backend requests
 so they're all the same to the caching instances. A generic param name that
 can take any value would allow for adding as many extra log values as
 needed, limited only by the uri log field length.

 l=mft2&l=mfstable etc.

 So still an edge cache change but the result is more flexible
 while avoiding changing the fixed field length log format across unrelated
 systems like text squids or image caches.
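
The stripping step described here would live in the frontend Varnish configuration; the Python sketch below only illustrates the intended URL transformation. It takes the parameter name `l` from the example above, and `strip_log_param` is a hypothetical helper, not existing code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

LOG_PARAM = "l"  # generic logging parameter, as in the example above


def strip_log_param(url, name=LOG_PARAM):
    """Return (url without the logging param, list of its logged values)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != name]
    logged = [v for k, v in pairs if k == name]
    # Re-encoding may normalize escapes, which is fine for a cache key.
    return urlunsplit(parts._replace(query=urlencode(kept))), logged
```

The first element is what the caching instances would see, so every variant of the request maps to one cache object; the second is what ends up in the log.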

 On Sunday, February 3, 2013, Asher Feldman wrote:

  If you want to differentiate categories of API requests in logs, add
  descriptive noop query params to the requests. I.e. mfmode=2. Doing this
  in request headers and altering edge config is unnecessary and a bad
  design pattern. On the analytics side, if parsing query params seems
  challenging vs. having a fixed field to parse, deal.
 


Asher, I understand your hesitation about using HTTP header fields, but
there are a couple problems I'm seeing with using query string parameters.
Perhaps you or others have some ideas how to get around these:
* We should keep user-facing URLs canonical as much as possible (primarily
for link sharing)
** If we keep user-facing URLs canonical, we could potentially add query
string params via javascript, but that would only work on devices that
support javascript/have javascript enabled (this might not be a huge deal
as we are planning changes such that users that do not support jQuery will
get a simplified version of the stable site)
* How could this work for the first pageview request (eg a user clicking a
link from Google or even just browsing to http://en.wikipedia.org)?

I may be missing other potential problems - it would be great if others
from the mobile team could chime in.

-- 
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Brion Vibber
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org wrote:

 Asher, I understand your hesitation about using HTTP header fields, but
 there are a couple problems I'm seeing with using query string parameters.
 Perhaps you or others have some ideas how to get around these:
 * We should keep user-facing URLs canonical as much as possible (primarily
 for link sharing)
 ** If we keep user-facing URLs canonical, we could potentially add query
 string params via javascript, but that would only work on devices that
 support javascript/have javascript enabled (this might not be a huge deal
 as we are planning changes such that users that do not support jQuery will
 get a simplified version of the stable site)

* How could this work for the first pageview request (eg a user clicking a
 link from Google or even just browsing to http://en.wikipedia.org)?


I think mainly we need the tracking on the API requests... that's all
JavaScript-initiated, and all hidden from the user. The main problem with
adding parameters would be for caching, but none of the API hits are
currently cacheable so that's not an immediate issue perhaps.

-- brion

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Arthur Richards
On Mon, Feb 4, 2013 at 5:30 PM, Brion Vibber br...@pobox.com wrote:

 On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org
 wrote:

  Asher, I understand your hesitation about using HTTP header fields, but
  there are a couple problems I'm seeing with using query string
  parameters. Perhaps you or others have some ideas how to get around
  these:
  * We should keep user-facing URLs canonical as much as possible
  (primarily for link sharing)
  ** If we keep user-facing URLs canonical, we could potentially add query
  string params via javascript, but that would only work on devices that
  support javascript/have javascript enabled (this might not be a huge deal
  as we are planning changes such that users that do not support jQuery
  will get a simplified version of the stable site)

 * How could this work for the first pageview request (eg a user clicking a
  link from Google or even just browsing to http://en.wikipedia.org)?
 

 I think mainly we need the tracking on the API requests... that's all
 JavaScript-initiated, and all hidden from the user. The main problem with
 adding parameters would be for caching, but none of the API hits are
 currently cacheable so that's not an immediate issue perhaps.


We also need to be able to differentiate between alpha/beta/stable versions
of the mobile site, without having to parse the cookie header (I believe as
a result of performance constraints around this? I think the analytics team
had looked into this previously).

-- 
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Brion Vibber
On Mon, Feb 4, 2013 at 4:38 PM, Arthur Richards aricha...@wikimedia.org wrote:

 On Mon, Feb 4, 2013 at 5:30 PM, Brion Vibber br...@pobox.com wrote:

  On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org
  wrote:
  * How could this work for the first pageview request (eg a user clicking
  a link from Google or even just browsing to http://en.wikipedia.org)?
  
 
  I think mainly we need the tracking on the API requests... that's all
  JavaScript-initiated, and all hidden from the user. The main problem with
  adding parameters would be for caching, but none of the API hits are
  currently cacheable so that's not an immediate issue perhaps.
 

 We also need to be able to differentiate between alpha/beta/stable versions
 of the mobile site, without having to parse the cookie header (I believe as
 a result of performance constraints around this? I think the analytics team
 had looked into this previously).


Yeah that's probably not possible if you want to track that for initial
page views. Cookie's the only thing guaranteed to have the data available,
and we have no way to inject a header into mobile web browsers except for
the XHR hits to the API.

-- brion

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Arthur Richards
On Mon, Feb 4, 2013 at 5:49 PM, Brion Vibber br...@pobox.com wrote:

 Yeah that's probably not possible if you want to track that for initial
 page views. Cookie's the only thing guaranteed to have the data available,
 and we have no way to inject a header into mobile web browsers except for
 the XHR hits to the API.

 -- brion


In the case of the cookie, the header would actually get set by the backend
response (from Apache) and I believe Dave cooked up or was planning on
cooking some magic to somehow make that information discernable when
results are cached.


-- 
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Asher Feldman
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org wrote:


 Asher, I understand your hesitation about using HTTP header fields, but
 there are a couple problems I'm seeing with using query string parameters.
 Perhaps you or others have some ideas how to get around these:
 * We should keep user-facing URLs canonical as much as possible (primarily
 for link sharing)
 ** If we keep user-facing URLs canonical, we could potentially add query
 string params via javascript, but that would only work on devices that
 support javascript/have javascript enabled (this might not be a huge deal
 as we are planning changes such that users that do not support jQuery will
 get a simplified version of the stable site)


I was thinking of this as a solution for the X-MF-Req header, based on your
explanation of it earlier in the thread: "Almost correct - I realize I
didn't actually explain it correctly. This would be a request HTTP header
set by the client in API requests made by Javascript provided by
MobileFrontend."

I only meant to apply the query string idea to API requests, which can also
be marked to indicate non-standard versions of the site.  I completely
missed the case of non-api requests about which beta/alpha usage data needs
to be collected.  What about doing so via the eventlog service?  Only for
users actually opted into one of these programs, no need to log anything
special for the majority of users getting the standard site.

* How could this work for the first pageview request (eg a user clicking a
 link from Google or even just browsing to http://en.wikipedia.org)?


I think this is covered by the above, in that the data intended to go into
x-mf-req doesn't apply to this sort of page view, and first views from
users opted into a trial can eventlog the trial usage.

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Asher Feldman
On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.org wrote:

 In the case of the cookie, the header would actually get set by the backend
 response (from Apache) and I believe Dave cooked up or was planning on
 cooking some magic to somehow make that information discernable when
 results are cached.


Opting into the mobile beta as it is currently implemented bypasses varnish
caching for all future mobile pageviews for the life of the cookie.  So
this probably isn't quite right (at least the "when results are cached"
part).

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-04 Thread Asher Feldman
On Mon, Feb 4, 2013 at 5:21 PM, Asher Feldman afeld...@wikimedia.org wrote:

 Opting into the mobile beta as it is currently implemented bypasses
 varnish caching for all future mobile pageviews for the life of the
 cookie.  So this probably isn't quite right (at least the "when results
 are cached" part).


Thinking about this further: so long as all beta opt-ins bypass all caching
and always have to hit an apache, it would be fine for MF to set a response
header reflecting the version of the site the opt-in cookie triggers (but
only if there's an opt-in; avoid setting it on standard).  I'd just prefer
this to be logged without adding a field to the entire udplog stream that
will generally just be wasted space.  Mobile already has one dedicated
udplog field, currently intended for zero carriers, which is wasted log
space for nearly every request.  Make it a key/value field that can contain
multiple keys, e.g. zc:orn;v:b1 (zero carrier = orange whatever, version =
beta1).
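
A rough sketch of how such a key/value field might be parsed on the
analytics side. The function names are hypothetical, and the sketch assumes
":" separates key from value, ";" separates pairs, and an unset field is
logged as "-"; none of this is an existing udplog API.

```python
def parse_kv_field(field: str) -> dict[str, str]:
    """Parse a 'key:value;key:value' log field into a dict.

    Assumption: an empty string or a lone '-' means the field is unset.
    """
    if field in ("", "-"):
        return {}
    pairs = {}
    for item in field.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed key/value pair: {item!r}")
        pairs[key] = value
    return pairs


def format_kv_field(pairs: dict[str, str]) -> str:
    """Serialize a dict back into the log field; an empty dict becomes '-'."""
    if not pairs:
        return "-"
    return ";".join(f"{key}:{value}" for key, value in pairs.items())
```

With this, parse_kv_field("zc:orn;v:b1") gives a dict with "zc" mapped to
"orn" and "v" mapped to "b1", and new keys can be added later without
changing the number of fields in the log line.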

If by some chance mobile beta gets implemented in a way that doesn't kill
frontend caching for its users (maybe solely via different js behavior
based on the presence of the opt-in cookie?), the above won't be applicable
anymore, so using the event log facility / pixel service to note beta usage
becomes more appropriate.  If beta usage is going to be driven upwards, I
hope this approach is seriously considered.  Mobile currently only has
around a 58% edge cache hit rate as it is, and it sounds like upcoming
features will place significant new demands on the apaches and on
memcached space.  If a non-cache-busting beta site is doable, go now for
the logging method that will later be compatible with it, to avoid having
to change processing methods.

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Asher Feldman
If you want to differentiate categories of API requests in logs, add
descriptive noop query params to the requests, e.g. mfmode=2. Doing this in
request headers and altering edge config is unnecessary and a bad design
pattern. On the analytics side, if parsing query params seems challenging
vs. having a fixed field to parse, deal.
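
For illustration, pulling such a noop param back out of a logged URI takes
only a few lines with Python's standard library. This is a minimal sketch:
the function name and the example URL are made up, and mfmode is just the
param name used above.

```python
from urllib.parse import parse_qs, urlsplit


def api_request_tags(logged_uri: str, param: str = "mfmode") -> list[str]:
    """Return every value of the noop logging param found in a logged URI.

    An empty list means the request was not tagged.
    """
    query = urlsplit(logged_uri).query
    return parse_qs(query).get(param, [])
```

For example, a logged request such as
http://en.m.wikipedia.org/w/api.php?action=mobileview&page=Foo&mfmode=2
yields ["2"], while an untagged request yields an empty list.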

On Sunday, February 3, 2013, David Schoonover wrote:

 Huh! News to me as well. I definitely agree with that decision. Thanks,
 Ori!

 I've already written the Varnish code for setting X-MF-Mode so it can be
 captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at
 least, MF-Mode?

 Looking especially to hear from Arthur and Matt.

 --
 David Schoonover
 d...@wikimedia.org


 On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere
 dvanli...@wikimedia.org wrote:

  Thanks Ori, I was not aware of this
  D
 
  Sent from my iPhone
 
  On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote:
 
  
  
   On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:
  
   I don't like its cryptic nature.
  
   Someone looking at the headers sent to his browser would be very
   confused about what's the point of «X-MF-Mode: b».
  
   Instead something like this would be much more descriptive:
   X-Mobile-Mode: stable
   X-Mobile-Request: secondary
  
   But that also means sending more bytes through the wire :S
   Well, you can (and should) drop the 'X-' :-)
  
   See http://tools.ietf.org/html/rfc6648: "Deprecating the X- Prefix and
   Similar Constructs in Application Protocols"
  
  
   --
   Ori Livneh
  
  
  
  

Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Asher Feldman
Regarding varnish cacheability of mobile API requests with a logging query
param - it would probably be worth making frontend varnishes strip out all
occurrences of that query param and its value from their backend requests
so they're all the same to the caching instances. A generic param name that
can take any value would allow for adding as many extra log values as
needed, limited only by the uri log field length.

l=mft2&l=mfstable etc.

So still an edge cache change but the result is more flexible
while avoiding changing the fixed field length log format across unrelated
systems like text squids or image caches.
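
The stripping step would live in the frontend varnish config itself; the
Python below is only a sketch of the logic, not VCL. It assumes the generic
param is named "l" and may repeat (as in l=mft2&l=mfstable), and the
function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_log_param(url: str, param: str = "l") -> tuple[str, list[str]]:
    """Separate logging-only params from the URL used as the cache key.

    Returns (normalized_url, logged_values): the URL with every occurrence
    of `param` removed, plus the removed values in their original order.
    Note that urlencode() re-encodes the remaining params, so their escaping
    may differ from the input even though their meaning does not.
    """
    parts = urlsplit(url)
    kept, logged = [], []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            logged.append(value)
        else:
            kept.append((key, value))
    normalized = urlunsplit(parts._replace(query=urlencode(kept)))
    return normalized, logged
```

The frontend would log the original URI (so the l= values reach the log
stream) and pass only the normalized URL to the caching instances, so that
differently tagged requests for the same content share one cache object.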


Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Tyler Romeo
Considering that the query component of a URI is meant to identify the
resource whereas HTTP headers are meant to tell the server additional
information about the request, I think a header approach is much more
appropriate than a no-op query parameter.

If the X- is removed, I'd have no problem with the addition of these
headers, but what is the advantage of having two over one? Wouldn't a
header like:
MobileFrontend: 1/2 a/b/s
work just as well?
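
A sketch of how a consumer might read such a combined header value. It
assumes the first token is the request type (1 = primary, 2 = secondary)
and the second is the site mode (a/b/s = alpha/beta/stable); the thread
does not spell out that mapping, and all names below are hypothetical.

```python
from typing import NamedTuple

# Assumed meanings of the one-character codes; not confirmed in the thread.
REQUEST_TYPES = {"1": "primary", "2": "secondary"}
MODES = {"a": "alpha", "b": "beta", "s": "stable"}


class MobileFrontendInfo(NamedTuple):
    request_type: str
    mode: str


def parse_mobilefrontend_header(value: str) -> MobileFrontendInfo:
    """Parse a value such as '2 b' into (request_type, mode)."""
    fields = value.split()
    if len(fields) != 2:
        raise ValueError(f"expected two tokens, got {value!r}")
    request_code, mode_code = fields
    if request_code not in REQUEST_TYPES or mode_code not in MODES:
        raise ValueError(f"unknown code in {value!r}")
    return MobileFrontendInfo(REQUEST_TYPES[request_code], MODES[mode_code])
```

Under those assumptions, "2 b" would mean a secondary API request made from
the beta site.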

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Asher Feldman
That's not at all true in the real world. Look at the actual requests for
google analytics on a high percentage of sites, etc.

Setting new request headers for mobile that map to new inflexible fields in
the log stream that must be set on all non-mobile requests ("\t-\t-")
equals gigabytes of unnecessary log data every day (that we want
to save 100% of) for no good reason. Wanting to keep query params pure
isn't a good reason.
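
The arithmetic behind that estimate can be sketched as follows. Each unset
field costs a tab plus a "-" placeholder, so two new fields add four bytes
to every log line. The request volume used below is purely hypothetical and
is not a figure from this thread.

```python
PLACEHOLDER = "\t-\t-"  # two unset fields: a tab and a "-" each


def extra_bytes_per_day(requests_per_day: int,
                        placeholder: str = PLACEHOLDER) -> int:
    """Bytes added per day when every log line carries the empty fields."""
    return requests_per_day * len(placeholder.encode("ascii"))


# Hypothetical volume, chosen only to make the scale visible.
HYPOTHETICAL_REQUESTS_PER_DAY = 1_000_000_000
overhead = extra_bytes_per_day(HYPOTHETICAL_REQUESTS_PER_DAY)
```

At that assumed volume the placeholders alone come to 4,000,000,000 bytes
(about 4 GB) per day before any request actually uses the fields.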


Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Tyler Romeo
Remind me again why a production setup is logging every header of every
request? Also, if you are logging every header, then the amount of data
added by a single extra header would be insignificant compared to the rest
of the request.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com



Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

2013-02-03 Thread Asher Feldman
On Sunday, February 3, 2013, Tyler Romeo wrote:

 Remind me again why a production setup is logging every header of every
 request?


That's ludicrous. Please reread our udplog format documentation and this
entire thread carefully, especially the first message before commenting any
further.


  Also, if you are logging every header, then the amount of data
 added by a single extra header would be insignificant compared to the rest
 of the request.
