Re: [Analytics] How best to accurately record page interactions in Page Previews

2018-04-12 Leila Zia
Thank you, Tilman. This is very helpful.

Leila



[Analytics] How best to accurately record page interactions in Page Previews

2018-02-08 Tilman Bayer
Hi Leila,

On Wed, Jan 17, 2018 at 10:46 AM, Leila Zia  wrote:

> Hi Sam,
>
> On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith  wrote:
>
> > IMO #1 is preferable from the operations and performance perspectives as
> the
> > response is always served from the edge and includes very few headers,
> > whereas the request in #2 may be served by the application servers if the
> > user is logged in (or in the mobile site's beta cohort). However, the
> > requests in #2 are already
>
> It seems the sentence above is cut, can you resend it?
>
> > We're currently considering recording page interactions when previews are
> > open for longer than 1000 ms. We estimate that this would increase
> overall
> > web requests by 0.3% [3].
>
> Can you say some words about how the 1000 ms threshold is chosen?

This is a good question; sorry that it got buried earlier. (It's kind of
orthogonal, though, to the technical instrumentation questions that have been
the focus of attention: as indicated by the capital X in Sam's initial
post, we can still decide to fine-tune that threshold right now; it's just
a parameter change.)
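Because the threshold is just a parameter, the recording rule itself fits in a few lines. This is a minimal Python sketch, not the Page Previews client code: the class and method names are hypothetical, times are passed in explicitly so the logic is testable, and the decision is made when the card closes (a real client might instead fire a timer once the threshold is crossed).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class PreviewInteractionRecorder:
    """Hypothetical sketch: records one "page interaction" for a preview card
    that stayed open for longer than threshold_ms."""
    threshold_ms: int = 1000  # the tunable "X" from Sam's post
    record: Callable[[str, int], None] = print  # hypothetical sink (beacon, EL event, ...)
    _title: Optional[str] = field(default=None, init=False)
    _shown_at_ms: Optional[int] = field(default=None, init=False)

    def preview_shown(self, title: str, now_ms: int) -> None:
        # The card has started to display (i.e. after the initial hover delay).
        self._title, self._shown_at_ms = title, now_ms

    def preview_closed(self, now_ms: int) -> bool:
        # Card dismissed or link clicked: decide whether it counts as seen.
        if self._title is None or self._shown_at_ms is None:
            return False
        title, open_ms = self._title, now_ms - self._shown_at_ms
        self._title = self._shown_at_ms = None
        if open_ms > self.threshold_ms:  # strictly "longer than" X ms
            self.record(title, open_ms)
            return True
        return False


# Usage: changing the threshold is just a constructor argument.
events: List[Tuple[str, int]] = []
rec = PreviewInteractionRecorder(threshold_ms=1000,
                                 record=lambda t, ms: events.append((t, ms)))
rec.preview_shown("Cat", now_ms=500)
rec.preview_closed(now_ms=1200)   # open for 700 ms: not recorded
rec.preview_shown("Dog", now_ms=2000)
rec.preview_closed(now_ms=3500)   # open for 1500 ms: recorded
print(events)  # [('Dog', 1500)]
```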

This kind of threshold necessarily needs to be set somewhat arbitrarily, in
the sense that there will always be either cases where some content was
already read/perceived in a preview card shown for a shorter time, or cases
where a reader needed a longer time to consume any content from the card.
We picked a time by which we can be reasonably certain that at least some
readers can consume content (read some words, perceive an image). It's not
the result of an exact calculation to find the provably best limit. But we
did have a look at the frequency of the different user actions over time
during the first seconds after users start to hover over a link. In case
you're interested, I recently updated those charts with better-quality data
from our latest two tests, e.g.:
https://phabricator.wikimedia.org/F12940888
https://phabricator.wikimedia.org/F13134460 (a zoomed-in look at the same
histogram)
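A chart like this (counts of each user action by time since the hover started) amounts to a binned histogram. A minimal Python sketch, assuming events arrive as (action, elapsed_ms) pairs; the action names are taken from this thread, while the 100 ms bin width, the function name and the sample values are made up for illustration and are not data from the tests.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def action_histogram(events: Iterable[Tuple[str, int]],
                     bin_ms: int = 100) -> Dict[int, Counter]:
    """Count user actions per time bin.

    events: (action, ms elapsed since the cursor started hovering over the link)
    returns: {bin_start_ms: Counter({action: count})}, sorted by bin start
    """
    hist: DefaultDict[int, Counter] = defaultdict(Counter)
    for action, elapsed_ms in events:
        bin_start = (elapsed_ms // bin_ms) * bin_ms
        hist[bin_start][action] += 1
    return dict(sorted(hist.items()))


# Invented sample, purely to show the shape of the output.
sample = [("dwelledButAbandoned", 120), ("dwelledButAbandoned", 180),
          ("open", 350), ("dismissed", 1130), ("dismissed", 1190)]
for start, counts in action_histogram(sample).items():
    print(f"{start:>5} ms  {dict(counts)}")
```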

The following is just eyeballing and thinking aloud, but one way to view
this histogram is as the sum of several distributions associated with
different user intentions:
1. Most of the time when our instrumentation registered the cursor moving
over a link, the user was just on their way to a different part of the
screen (with no intention of either clicking that link or viewing the
preview). That's mostly the huge yellow spike on the left,
"dwelledButAbandoned", meaning that the cursor left the link without either
clicking it or causing a preview to show. The feature involves a 500ms
delay before the preview card begins to display, so that we don't bother
that group too much. (Only the right tail of that distribution, i.e. folks
moving the cursor very slowly, will be affected; that's where things morph
from yellow into purple.)
2. Then there are users who want to click the link without viewing the
preview, forming all of the green part to the left of 500ms and an unknown
portion to the right of it (after the card starts to show, some of these
"open" actions will instead happen after the user intentionally viewed the
card, i.e. case 3).
3. And there are users who intentionally view a preview. The little bump in
the purple part ("dismissed", meaning that the preview was shown and then
closed by moving the cursor away) at about 1100ms indicates that the
distribution for that user group also peaks somewhere there, maybe a few
hundred ms to the right. That would mean that our 1000ms threshold (i.e. only
counting the part of the histogram to the right of 1500ms = 500ms + 1000ms as
seen previews) actually lies to the right of that distribution's peak, i.e.
that the threshold is in some sense quite conservative.
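The timing logic behind this reading of the histogram (a 500ms delay before the card shows, then 1000ms before a preview counts as seen) can be written down as a small Python sketch. It is deliberately simplified and hypothetical: it ignores the time needed to fetch and render the card, treats any click as "open", and the sample hovers are invented.

```python
PREVIEW_DELAY_MS = 500     # delay before the preview card begins to display
SEEN_THRESHOLD_MS = 1000   # how long the card must be shown to count as seen


def outcome(hover_ms: int, clicked: bool) -> str:
    """Rough classification of one hover, in the spirit of the chart's colours."""
    if clicked:
        return "open"
    if hover_ms < PREVIEW_DELAY_MS:
        return "dwelledButAbandoned"  # cursor left before any card appeared
    return "dismissed"                # card shown, then cursor moved away


def counted_as_seen(hover_ms: int) -> bool:
    """Seen only if the card stayed up past delay + threshold (1500 ms in total)."""
    return hover_ms > PREVIEW_DELAY_MS + SEEN_THRESHOLD_MS


# Invented hovers (ms since hover start, clicked?): a dismissal near the 1100 ms
# bump is not counted, which is what makes the threshold conservative.
hovers = [(150, False), (300, True), (1150, False), (1700, False)]
print([outcome(ms, c) for ms, c in hovers])
print(sum(counted_as_seen(ms) for ms, _ in hovers))  # 1
```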

Like I said, this is all of course still a bit handwavy; it involves some
assumptions about the form of these distributions, as well as disregarding
some other information for now that can give a fuller picture (in
particular the analogous histogram for link interaction behavior without
page previews being active, which we also have from our A/B tests).


> Is
> this based (partially) on looking at traces where a user-agent goes to
> a page and returns to the "source" article?
>
We did an analysis of that user behavior, but not regarding the timing
question; rather, it was about finding out how much of the reduction in
pageviews comes from reduced usage of the back button. I'm not sure how
directly we can compare the action of loading an entire new page and then
going back (two clicks that also involve moving the mouse cursor to an
entirely different part of the screen, the back button, in between) with
the action of hovering over a link and then moving the cursor a small
distance away to close the preview; it seems to me that the latter
involves much less friction, which is kind of the whole point of the
previews feature ;)
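For reference, the kind of trace Leila asks about (a reader goes from article A to article B and then back to A) can be counted from an ordered list of pageviews. This is a hypothetical Python sketch, not the analysis mentioned above: it looks only at consecutive pageviews and ignores timing, referrers and whether the back button was actually used.

```python
from typing import Sequence


def count_returns(pageviews: Sequence[str]) -> int:
    """Count A -> B -> A patterns in one reader's ordered pageviews."""
    return sum(
        1
        for a, b, c in zip(pageviews, pageviews[1:], pageviews[2:])
        if a == c and a != b  # left A for a different page, then came back
    )


print(count_returns(["Cat", "Dog", "Cat", "Lion", "Cat"]))  # 2
```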

As indicated, we already picked a value for the threshold that we are quite
comfortable with. 

Re: [Analytics] How best to accurately record page interactions in Page Previews

2018-01-18 Tilman Bayer
On Wed, Jan 17, 2018 at 10:54 AM, Nuria Ruiz  wrote:
> (Moving ops list to bcc)
>
>>Are there other ways of recording this information? We're fairly confident
>> that #1 seems like the best choice here but it's referred to as the "virtual
>> file view hack". Is this really the case?
> Yes, there are; please use EventLogging.
>
> Recording "preview_events" is really no different than recording any other
> kind of UI event; the difference, if any, is going to come from scale, as
> there are probably tens of thousands of those per second (I think your team
> already estimated volume; if so, please send those estimates along).
Actually, Sam's email already included such a volume estimate (see [3]
there for more detail, or
https://phabricator.wikimedia.org/T182314#3901330).
Rather than "tens of thousands", the current estimate is 700-800 per second.
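To put the two per-second figures on a common footing, they can be converted to daily volumes. This is plain arithmetic on the numbers quoted in this exchange (multiplying by 86,400 seconds per day and assuming a steady rate, which real traffic is not), not a new estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(events_per_second: float) -> int:
    """Convert a steady per-second rate into events per day."""
    return round(events_per_second * SECONDS_PER_DAY)


rates = {
    "current estimate, low": 700,
    "current estimate, high": 800,
    '"tens of thousands", lower bound': 10_000,
}
for label, rate in rates.items():
    print(f"{label}: {rate:,}/s -> {per_day(rate):,}/day")
# current estimate, low: 700/s -> 60,480,000/day
# current estimate, high: 800/s -> 69,120,000/day
# "tens of thousands", lower bound: 10,000/s -> 864,000,000/day
```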



>
> We discourage you from sending events directly to beacon. Rather, use the EL
> client to send a page-preview event defined in a given schema. This is
> similar to how we will be measuring banner impressions for fundraising
> banners in the future.
>
> Thanks,
>
> Nuria
>



-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] How best to accurately record page interactions in Page Previews

2018-01-17 Nuria Ruiz
(Moving ops list to bcc)

> Are there other ways of recording this information? We're fairly confident
> that #1 seems like the best choice here but it's referred to as the
> "virtual file view hack". Is this really the case?
Yes, there are; please use EventLogging.

Recording "preview_events" is really no different than recording any other
kind of UI event; the difference, if any, is going to come from scale, as
there are probably tens of thousands of those per second (I think your team
already estimated volume; if so, please send those estimates along).

We discourage you from sending events directly to beacon. Rather, use the
EL client to send a page-preview event defined in a given schema. This is
similar to how we will be measuring banner impressions for fundraising
banners in the future.
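The EventLogging approach means describing the event in a schema and sending instances that conform to it. The Python sketch below only illustrates that idea and is not the EL client's API: the schema name, revision and field names are hypothetical, and the schema/revision/event envelope layout is an assumption made for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical schema, for illustration only.
SCHEMA_NAME = "PagePreviewSeen"
SCHEMA_REVISION = 1
REQUIRED_FIELDS = {"pageTitle": str, "previewOpenMs": int}


def make_event(fields: Dict[str, Any]) -> str:
    """Check fields against the toy schema, then wrap and serialize them."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in fields:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(fields[name], expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
    envelope = {"schema": SCHEMA_NAME, "revision": SCHEMA_REVISION, "event": fields}
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True)


print(make_event({"pageTitle": "Cat", "previewOpenMs": 1500}))
# {"event":{"pageTitle":"Cat","previewOpenMs":1500},"revision":1,"schema":"PagePreviewSeen"}
```

Validating before sending is the point of the schema step: malformed events are rejected on the client instead of silently polluting the data.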

Thanks,

Nuria



On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith  wrote:

> Hullo,
>
> Page Previews is now fully deployed to all but 2 of the Wikipedias. In
> deploying it, we've created a new way to interact with pages without
> navigating to them. This impacts the overall and per-page pageviews metrics
> that are used in myriad reports, e.g. to editors about the readership of
> their articles and in monthly reports to the board. Consequently, we need
> to be able to report a user reading the preview of a page just as we report
> them navigating to it.
>
> Readers Web are planning to instrument Page Previews such that when a
> preview is available and open for longer than X ms, a "page interaction" is
> recorded. We're aware of a couple of mechanisms for recording something
> like this from the client:
>
> 1. All files viewed with the media viewer are recorded by the client
>    requesting the /beacon/media?duration=X=Y URL at some point [0] – as
>    Nuria points out in that thread, requests to /beacon/... are already
>    filtered and a canned response is sent immediately by Varnish [1].
> 2. Requesting a URL with the X-Analytics header [2] set to "preview".
>    In this context, we'd make a HEAD request to the URL of the page with
>    the header set.
>
> IMO #1 is preferable from the operations and performance perspectives as
> the response is always served from the edge and includes very few headers,
> whereas the request in #2 may be served by the application servers if the
> user is logged in (or in the mobile site's beta cohort). However, the
> requests in #2 are already
>
> We're currently considering recording page interactions when previews are
> open for longer than 1000 ms. We estimate that this would increase overall
> web requests by 0.3% [3].
>
> Are there other ways of recording this information? We're fairly confident
> that #1 seems like the best choice here but it's referred to as the
> "virtual file view hack". Is this really the case? Moreover, should we
> request a distinct URL, e.g. /beacon/preview?duration=X=Y, or should
> we consolidate the URLs as both represent the same thing essentially?
>
> Thanks,
>
> -Sam
>
> Timezone: GMT
> IRC (Freenode): phuedx
>
> [0] https://lists.wikimedia.org/pipermail/analytics/2015-March/003633.html
> [1]
> https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb;1bce79d58e03bd02888beef986c41989e8345037$269
> [2] https://wikitech.wikimedia.org/wiki/X-Analytics
> [3] https://phabricator.wikimedia.org/T184793#3901365
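For comparison, the two client-side mechanisms in Sam's list could be expressed as request objects roughly like this. A Python sketch using only the standard library, and nothing is sent: the host is an example, /beacon/preview is the distinct path Sam floats rather than an existing endpoint, the title parameter is hypothetical, and the header value "preview" follows the email (check the X-Analytics documentation [2] for the exact syntax). Note that Nuria recommends EventLogging instead of either option.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://en.wikipedia.org"  # example host


def beacon_request(title: str, duration_ms: int) -> Request:
    """Option #1: a /beacon/... URL, answered at the edge with a canned response.
    Both the /beacon/preview path and the 'title' parameter are hypothetical."""
    query = urlencode({"duration": duration_ms, "title": title})
    return Request(f"{BASE}/beacon/preview?{query}", method="GET")


def x_analytics_request(title: str) -> Request:
    """Option #2: a HEAD request for the page itself, tagged via X-Analytics."""
    path = "/wiki/" + quote(title.replace(" ", "_"))
    return Request(BASE + path, method="HEAD", headers={"X-Analytics": "preview"})


for req in (beacon_request("Page Previews", 1500), x_analytics_request("Page Previews")):
    print(req.get_method(), req.full_url)
```

The sketch makes the operational difference visible: #1 targets a fixed, cacheable-at-the-edge path, while #2 targets the article URL itself and so, as Sam notes, may reach the application servers for logged-in users.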


Re: [Analytics] How best to accurately record page interactions in Page Previews

2018-01-17 Leila Zia
Hi Sam,

On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith  wrote:

> IMO #1 is preferable from the operations and performance perspectives as the
> response is always served from the edge and includes very few headers,
> whereas the request in #2 may be served by the application servers if the
> user is logged in (or in the mobile site's beta cohort). However, the
> requests in #2 are already

It seems the sentence above is cut, can you resend it?

> We're currently considering recording page interactions when previews are
> open for longer than 1000 ms. We estimate that this would increase overall
> web requests by 0.3% [3].

Can you say some words about how the 1000 ms threshold is chosen? Is
this based (partially) on looking at traces where a user-agent goes to
a page and returns to the "source" article?

Thanks,
Leila

>
> [0] https://lists.wikimedia.org/pipermail/analytics/2015-March/003633.html
> [1]
> https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/vcl/wikimedia-frontend.vcl.erb;1bce79d58e03bd02888beef986c41989e8345037$269
> [2] https://wikitech.wikimedia.org/wiki/X-Analytics
> [3] https://phabricator.wikimedia.org/T184793#3901365