Hi,

There is already an analysis on the ticket, at
https://phabricator.wikimedia.org/T104942#1436332 (linked from this
thread before Adam posted his own data). It was done on a file called
"per-domain-count", which we previously extracted from 1:1000 sampled
logs covering approximately 25 days. We have been using that file for
all kinds of domain-popularity checks and cleanups as part of the HTTPS
project (more background at
https://phabricator.wikimedia.org/T102827#1429852; see also T102826,
T102814 and T102815).

Those logs are sampled and, because of infrastructure faults that
happened during that 25-day period, also less accurate than the Hadoop
data Adam used. They are still generally okay for drawing broad
conclusions, especially if we look at the relative popularity of e.g.
.wap. vs. .m. rather than at the absolute numbers.
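
As a rough illustration of why relative numbers hold up under sampling,
here is a minimal Python sketch. It uses the one-day counts from Adam's
Hive queries quoted below; the helper name and the idea of dividing both
counts by the sampling factor are assumptions made for illustration, not
the method used on the ticket.

```python
# Illustrative sketch only: relative_share() and SAMPLING_FACTOR are
# hypothetical names, not part of any existing tooling.

SAMPLING_FACTOR = 1000  # the logs discussed above are sampled 1:1000


def relative_share(legacy_count, reference_count):
    """Return legacy_count as a percentage of reference_count."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return 100.0 * legacy_count / reference_count


# One-day unsampled pageview counts from Adam's queries (2015-07-14).
legacy_views = 35543      # *.wap.wikipedia.org + *.mobile.wikipedia.org
m_views = 202024891       # *.m.wikipedia.org

share = relative_share(legacy_views, m_views)
print(f"{share:.4f}%")  # prints 0.0176%

# Idealised uniform sampling scales both counts by the same factor,
# so the ratio is unchanged; only the absolute numbers shrink.
sampled_share = relative_share(
    legacy_views / SAMPLING_FACTOR, m_views / SAMPLING_FACTOR
)
```

Real sampled counts carry noise, particularly for low-traffic domains,
so this only shows that the sampling factor cancels out of the ratio.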

Finally, note that in any case there is a hard limit of a 90-day
look-behind window due to our data retention policy, and there are
practical constraints on extracting results from unsampled logs over
longer periods of time. You're absolutely right, though, that a 1-day
sample is usually not enough, especially given the seasonality of the
data, e.g. a very different mobile-to-desktop ratio on weekends.

Faidon

On Thu, Jul 16, 2015 at 09:55:14AM -0400, John wrote:
> Can we look at a wider sample? Using a single day as the deciding factor is a
> bad idea. However, if the data supports your position I don't see any serious
> problems. You might want to take a look at either the UAs or the referring
> sources to see if there is a primary source for the traffic and mitigate
> that.
> 
> On Thu, Jul 16, 2015 at 9:03 AM, Adam Baso <[email protected]> wrote:
> 
> > Looks like the user pageviews for wap.wikipedia.org and
> > mobile.wikipedia.org subdomains are approximately 0.02% of the size of
> > pageviews for m.wikipedia.org subdomains based on a recent one-day check.
> >
> > hive> select count(*) from
> > wmf.webrequest where
> > year = 2015 and month = 7 and day = 14
> > and access_method = 'mobile web'
> > and (uri_host like '%.wap.wikipedia.org'
> >      OR uri_host like '%.mobile.wikipedia.org')
> > and is_pageview = true and agent_type = 'user';
> >
> > 35,543
> >
> > hive> select count(*) from
> > wmf.webrequest where
> > year = 2015 and month = 7 and day = 14
> > and access_method = 'mobile web'
> > and uri_host like '%.m.wikipedia.org'
> > and is_pageview = true and agent_type = 'user';
> >
> > 202,024,891
> >
> >
> > On Thu, Jul 16, 2015 at 5:41 AM, John <[email protected]> wrote:
> >
> > > ... Have we done any analysis on usage of those subdomains?
> > >
> > > On Thu, Jul 16, 2015 at 8:34 AM, Adam Baso <[email protected]> wrote:
> > >
> > > > There's a ticket for removing mobile.wikipedia.org and wap.wikipedia.org
> > > > domains/subdomains, which are legacy domain names superseded by
> > > > m.wikipedia.org and its subdomains.
> > > >
> > > > https://phabricator.wikimedia.org/T104942
> > > >
> > > > The rationale for the removal of these legacy domain names is to help
> > > > support HSTS preloading in browsers with the existing TLS SAN cert.
> > > >
> > > > After review of the ticket, can anyone think of a compelling reason to keep
> > > > those old domain names?
> > > >
> > > > I'm going to open a separate thread on mobile-l about this given this is
> > > > more mobile-targeted, yet some people only operate on one of wikitech-l or
> > > > mobile-l.
> > > >
> > > > -Adam
> > > > _______________________________________________
> > > > Wikitech-l mailing list
> > > > [email protected]
> > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l

