Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-12-06 Thread Nuria Ruiz
>Also, just out of curiosity and to better understand the issue, what
>would be an example of a real life request URL that results in such a
>"no page title found" error when extracting the title?
Special page requests, for example.

Normally, pages like "Special:Blah" are "actions", not pages themselves. We
do not count those as pageviews, with the notable exception of Search
requests (as they do provide content). So a request for a page like
"Special:Search: Blah-Blah" would be an example of a pageview with title "-"
in the pageview_hourly table.
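
A minimal Python sketch of the behaviour described above. This is not the Analytics team's actual extraction code: the function name `classify_request`, the `/wiki/` path handling and the `Special:` checks are assumptions made purely for illustration. It shows how a Special:Search request could end up counted as a pageview whose title is the "-" placeholder.

```python
from urllib.parse import urlsplit, unquote

NO_TITLE = "-"  # placeholder the thread says is used for "no page title found"


def classify_request(url):
    """Return (is_pageview, page_title) for a request URL.

    Hypothetical logic, for illustration only.
    """
    path = urlsplit(url).path
    if not path.startswith("/wiki/"):
        return False, None
    title = unquote(path[len("/wiki/"):])
    if not title:
        return False, None
    if title.startswith("Special:"):
        # Special pages are treated as actions, except Search, which
        # serves content but has no article title to report.
        if title.startswith("Special:Search"):
            return True, NO_TITLE
        return False, None
    return True, title
```

Note that under these assumptions a genuine request for the article titled "-" also yields `(True, "-")`, so the two cases cannot be told apart in the aggregated counts.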



On Mon, Dec 5, 2016 at 3:15 PM, Tilman Bayer  wrote:

> On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz  wrote:
> > This is documented now here:
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
> Thanks for the documentation. Does this only affect data provided by
> the API, or also the page_title
> field in the pageview_hourly table, i.e. the source of the API data?
>
> In the latter case, please also add a note to the "known problems" at
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly .
> (This is the canonical place for documenting such issues - thanks for
> making this explicit at
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data
> .
> Separately, for pageview definition changes there is also
> https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No
> objections of course if the Analytics team commits to keeping the
> information up to date in all three places.)
>
> Also, just out of curiosity and to better understand the issue, what
> would be an example of a real life request URL that results in such a
> "no page title found" error when extracting the title?
> >
> > On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:
> >>
> >> Hi Joseph,
> >>
> >> Thanks for the clarification.
> >>
> >> Any ideas why this number is much higher for some months? In particular,
> >> on desktop, it's high in the months of July to September 2015 (around 10
> >> million, compared to the usual 5 million) and then high again in October
> >> 2016 (45 million, about 10x the usual value).
> For context, https://en.wikipedia.org/wiki/- was the 8th most viewed
> page on all projects from May to October 2015; see footnote [1] at
> https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High"
> Analytics priority for almost a year, is about a separate but
> similar issue).
>
> >>
> >> Data is from
> >> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
> >> which summarizes results from the Wikimedia API (and stats.grok.se for data
> >> before July 2015).
> >>
> >> Vipul
> >>
> >> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou
> >>  wrote:
> >>>
> >>> Hello Issa,
> >>>
> >>> Thank you for your question.
> >>> The very high number of views of the "-" page is explained by this dash
> >>> value being used as a special value for "no page title found" when
> >>> extracting titles from urls.
> >>> We definitely should document this in the API, creating this task:
> >>> https://phabricator.wikimedia.org/T150249
> >>> Best
> >>> Joseph
> >>>
> >>>
> >>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:
> 
>  Dear Analytics Mailing List,
> 
>  Recently while querying pageviews of various pages, I discovered that
>  the page whose title is a single hyphen character (i.e. with the title
>  "-", with URL , which redirects to
>  ) receives an unusually
> high
>  number of pageviews under the Pageview API. Taking October 2015 as an
>  example, the page received 5.4 million pageviews during that month
>  according to the API:
> 
>   article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
> 
>  However, according to stats.grok.se (which was still operational in the
>  same month), the page received only 1209 pageviews:
>  .
> 
>  Looking at the tabulation of pageviews on Wikipedia Views, the
> increase
>  in pageviews for this page coincides with the change to the Pageview
>  API in July 2015:
> 
>   php?page=-=allmonths=all>.
> 
>  As I understand, page titles must be URL-encoded before the query,
>  but the URL-encoding of "-" is itself.
> 
>  I looked at the API documentation but did not see this behavior
> listed,
>  so I am wondering where these numbers are coming from.
> 
>  Best regards,
>  Issa
> 
> 
>  ___
>  Analytics mailing list
>  Analytics@lists.wikimedia.org
>  https://lists.wikimedia.org/mailman/listinfo/analytics
> 
> >>>
> >>>
> >>>
> >>> 

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-12-05 Thread Tilman Bayer
On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz  wrote:
> This is documented now here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
Thanks for the documentation. Does this only affect data provided by
the API, or also the page_title
field in the pageview_hourly table, i.e. the source of the API data?

In the latter case, please also add a note to the "known problems" at
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly .
(This is the canonical place for documenting such issues - thanks for
making this explicit at
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data .
Separately, for pageview definition changes there is also
https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No
objections of course if the Analytics team commits to keeping the
information up to date in all three places.)

Also, just out of curiosity and to better understand the issue, what
would be an example of a real life request URL that results in such a
"no page title found" error when extracting the title?
>
> On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:
>>
>> Hi Joseph,
>>
>> Thanks for the clarification.
>>
>> Any ideas why this number is much higher for some months? In particular,
>> on desktop, it's high in the months of July to September 2015 (around 10
>> million, compared to the usual 5 million) and then high again in October
>> 2016 (45 million, about 10x the usual value).
For context, https://en.wikipedia.org/wiki/- was the 8th most viewed
page on all projects from May to October 2015; see footnote [1] at
https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High"
Analytics priority for almost a year, is about a separate but
similar issue).
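
When building a "most viewed" ranking from pageview_hourly-style rows, the placeholder title can be dropped before ranking. A minimal Python sketch, assuming each row is a `(page_title, view_count)` pair; the row layout, the function name `top_pages` and any sample values used with it are made up for illustration and are not real data.

```python
from collections import Counter

NO_TITLE = "-"  # placeholder title discussed in this thread


def top_pages(rows, n=10, skip_placeholder=True):
    """Sum view counts per title and return the n most viewed titles.

    rows: iterable of (page_title, view_count) pairs (assumed layout).
    """
    totals = Counter()
    for page_title, view_count in rows:
        if skip_placeholder and page_title == NO_TITLE:
            continue  # ignore "no page title found" entries
        totals[page_title] += view_count
    return totals.most_common(n)
```

Filtering this way also discards the (comparatively few) genuine views of the article titled "-", which may be an acceptable trade-off for rankings.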

>>
>> Data is from
>> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
>> which summarizes results from the Wikimedia API (and stats.grok.se for data
>> before July 2015).
>>
>> Vipul
>>
>> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou
>>  wrote:
>>>
>>> Hello Issa,
>>>
>>> Thank you for your question.
>>> The very high number of views of the "-" page is explained by this dash
>>> value being used as a special value for "no page title found" when
>>> extracting titles from urls.
>>> We definitely should document this in the API, creating this task:
>>> https://phabricator.wikimedia.org/T150249
>>> Best
>>> Joseph
>>>
>>>
>>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:

 Dear Analytics Mailing List,

 Recently while querying pageviews of various pages, I discovered that
 the page whose title is a single hyphen character (i.e. with the title
 "-", with URL , which redirects to
 ) receives an unusually high
 number of pageviews under the Pageview API. Taking October 2015 as an
 example, the page received 5.4 million pageviews during that month
 according to the API:

 .

 However, according to stats.grok.se (which was still operational in the
 same month), the page received only 1209 pageviews:
 .

 Looking at the tabulation of pageviews on Wikipedia Views, the increase
 in pageviews for this page coincides with the change to the Pageview
 API in July 2015:

 .

 As I understand, page titles must be URL-encoded before the query,
 but the URL-encoding of "-" is itself.

 I looked at the API documentation but did not see this behavior listed,
 so I am wondering where these numbers are coming from.

 Best regards,
 Issa



>>>
>>>
>>>
>>> --
>>> Joseph Allemandou
>>> Data Engineer @ Wikimedia Foundation
>>> IRC: joal
>>>



-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB


Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-17 Thread Vipul Naik
Correction: The number for 404.php shot up on September 13:
https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/404.php/daily/20160901/20160930?purge756777637
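
For readers who want to reproduce such queries, here is a small Python sketch that builds a per-article URL with the same path layout as the one above and sums the views in a response. The helper names `per_article_url` and `total_views` are hypothetical, and the response shape (an `items` list whose entries carry a `views` field) is an assumption about the API's JSON output; no network request is made here.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, access, agent, title, start, end,
                    granularity="daily"):
    """Build a per-article pageviews URL; the title is percent-encoded."""
    encoded = quote(title, safe="")  # "-" and "." are left unchanged
    return (f"{BASE}/{project}/{access}/{agent}/{encoded}/"
            f"{granularity}/{start}/{end}")


def total_views(payload):
    """Sum 'views' over the 'items' of a decoded JSON response (assumed shape)."""
    return sum(item["views"] for item in payload.get("items", []))
```

Because percent-encoding leaves "-" unchanged, a query for the article titled "-" uses the same path segment as the placeholder value discussed in this thread.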

On Thu, Nov 17, 2016 at 4:51 PM, Vipul Naik  wrote:

> Thanks for opening the ticket and for clarifying the issue more.
>
> On a related note, I wonder if you could add documentation for the
> unusual number of pageviews for 404.php as returned by the API. That number
> also shot up in October 2016; see
> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=404.php=allmonths=all
> for the historical trend.
>
> Vipul
>
> On Thu, Nov 17, 2016 at 1:26 PM, Nuria Ruiz  wrote:
>
>> >Just to verify what you are saying, would it be right to say that the
>> >bug fix caused a lot of pageviews to be moved from the respective
>> >(nonexistent) pages to "-" pageviews?
>>
>> No, the bug fix means those faulty requests are no longer stored as
>> pageviews, so it cannot make that number increase. I am not sure we can
>> link the surge of "-" pageviews in October to any specific cause without
>> further research. I have filed a ticket to that effect; hopefully we can get
>> to it before we do away with the raw data:
>> https://phabricator.wikimedia.org/T150990
>>
>> >And does that mean that the current estimate of "-" pageviews is more
>> >accurate than it used to be prior to the bug fix?
>> No, it doesn't.
>>
>>
>>
>>
>> On Thu, Nov 17, 2016 at 1:17 PM, Vipul Naik  wrote:
>>
>>> Thank you for linking to that bug, Marcel. Just to verify what you are
>>> saying, would it be right to say that the bug fix caused a lot of
>>> pageviews to be moved from the respective (nonexistent) pages to "-"
>>> pageviews? And does that mean that the current estimate of "-" pageviews
>>> is more accurate than it used to be prior to the bug fix?
>>>
>>> Vipul
>>>
>>> On Wed, Nov 16, 2016 at 4:33 AM, Marcel Ruiz Forns wrote:
>>>
 Maybe the high value in October (45M) has something to do with the last
 changes in https://phabricator.wikimedia.org/T145922 ?

 On Mon, Nov 14, 2016 at 9:25 PM, Nuria Ruiz 
 wrote:

> This is documented now here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
>
> On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik 
> wrote:
>
>> Hi Joseph,
>>
>> Thanks for the clarification.
>>
>> Any ideas why this number is much higher for some months? In particular,
>> on desktop, it's high in the months of July to September 2015 (around 10
>> million, compared to the usual 5 million) and then high again in October
>> 2016 (45 million, about 10x the usual value).
>>
>> Data is from
>> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
>> which summarizes results from the Wikimedia API (and stats.grok.se for data
>> before July 2015).
>>
>> Vipul
>>
>> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
>> jalleman...@wikimedia.org> wrote:
>>
>>> Hello Issa,
>>>
>>> Thank you for your question.
>>> The very high number of views of the "-" page is explained by this
>>> dash value being used as a special value for "no page title found" when
>>> extracting titles from urls.
>>> We definitely should document this in the API, creating this task:
>>> https://phabricator.wikimedia.org/T150249
>>> Best
>>> Joseph
>>>
>>>
>>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice 
>>> wrote:
>>>
 Dear Analytics Mailing List,

 Recently while querying pageviews of various pages, I discovered
 that
 the page whose title is a single hyphen character (i.e. with the
 title
 "-", with URL , which redirects to
 ) receives an
 unusually high
 number of pageviews under the Pageview API. Taking October 2015 as
 an
 example, the page received 5.4 million pageviews during that month
 according to the API:
 .

 However, according to stats.grok.se (which was still operational in the
 same month), the page received only 1209 pageviews:
 .

 Looking at the tabulation of pageviews on Wikipedia Views, the
 increase
 in pageviews for this page coincides with the change to the Pageview
 API in July 2015:
 .

 As I understand, 

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-17 Thread Vipul Naik
Thanks for opening the ticket and for clarifying the issue further.

On a related note, I wonder if you could add documentation for the
unusual number of pageviews for 404.php as returned by the API. That number
also shot up in October 2016; see
http://wikipediaviews.org/displayviewsformultiplemonths.php?page=404.php=allmonths=all
for the historical trend.

Vipul

On Thu, Nov 17, 2016 at 1:26 PM, Nuria Ruiz  wrote:

> >Just to verify what you are saying, would it be right to say that the
> >bug fix caused a lot of pageviews to be moved from the respective
> >(nonexistent) pages to "-" pageviews?
>
> No, the bug fix means those faulty requests are no longer stored as
> pageviews, so it cannot make that number increase. I am not sure we can
> link the surge of "-" pageviews in October to any specific cause without
> further research. I have filed a ticket to that effect; hopefully we can get
> to it before we do away with the raw data:
> https://phabricator.wikimedia.org/T150990
>
> >And does that mean that the current estimate of "-" pageviews is more
> >accurate than it used to be prior to the bug fix?
> No, it doesn't.
>
>
>
>
> On Thu, Nov 17, 2016 at 1:17 PM, Vipul Naik  wrote:
>
>> Thank you for linking to that bug, Marcel. Just to verify what you are
>> saying, would it be right to say that the bug fix caused a lot of
>> pageviews to be moved from the respective (nonexistent) pages to "-"
>> pageviews? And does that mean that the current estimate of "-" pageviews
>> is more accurate than it used to be prior to the bug fix?
>>
>> Vipul
>>
>> On Wed, Nov 16, 2016 at 4:33 AM, Marcel Ruiz Forns 
>> wrote:
>>
>>> Maybe the high value in October (45M) has something to do with the last
>>> changes in https://phabricator.wikimedia.org/T145922 ?
>>>
>>> On Mon, Nov 14, 2016 at 9:25 PM, Nuria Ruiz  wrote:
>>>
 This is documented now here:

 https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas

 On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik 
 wrote:

> Hi Joseph,
>
> Thanks for the clarification.
>
> Any ideas why this number is much higher for some months? In particular,
> on desktop, it's high in the months of July to September 2015 (around 10
> million, compared to the usual 5 million) and then high again in October
> 2016 (45 million, about 10x the usual value).
>
> Data is from
> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
> which summarizes results from the Wikimedia API (and stats.grok.se for data
> before July 2015).
>
> Vipul
>
> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
> jalleman...@wikimedia.org> wrote:
>
>> Hello Issa,
>>
>> Thank you for your question.
>> The very high number of views of the "-" page is explained by this
>> dash value being used as a special value for "no page title found" when
>> extracting titles from urls.
>> We definitely should document this in the API, creating this task:
>> https://phabricator.wikimedia.org/T150249
>> Best
>> Joseph
>>
>>
>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice 
>> wrote:
>>
>>> Dear Analytics Mailing List,
>>>
>>> Recently while querying pageviews of various pages, I discovered that
>>> the page whose title is a single hyphen character (i.e. with the
>>> title
>>> "-", with URL , which redirects to
>>> ) receives an unusually
>>> high
>>> number of pageviews under the Pageview API. Taking October 2015 as an
>>> example, the page received 5.4 million pageviews during that month
>>> according to the API:
>>> >> icle/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>>>
>>> However, according to stats.grok.se (which was still operational in the
>>> same month), the page received only 1209 pageviews:
>>> .
>>>
>>> Looking at the tabulation of pageviews on Wikipedia Views, the
>>> increase
>>> in pageviews for this page coincides with the change to the Pageview
>>> API in July 2015:
>>> >> ?page=-=allmonths=all>.
>>>
>>> As I understand, page titles must be URL-encoded before the query,
>>> but the URL-encoding of "-" is itself.
>>>
>>> I looked at the API documentation but did not see this behavior
>>> listed,
>>> so I am wondering where these numbers are coming from.
>>>
>>> Best regards,
>>> Issa
>>>
>>>

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-17 Thread Nuria Ruiz
>Just to verify what you are saying, would it be right to say that the
>bug fix caused a lot of pageviews to be moved from the respective
>(nonexistent) pages to "-" pageviews?

No, the bug fix means those faulty requests are no longer stored as
pageviews, so it cannot make that number increase. I am not sure we can
link the surge of "-" pageviews in October to any specific cause without
further research. I have filed a ticket to that effect; hopefully we can get
to it before we do away with the raw data:
https://phabricator.wikimedia.org/T150990

>And does that mean that the current estimate of "-" pageviews is more
>accurate than it used to be prior to the bug fix?
No, it doesn't.




On Thu, Nov 17, 2016 at 1:17 PM, Vipul Naik  wrote:

> Thank you for linking to that bug, Marcel. Just to verify what you are
> saying, would it be right to say that the bug fix caused a lot of
> pageviews to be moved from the respective (nonexistent) pages to "-"
> pageviews? And does that mean that the current estimate of "-" pageviews
> is more accurate than it used to be prior to the bug fix?
>
> Vipul
>
> On Wed, Nov 16, 2016 at 4:33 AM, Marcel Ruiz Forns 
> wrote:
>
>> Maybe the high value in October (45M) has something to do with the last
>> changes in https://phabricator.wikimedia.org/T145922 ?
>>
>> On Mon, Nov 14, 2016 at 9:25 PM, Nuria Ruiz  wrote:
>>
>>> This is documented now here:
>>>
>>> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
>>>
>>> On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:
>>>
 Hi Joseph,

 Thanks for the clarification.

 Any ideas why this number is much higher for some months? In
 particular, on desktop, it's high in the months of July to September 2015
 (around 10 million, compared to the usual 5 million) and then high again in
 October 2016 (45 million, about 10x the usual value).

 Data is from
 http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
 which summarizes results from the Wikimedia API (and stats.grok.se for data
 before July 2015).

 Vipul

 On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
 jalleman...@wikimedia.org> wrote:

> Hello Issa,
>
> Thank you for your question.
> The very high number of views of the "-" page is explained by this
> dash value being used as a special value for "no page title found" when
> extracting titles from urls.
> We definitely should document this in the API, creating this task:
> https://phabricator.wikimedia.org/T150249
> Best
> Joseph
>
>
> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:
>
>> Dear Analytics Mailing List,
>>
>> Recently while querying pageviews of various pages, I discovered that
>> the page whose title is a single hyphen character (i.e. with the title
>> "-", with URL , which redirects to
>> ) receives an unusually
>> high
>> number of pageviews under the Pageview API. Taking October 2015 as an
>> example, the page received 5.4 million pageviews during that month
>> according to the API:
>> > icle/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>>
>> However, according to stats.grok.se (which was still operational in the
>> same month), the page received only 1209 pageviews:
>> .
>>
>> Looking at the tabulation of pageviews on Wikipedia Views, the
>> increase
>> in pageviews for this page coincides with the change to the Pageview
>> API in July 2015:
>> > ?page=-=allmonths=all>.
>>
>> As I understand, page titles must be URL-encoded before the query,
>> but the URL-encoding of "-" is itself.
>>
>> I looked at the API documentation but did not see this behavior
>> listed,
>> so I am wondering where these numbers are coming from.
>>
>> Best regards,
>> Issa
>>
>>
>
>
> --
> *Joseph Allemandou*
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>

Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-17 Thread Vipul Naik
Thank you for linking to that bug, Marcel. Just to verify what you are
saying, would it be right to say that the bug fix caused a lot of
pageviews to be moved from the respective (nonexistent) pages to "-"
pageviews? And does that mean that the current estimate of "-" pageviews
is more accurate than it used to be prior to the bug fix?

Vipul

On Wed, Nov 16, 2016 at 4:33 AM, Marcel Ruiz Forns 
wrote:

> Maybe the high value in October (45M) has something to do with the last
> changes in https://phabricator.wikimedia.org/T145922 ?
>
> On Mon, Nov 14, 2016 at 9:25 PM, Nuria Ruiz  wrote:
>
>> This is documented now here:
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
>>
>> On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:
>>
>>> Hi Joseph,
>>>
>>> Thanks for the clarification.
>>>
>>> Any ideas why this number is much higher for some months? In particular,
>>> on desktop, it's high in the months of July to September 2015 (around 10
>>> million, compared to the usual 5 million) and then high again in October
>>> 2016 (45 million, about 10x the usual value).
>>>
>>> Data is from
>>> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
>>> which summarizes results from the Wikimedia API (and stats.grok.se for data
>>> before July 2015).
>>>
>>> Vipul
>>>
>>> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
>>> jalleman...@wikimedia.org> wrote:
>>>
 Hello Issa,

 Thank you for your question.
 The very high number of views of the "-" page is explained by this dash
 value being used as a special value for "no page title found" when
 extracting titles from urls.
 We definitely should document this in the API, creating this task:
 https://phabricator.wikimedia.org/T150249
 Best
 Joseph


 On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:

> Dear Analytics Mailing List,
>
> Recently while querying pageviews of various pages, I discovered that
> the page whose title is a single hyphen character (i.e. with the title
> "-", with URL , which redirects to
> ) receives an unusually
> high
> number of pageviews under the Pageview API. Taking October 2015 as an
> example, the page received 5.4 million pageviews during that month
> according to the API:
>  icle/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>
> However, according to stats.grok.se (which was still operational in the
> same month), the page received only 1209 pageviews:
> .
>
> Looking at the tabulation of pageviews on Wikipedia Views, the increase
> in pageviews for this page coincides with the change to the Pageview
> API in July 2015:
>  ?page=-=allmonths=all>.
>
> As I understand, page titles must be URL-encoded before the query,
> but the URL-encoding of "-" is itself.
>
> I looked at the API documentation but did not see this behavior listed,
> so I am wondering where these numbers are coming from.
>
> Best regards,
> Issa
>
>


 --
 *Joseph Allemandou*
 Data Engineer @ Wikimedia Foundation
 IRC: joal

> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>


Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-16 Thread Marcel Ruiz Forns
Maybe the high value in October (45M) has something to do with the last
changes in https://phabricator.wikimedia.org/T145922 ?

On Mon, Nov 14, 2016 at 9:25 PM, Nuria Ruiz  wrote:

> This is documented now here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
>
> On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:
>
>> Hi Joseph,
>>
>> Thanks for the clarification.
>>
>> Any ideas why this number is much higher for some months? In particular,
>> on desktop, it's high in the months of July to September 2015 (around 10
>> million, compared to the usual 5 million) and then high again in October
>> 2016 (45 million, about 10x the usual value).
>>
>> Data is from
>> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
>> which summarizes results from the Wikimedia API (and stats.grok.se for data
>> before July 2015).
>>
>> Vipul
>>
>> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
>> jalleman...@wikimedia.org> wrote:
>>
>>> Hello Issa,
>>>
>>> Thank you for your question.
>>> The very high number of views of the "-" page is explained by this dash
>>> value being used as a special value for "no page title found" when
>>> extracting titles from urls.
>>> We definitely should document this in the API, creating this task:
>>> https://phabricator.wikimedia.org/T150249
>>> Best
>>> Joseph
>>>
>>>
>>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:
>>>
 Dear Analytics Mailing List,

 Recently while querying pageviews of various pages, I discovered that
 the page whose title is a single hyphen character (i.e. with the title
 "-", with URL , which redirects to
 ) receives an unusually
 high
 number of pageviews under the Pageview API. Taking October 2015 as an
 example, the page received 5.4 million pageviews during that month
 according to the API:
 .

 However, according to stats.grok.se (which was still operational in the
 same month), the page received only 1209 pageviews:
 .

 Looking at the tabulation of pageviews on Wikipedia Views, the increase
 in pageviews for this page coincides with the change to the Pageview
 API in July 2015:
 .

 As I understand, page titles must be URL-encoded before the query,
 but the URL-encoding of "-" is itself.

 I looked at the API documentation but did not see this behavior listed,
 so I am wondering where these numbers are coming from.

 Best regards,
 Issa




>>>
>>>
>>> --
>>> *Joseph Allemandou*
>>> Data Engineer @ Wikimedia Foundation
>>> IRC: joal
>>>


-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation


Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-14 Thread Nuria Ruiz
This is documented now here:

https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas

On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik  wrote:

> Hi Joseph,
>
> Thanks for the clarification.
>
> Any ideas why this number is much higher for some months? In particular,
> on desktop, it's high in the months of July to September 2015 (around 10
> million, compared to the usual 5 million) and then high again in October
> 2016 (45 million, about 10x the usual value).
>
> Data is from
> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
> which summarizes results from the Wikimedia API (and stats.grok.se for data
> before July 2015).
>
> Vipul
>
> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou <
> jalleman...@wikimedia.org> wrote:
>
>> Hello Issa,
>>
>> Thank you for your question.
>> The very high number of views of the "-" page is explained by this dash
>> value being used as a special value for "no page title found" when
>> extracting titles from urls.
>> We definitely should document this in the API, creating this task:
>> https://phabricator.wikimedia.org/T150249
>> Best
>> Joseph
>>
>>
>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:
>>
>>> Dear Analytics Mailing List,
>>>
>>> Recently while querying pageviews of various pages, I discovered that
>>> the page whose title is a single hyphen character (i.e. with the title
>>> "-", with URL , which redirects to
>>> ) receives an unusually high
>>> number of pageviews under the Pageview API. Taking October 2015 as an
>>> example, the page received 5.4 million pageviews during that month
>>> according to the API:
>>> >> icle/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>>>
>>> However, according to stats.grok.se (which was still operational in the
>>> same month), the page received only 1209 pageviews:
>>> .
>>>
>>> Looking at the tabulation of pageviews on Wikipedia Views, the increase
>>> in pageviews for this page coincides with the change to the Pageview
>>> API in July 2015:
>>> >> ?page=-=allmonths=all>.
>>>
>>> As I understand, page titles must be URL-encoded before the query,
>>> but the URL-encoding of "-" is itself.
>>>
>>> I looked at the API documentation but did not see this behavior listed,
>>> so I am wondering where these numbers are coming from.
>>>
>>> Best regards,
>>> Issa
>>>
>>>
>>>
>>>
>>
>>
>> --
>> *Joseph Allemandou*
>> Data Engineer @ Wikimedia Foundation
>> IRC: joal
>>


Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-08 Thread Vipul Naik
Hi Joseph,

Thanks for the clarification.

Any ideas why this number is much higher for some months? In particular, on
desktop, it's high in the months of July to September 2015 (around 10
million, compared to the usual 5 million) and then high again in October
2016 (45 million, about 10x the usual value).

Data is from
http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all
which summarizes results from the Wikimedia API (and stats.grok.se for data
before July 2015).

Vipul

On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou  wrote:

> Hello Issa,
>
> Thank you for your question.
> The very high number of views of the "-" page is explained by this dash
> value being used as a special value for "no page title found" when
> extracting titles from URLs.
> We should definitely document this in the API; I am creating this task:
> https://phabricator.wikimedia.org/T150249
> Best
> Joseph
>
>
> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:
>
>> Dear Analytics Mailing List,
>>
>> Recently while querying pageviews of various pages, I discovered that
>> the page whose title is a single hyphen character (i.e. with the title
>> "-", with URL , which redirects to
>> ) receives an unusually high
>> number of pageviews under the Pageview API. Taking October 2015 as an
>> example, the page received 5.4 million pageviews during that month
>> according to the API:
>> <https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>>
>> However, according to stats.grok.se (which was still operational in the
>> same month), the page received only 1209 pageviews:
>> .
>>
>> Looking at the tabulation of pageviews on Wikipedia Views, the increase
>> in pageviews for this page coincides with the change to the Pageview
>> API in July 2015:
>> <http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all>.
>>
>> As I understand, page titles must be URL-encoded before the query,
>> but the URL-encoding of "-" is itself.
>>
>> I looked at the API documentation but did not see this behavior listed,
>> so I am wondering where these numbers are coming from.
>>
>> Best regards,
>> Issa
>>
>>
>
>
> --
> *Joseph Allemandou*
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>


Re: [Analytics] High number of pageviews on page with single hyphen as title

2016-11-08 Thread Joseph Allemandou
Hello Issa,

Thank you for your question.
The very high number of views of the "-" page is explained by this dash
value being used as a special value for "no page title found" when
extracting titles from URLs.
We should definitely document this in the API; I am creating this task:
https://phabricator.wikimedia.org/T150249
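
For anyone aggregating this data, here is a minimal sketch of how rows carrying the "-" placeholder could be separated from rows with real titles before summing views. The row layout and the `view_count` field name are assumptions made for illustration (`page_title` is the field name used elsewhere in this thread), and the rows below are made-up values, not real pageview data.

```python
# Placeholder used for "no page title found", per the explanation above.
NO_TITLE_PLACEHOLDER = "-"


def split_placeholder(rows):
    """Return (rows with real titles, total views attributed to the placeholder)."""
    real = [r for r in rows if r["page_title"] != NO_TITLE_PLACEHOLDER]
    placeholder_views = sum(
        r["view_count"] for r in rows if r["page_title"] == NO_TITLE_PLACEHOLDER
    )
    return real, placeholder_views


# Made-up rows for illustration only (hypothetical layout, not real data).
rows = [
    {"page_title": "Main_Page", "view_count": 120},
    {"page_title": "-", "view_count": 300},
    {"page_title": "Hyphen-minus", "view_count": 7},
    {"page_title": "-", "view_count": 50},
]

real, placeholder_views = split_placeholder(rows)
print(len(real), placeholder_views)
```

Note that with this approach a genuine request for the page titled "-" cannot be told apart from a placeholder row, which is the ambiguity raised in this thread.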
Best
Joseph


On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice  wrote:

> Dear Analytics Mailing List,
>
> Recently while querying pageviews of various pages, I discovered that
> the page whose title is a single hyphen character (i.e. with the title
> "-", with URL , which redirects to
> ) receives an unusually high
> number of pageviews under the Pageview API. Taking October 2015 as an
> example, the page received 5.4 million pageviews during that month
> according to the API:
> <https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>
> However, according to stats.grok.se (which was still operational in the
> same month), the page received only 1209 pageviews:
> .
>
> Looking at the tabulation of pageviews on Wikipedia Views, the increase
> in pageviews for this page coincides with the change to the Pageview
> API in July 2015:
> <http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all>.
>
> As I understand, page titles must be URL-encoded before the query,
> but the URL-encoding of "-" is itself.
>
> I looked at the API documentation but did not see this behavior listed,
> so I am wondering where these numbers are coming from.
>
> Best regards,
> Issa
>
>


-- 
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal


[Analytics] High number of pageviews on page with single hyphen as title

2016-11-07 Thread Issa Rice
Dear Analytics Mailing List,

Recently while querying pageviews of various pages, I discovered that
the page whose title is a single hyphen character (i.e. with the title
"-", with URL , which redirects to
) receives an unusually high
number of pageviews under the Pageview API. Taking October 2015 as an
example, the page received 5.4 million pageviews during that month
according to the API:
<https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.

However, according to stats.grok.se (which was still operational in the
same month), the page received only 1209 pageviews:
.

Looking at the tabulation of pageviews on Wikipedia Views, the increase
in pageviews for this page coincides with the change to the Pageview
API in July 2015:
<http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-=allmonths=all>.

As I understand, page titles must be URL-encoded before the query,
but the URL-encoding of "-" is itself.
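
The point about URL-encoding can be checked with a short sketch. It uses Python's standard `urllib.parse.quote` and rebuilds the per-article request URL quoted earlier in this message; the helper name `per_article_url` and its parameter names are hypothetical, chosen only for this illustration.

```python
from urllib.parse import quote

# Base path copied from the request URL quoted in this message.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, access, agent, title, granularity, start, end):
    """Build a per-article request URL, percent-encoding the page title."""
    # safe="" also encodes "/", which would otherwise split the path segment.
    encoded_title = quote(title, safe="")
    return "/".join(
        [BASE, project, access, agent, encoded_title, granularity, start, end]
    )


# "-" is an unreserved character, so percent-encoding leaves it unchanged.
encoded_hyphen = quote("-", safe="")

url = per_article_url(
    "en.wikipedia", "desktop", "user", "-", "daily", "20151001", "20151031"
)
print(encoded_hyphen)
print(url)
```

So the encoded title is still "-", and the request path contains the bare hyphen exactly as in the URL above.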

I looked at the API documentation but did not see this behavior listed,
so I am wondering where these numbers are coming from.

Best regards,
Issa