Re: [Analytics] How to get old page views data?

2018-02-23 Thread Dan Andreescu
Thanks, Scott, I failed to find that task and incorrectly assumed we had declined it. My fault, we'll see about loading that data then. And yes, Peter, per-article dumps are already there but they're split across pagecounts-raw from 2008-2011 and pagecounts-ez after that. The conversation

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Scott Hale
FYI, there is a Phabricator task to load legacy pagecounts by article to AQS: https://phabricator.wikimedia.org/T173720 That task arose from a discussion on this mailing list in the middle of last year: https://www.mail-archive.com/analytics@lists.wikimedia.org/msg04349.html

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Nuria Ruiz
Peter: Do submit a Phabricator task with your request; it'll be easier to follow up on it there than via e-mail. Our backlog: https://phabricator.wikimedia.org/tag/analytics/ I assume you know that per-article views are available since 2015, a way to see those:

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Peter Meissner
Like dumps on article-day level? That would already be super awesome, much better than the current state. Best, Peter On 22.02.2018 at 22:23, "Dan Andreescu" wrote: > Peter, the data you mention here is quite large, and storage is cheap but > not free. For now, we don't

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Dan Andreescu
Peter, the data you mention here is quite large, and storage is cheap but not free. For now, we don't have capacity to serve that kind of timespan from the API, but we will work to improve the dumps version so it's more comprehensive. On Thu, Feb 22, 2018 at 4:12 PM, Peter Meissner

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Peter Meissner
Dear List-eners, I am writing in to argue the case for a Wikipedia effort to make something like stats.grok.se (page views per day per article from 2007 onwards) available again. I am the author of the first R package that provided easy access to pageview counts by accessing the stats.grok.se

Re: [Analytics] How to get old page views data?

2018-02-22 Thread John Urbanik
Dan, Thanks for the clarification - digging into the files, I see that there are redirects and more than 30M titles. My view had been informed by the documentation at https://dumps.wikimedia.org/other/pagecounts-ez/: Hourly page views per article for around 30 million article titles (Sept >

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Dan Andreescu
John: I think you may have gotten the wrong impression from some description, and I'm not sure what you were looking at. As far as I know, pagecounts-ez is the most comprehensive dataset we have with pageviews from as early as we started tracking them. It should have all articles, regardless

Re: [Analytics] How to get old page views data?

2018-02-22 Thread John Urbanik
Dan, One clarification point I'd make is that while the data is lossless for 30M articles, it is 100% lossy for redirects, old page names, or pages created after September 2013, correct? John On Wed, Feb 21, 2018 at 2:26 PM, Dan Andreescu wrote: > Hi Lars, > > You

Re: [Analytics] How to get old page views data?

2018-02-21 Thread Tilman Bayer
Thanks Dan! We should try to have this kind of information in the actual documentation updated; I just added your remarks to the page about pagecounts-raw, where the pagecounts-ez alternative had not been mentioned yet.

Re: [Analytics] How to get old page views data?

2018-02-21 Thread Dan Andreescu
Hi Lars, You have a couple of options: 1. download the data in lossless compressed form, https://dumps.wikimedia.org/other/pagecounts-ez/ The format is clever and doesn't lose granularity, should be a lot quicker than pagecounts-raw (this is basically what stats.grok.se did with the data as