See also: http://dammit.lt/wikistats/
I've parsed every one of these files since Jan. 2010 (at hourly granularity; grok.se aggregates at the day level, I believe) into a DB structure indexed by page title. It currently takes up about 400 GB of space.

A comprehensive measurement study over this data would be interesting (long-term trends, traffic spikes during cultural events, etc.), but the technical infrastructure is already in place. I doubt a measurement study meets GSoC requirements.

Thanks,
-AW

On 04/06/2011 11:05 AM, David Gerard wrote:
> On 6 April 2011 16:02, Peng Wan<[email protected]> wrote:
>
>> This is Peng Wan. I have submitted my application to wikimedia of Gsoc.
>> My Project title is "Figuring out the most popular pages".
>
>
> Does it do anything http://stats.grok.se/ doesn't?
>
>
> - d.
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Andrew G. West, Doctoral Student
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.cis.upenn.edu/~westand
