See also: http://dammit.lt/wikistats/

I've parsed every one of these files since Jan. 2010 (at hourly
granularity; grok.se aggregates at the day level, I believe) into a DB
structure indexed by page title. It currently takes up about 400GB.
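
A minimal sketch of this kind of per-title indexing, not the actual code behind the DB described above. It assumes each line of an hourly file has four whitespace-separated fields (project code, page title, request count, bytes transferred). The function names, the "en" project filter, the hour label, the sample counts, and the in-memory dict standing in for a real database are all hypothetical.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one 'project title count bytes' line; return None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, count, _bytes = parts
    try:
        return project, title, int(count)
    except ValueError:
        return None


def index_hour(lines, hour, index, project="en"):
    """Add one hour's counts to index[title][hour] for a single project."""
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec[0] != project:
            continue  # skip malformed lines and other projects
        _, title, count = rec
        index[title][hour] += count
    return index


# In-memory stand-in for the DB: title -> {hour label -> request count}.
index = defaultdict(lambda: defaultdict(int))

# Made-up sample lines in the assumed format.
sample = [
    "en Main_Page 242332 4737756101",
    "de Hauptseite 1000 123456",
    "en Philadelphia 120 987654",
    "this line is malformed and gets skipped",
]
index_hour(sample, "2011-04-06T15", index)
```

Keeping one count per hour under each title is what makes daily totals, trends, and spike detection simple lookups on a single title afterwards.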

A comprehensive measurement study over this data would be interesting
(long-term trends, traffic spikes during cultural events, etc.), but
the technical infrastructure is already in place, and I doubt a
measurement study meets GSoC requirements.

Thanks, -AW


On 04/06/2011 11:05 AM, David Gerard wrote:
> On 6 April 2011 16:02, Peng Wan<[email protected]>  wrote:
>
>> This is Peng Wan. I have submitted my application to wikimedia of Gsoc.
>> My Project title is "Figuring out the most popular pages".
>
>
> Does it do anything http://stats.grok.se/ doesn't?
>
>
> - d.
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

-- 
Andrew G. West, Doctoral Student
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.cis.upenn.edu/~westand
