Re: [Wikitech-l] Page views

2012-04-11 Thread Lars Aronsson
On 04/11/2012 01:45 AM, Erik Zachte wrote: Here are some numbers on total bot burden: 1) http://stats.wikimedia.org/wikimedia/squids/SquidReportCrawlers.htm states for March 2012: In total 69.5 M page requests (mime type text/html only!) per day are considered crawler requests, out of 696 M
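As a rough check on the figures quoted above, the crawler share can be worked out directly. This is only a sketch: it assumes the 696 M figure is the total number of daily text/html page requests, which the truncated snippet does not spell out, and the variable names are illustrative.

```python
# Figures quoted from SquidReportCrawlers.htm for March 2012 (per day).
crawler_requests_per_day = 69.5e6
# Assumption: 696 M is the total daily text/html page requests.
total_requests_per_day = 696e6

crawler_share = crawler_requests_per_day / total_requests_per_day
print(f"Crawler share of page requests: {crawler_share:.1%}")
```

Under that assumption, roughly one page request in ten comes from a crawler.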

Re: [Wikitech-l] Page views

2012-04-11 Thread Diederik van Liere
My suggestion for how to filter these bots efficiently in a C program (no costly nuanced regexps) before sending data to webstatscollector: a) Find the 14th field in the space-delimited log line = user agent (but beware of false delimiters in logs from Varnish, if still applicable) b) Search this
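Step (a) could be sketched roughly as follows. The message proposes a C program; Python is used here only to keep the illustration short. The function name `extract_user_agent` and the sample line are hypothetical, and the sketch assumes the user agent itself contains no raw spaces, so it does not handle the false delimiters the message warns about.

```python
from typing import Optional

USER_AGENT_FIELD = 14  # 1-based field position, as stated in the message


def extract_user_agent(log_line: str, field: int = USER_AGENT_FIELD) -> Optional[str]:
    """Return the given space-delimited field, or None if the line is too short."""
    # Split on single spaces only, with no regular expressions involved.
    fields = log_line.rstrip("\n").split(" ")
    if len(fields) < field:
        return None
    return fields[field - 1]


# Hypothetical log line: 13 placeholder fields followed by a user agent.
sample_line = " ".join(f"f{i}" for i in range(1, 14)) + " ExampleBot/1.0"
print(extract_user_agent(sample_line))
```

A stray space in an earlier field would shift every later field by one position, which is why the caveat about false delimiters matters for a fixed-index lookup like this.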

Re: [Wikitech-l] Page views

2012-04-10 Thread Erik Zachte
, 2012 9:21 PM To: Wikimedia developers Cc: Diederik van Liere; Lars Aronsson Subject: Re: [Wikitech-l] Page views 2012/4/8 Erik Zachte ezac...@wikimedia.org Hi Lars, You have a point here, especially for smaller projects: For Swedish Wikisource: zcat sampled-1000.log-20120404.gz | grep 'GET

Re: [Wikitech-l] Page views

2012-04-09 Thread Srikanth Lakshmanan
On Mon, Apr 9, 2012 at 00:46, Erik Zachte ezac...@wikimedia.org wrote: returns 20 lines from this 1:1000 sampled squid log file. After removing javascript/json/robots.txt there are 13 left, which fits perfectly with 10,000 to 13,000 per day. However, 9 of these are bots!! Is this the same
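The arithmetic behind the quoted numbers can be sketched as below. It assumes the sampled log file covers a single day and that each sampled line stands for 1000 real requests; the variable names are illustrative only.

```python
SAMPLING_FACTOR = 1000       # the log is sampled 1:1000
sampled_page_requests = 13   # lines left after removing javascript/json/robots.txt
sampled_bot_requests = 9     # of those, the number identified as bots

estimated_daily_requests = sampled_page_requests * SAMPLING_FACTOR
estimated_daily_bot_requests = sampled_bot_requests * SAMPLING_FACTOR
estimated_daily_other_requests = estimated_daily_requests - estimated_daily_bot_requests
bot_share = sampled_bot_requests / sampled_page_requests

print(estimated_daily_requests)        # 13000
print(estimated_daily_other_requests)  # 4000
print(f"{bot_share:.0%}")              # 69%
```

With only 13 sampled lines the estimate is very coarse, so the bot share for a small project like this carries a wide margin of error.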

Re: [Wikitech-l] Page views

2012-04-09 Thread Diederik van Liere
Hi Srikanth, Yes, we are looking into the growth percentages as they seem unrealistically high. Best, Diederik On Mon, Apr 9, 2012 at 3:30 AM, Srikanth Lakshmanan srik@gmail.com wrote: On Mon, Apr 9, 2012 at 00:46, Erik Zachte ezac...@wikimedia.org wrote: returns 20 lines from this

Re: [Wikitech-l] Page views

2012-04-09 Thread Erik Zachte
, April 09, 2012 9:28 PM To: Srikanth Lakshmanan Cc: Wikimedia developers; Diederik van Liere; Lars Aronsson Subject: Re: [Wikitech-l] Page views Hi Srikanth, Yes, we are looking into the growth percentages as they seem unrealistically high. Best, Diederik On Mon, Apr 9, 2012 at 3:30 AM, Srikanth

Re: [Wikitech-l] Page views

2012-04-08 Thread Erik Zachte
Hi Lars, You have a point here, especially for smaller projects: For Swedish Wikisource: zcat sampled-1000.log-20120404.gz | grep 'GET http://sv.wikisource.org' | awk '{print $9, $11,$14}' returns 20 lines from this 1:1000 sampled squid log file. After removing javascript/json/robots.txt
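For readers who prefer to script this, a rough Python counterpart of the zcat/grep/awk pipeline might look like the sketch below. The function names `read_sampled_log` and `select_fields` are hypothetical. It uses a plain substring match where grep treats the dots as wildcards, and it splits on runs of whitespace as awk does by default.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_sampled_log(path: str) -> Iterator[str]:
    """Yield decoded lines from a gzip-compressed log file (the zcat step)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def select_fields(
    lines: Iterable[str],
    needle: str = "GET http://sv.wikisource.org",
    columns: Tuple[int, ...] = (9, 11, 14),
) -> Iterator[Tuple[str, ...]]:
    """Keep lines containing `needle` and return the chosen 1-based fields."""
    for line in lines:
        if needle not in line:          # the grep step, as a substring test
            continue
        fields = line.split()           # whitespace splitting, like awk
        # Like awk, a missing field becomes an empty string.
        yield tuple(fields[c - 1] if c <= len(fields) else "" for c in columns)
```

Usage would be along the lines of `for row in select_fields(read_sampled_log("sampled-1000.log-20120404.gz")): print(*row)`, assuming the file is in the current directory.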

Re: [Wikitech-l] Page views

2012-04-08 Thread emijrp
2012/4/8 Erik Zachte ezac...@wikimedia.org Hi Lars, You have a point here, especially for smaller projects: For Swedish Wikisource: zcat sampled-1000.log-20120404.gz | grep 'GET http://sv.wikisource.org' | awk '{print $9, $11,$14}' returns 20 lines from this 1:1000 sampled squid log

[Wikitech-l] Page views

2012-04-07 Thread Lars Aronsson
I'm telling people that the Swedish Wikipedia has 90-100 million page views per month or on average ten per month per Swedish citizen. This is based on stats.wikimedia.org (Wikistats), but is it really true? It would be really embarrassing if it were wrong by some order of magnitude. There is of