Hi Erik,
You're quite right that the numbers are inflated, and we've been over this before [1].
Below are some sampled data for da.wiktionary from webstatscollector [2] and 
the squid log [3].
Bot traffic is a substantial share of 'page views' (but not the majority, as you 
suggest).

We discussed this extensively in April and, as I remember (my mail archive is 
somehow incomplete),
decided to implement a second, cleaned-up stream
without /bot/crawler/spider/http (keeping the original stream so as not to break 
trend lines).
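As a rough illustration of such a cleaned-up stream, here is a minimal Python sketch. It assumes the filter is a case-insensitive substring match on the user-agent string, using the four terms listed above; the names BOT_PATTERN, is_bot and split_counts are made up for this example and are not part of webstatscollector.

```python
import re

# Assumption: the filter is a case-insensitive match on the user-agent
# string, built from the "/bot/crawler/spider/http" terms in the mail.
BOT_PATTERN = re.compile(r"bot|crawler|spider|http", re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the bot markers."""
    return bool(BOT_PATTERN.search(user_agent))


def split_counts(user_agents):
    """Count all requests and the bot-free subset in one pass.

    Returns (total, clean), so the original stream and the cleaned-up
    stream can be reported side by side.
    """
    total = 0
    clean = 0
    for user_agent in user_agents:
        total += 1
        if not is_bot(user_agent):
            clean += 1
    return total, clean
```

Returning both totals mirrors the idea of keeping the original stream next to the new one, so that existing trend lines are not broken.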

However, that bot-free stream (projectcounts files with an extra set of per-wiki 
totals)
has not happened yet, and I'm pretty sure we have changed plans since
and are probably now waiting for Kraken. Diederik, can you add to this?

Oh my, I thought this was in operation already.
I've actually been looking at these page view stats,
and now I feel like a fool.

Why not just remove these web pages at
http://stats.wikimedia.org/wiktionary/EN/TablesPageViewsMonthly.htm
since they contain only nonsense? Continuity with
old nonsense is still nonsense, so remove everything
now and start a new project with real numbers.

[1] On April 8, 2012 you reported a similar issue for Swedish Wikipedia.
At the time I checked one hour of sampled squid log: 9 out of 13 requests were from bots.

Nobody doubts that the Swedish Wikipedia has a
substantial amount of human traffic. But for smaller
projects, the background noise will dominate. If
bots are 9 out of 13 requests to sv.wikipedia (really?),
they can easily be 99% of traffic to da.wiktionary.

One easy way to tell would be to observe the daily
rhythm. Since Swedish and Danish are limited to one
timezone, traffic in the middle of the night should be
much smaller than mid-day traffic. But bots could
be operating all night, all day. So the least active hour
is probably the background noise from bots.
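A minimal Python sketch of that estimate, assuming we have 24 hourly request counts for one day and that bot traffic is roughly constant around the clock. The function name estimate_bot_share is hypothetical. Since some humans do read at night, the result is an upper-bound guess rather than a measurement.

```python
def estimate_bot_share(hourly_counts):
    """Estimate the share of traffic that is constant background noise.

    Assumption: bots request pages at a flat rate all day, so the
    quietest hour approximates the hourly bot volume.
    """
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counts")
    total = sum(hourly_counts)
    if total == 0:
        return 0.0
    # Quietest hour, extrapolated to the whole day.
    background = min(hourly_counts) * 24
    return background / total
```

A day with a perfectly flat curve gives a share of 1.0, which would suggest that almost nothing in the counts follows a human daily rhythm.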


--
  Lars Aronsson ([email protected])
  Aronsson Datateknik - http://aronsson.se


