Thanks, Tim, for running those numbers. That seems to suggest the URL
structure works in most cases.

On Wed, Sep 18, 2013 at 12:07 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
> On 17/09/13 13:59, Jon Robson wrote:
>> I would suggest taking a look at the number of 404s caused by people trying
>> to access pages without the wiki prefix.... This would be interesting data
>> to go alongside this interesting proposal...
>
> There are lots of different sorts of 404s, so it's necessary to do
> some filtering. For example:
>
> * double-slashes, due to bug 52253
> * sitemap.xml
> * Apple touch icons
> * bullet.gif in various directories
> * vulnerability scanning, e.g. xmlrpc.php
> * BlueCoat verify/notify, as described in
> <http://www.webmasterworld.com/search_engine_spiders/3859463.htm>
> * Serial numbers like http://en.wikipedia.org/B008NAYASM .
>
> I filtered out everything with a dot or slash in the prospective
> article title, as well as the BlueCoat URLs and the UAs responsible
> for serial number URLs. To simplify analysis, I took log lines from
> the English Wikipedia only.
>
> Most of the remaining log entries were search engine crawlers, so I
> took those out too.
>
> The result was 149 log entries at a 1/1000 sample rate, for the week
> of September 8-14, implying a request rate of about 639,000 per month.
> This is about 0.006% of the English Wikipedia's page view rate.
>
> The 149 URLs are at http://paste.tstarling.com/p/uhtFqg.html
>
> -- Tim Starling
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

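For anyone who wants to repeat this on another wiki, the filtering Tim
describes could be sketched roughly as below. This is only an
illustration, not the script Tim ran: the function name, the input
shape (a URL plus a user-agent string) and the crawler marker list are
all assumptions, and the BlueCoat and serial-number UA filters are left
out because the thread does not say how they were matched.

```python
from urllib.parse import urlsplit

# Hypothetical crawler markers; the actual list used is not given in the thread.
CRAWLER_MARKERS = ("googlebot", "bingbot", "baiduspider", "yandex")


def is_candidate(url, user_agent):
    """Return True if a 404 looks like an article title requested without /wiki/."""
    # Drop only the first slash, so "//Foo" keeps a slash and is rejected below.
    title = urlsplit(url).path[1:]
    if not title:
        return False
    # Drop anything with a dot or slash in the prospective article title
    # (covers sitemap.xml, bullet.gif, xmlrpc.php, double slashes, and so on).
    if "." in title or "/" in title:
        return False
    # Drop requests whose user agent looks like a search engine crawler.
    ua = user_agent.lower()
    return not any(marker in ua for marker in CRAWLER_MARKERS)
```

Something like is_candidate("http://en.wikipedia.org/Foo", ua) would then
be applied to each sampled 404 log line before counting.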

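The extrapolation from 149 sampled entries to a monthly rate can be
checked with a few lines. The 30-day month is my assumption (the quoted
message does not state one), and the total page view figure at the end
is only what the 0.006% share implies, not a number from the thread.

```python
sampled_entries = 149      # log entries left after filtering
sample_denominator = 1000  # logs were sampled at 1/1000
days_sampled = 7           # week of September 8-14
days_per_month = 30        # assumption, not stated in the thread

# Scale the sample up to the full request count for the week.
weekly_requests = sampled_entries * sample_denominator

# Scale the week up to a month.
monthly_requests = weekly_requests * days_per_month / days_sampled

# Total monthly page views implied by a 0.006% share (derived, not quoted).
implied_page_views = monthly_requests / (0.006 / 100)

print(round(monthly_requests, -3))  # 639000.0
```

That matches the "about 639,000 per month" figure, and the implied total
comes out at a little over 10 billion page views a month.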

-- 
Jon Robson
http://jonrobson.me.uk
@rakugojon

