On Sun, Oct 11, 2009 at 3:28 PM, Erik Zachte <[email protected]> wrote:
> Any idea why there are so many TCP_DENIED/403, are these really failures?

Certain types of requests are blocked at the Squid level for various
reasons.  For instance, try wgetting Wikipedia; you'll get a 403
because the default User-Agent strings of tools like wget are blocked.
(You're supposed to send a custom User-Agent header, preferably with
contact info, so that your script is distinctive and can be blocked on
its own if it causes a problem.)  Similarly, try something like this:

http://en.wikipedia.org/&

I assume this kind of thing is what causes those responses.
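
For anyone hitting the UA block from a script, here's a minimal Python
sketch of the idea; the User-Agent string and contact details are made
up for illustration, substitute your own:

import urllib.request

# Hypothetical descriptive User-Agent; include your tool name and a way
# to reach you so the ops folks can block or contact you if needed.
UA = "MyResearchBot/0.1 (http://example.org/bot; [email protected])"

req = urllib.request.Request(
    "http://en.wikipedia.org/wiki/Main_Page",
    headers={"User-Agent": UA},
)
with urllib.request.urlopen(req) as resp:
    # With a distinctive UA you get a normal 200 instead of the 403.
    print(resp.getcode(), len(resp.read()))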

On Sun, Oct 11, 2009 at 8:12 PM, Robert Rohde <[email protected]> wrote:
> However, a logical guess would
> be if the Squid is configured to reject action=edit requests from
> search engine spiders and similar non-human processes.  Since such
> things are not easily incorporated into robots.txt, blocking at the
> squid layer would be a good option for stopping such traffic from
> hitting the main servers.  That would be my guess.  I suspect others
> can give a more concrete answer.

Those things are all blocked in robots.txt:

User-agent: *
Disallow: /w/

That's part of why we use long URLs for everything but page views, so
that they can be neatly blocked from spiders.
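
As a quick illustration (a sketch using Python's standard
urllib.robotparser; the "Foo" title is just a placeholder), that
two-line policy lets spiders fetch page views under /wiki/ while
keeping them out of the long /w/ URLs:

from urllib.robotparser import RobotFileParser

# The policy quoted above.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /w/"])

# Long-form URLs (edit, history, etc.) live under /w/ and are disallowed.
print(rp.can_fetch("*", "http://en.wikipedia.org/w/index.php?title=Foo&action=edit"))  # False

# Plain page views under /wiki/ remain crawlable.
print(rp.can_fetch("*", "http://en.wikipedia.org/wiki/Foo"))  # True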
