On Sun, Oct 11, 2009 at 3:28 PM, Erik Zachte <[email protected]> wrote:
> Any idea why there are so many TCP_DENIED/403, are these really failures?
Certain types of requests are blocked at the Squid level for various
reasons. For instance, try wgetting Wikipedia; you'll get a 403 because
the default UA headers for such things are blocked. (You're supposed to
use a custom UA header, preferably with contact info, to make your
script distinctive and easily blockable by itself if there's a problem.)
Similarly, try something like this:

http://en.wikipedia.org/&

I assume this kind of thing is what causes those responses.

On Sun, Oct 11, 2009 at 8:12 PM, Robert Rohde <[email protected]> wrote:
> However, a logical guess would be if the Squid is configured to reject
> action=edit requests from search engine spiders and similar non-human
> processes. Since such things are not easily incorporated into
> robots.txt, blocking at the squid layer would be a good option for
> stopping such traffic from hitting the main servers. That would be my
> guess. I suspect others can give a more concrete answer.

Those things are all blocked in robots.txt:

User-agent: *
Disallow: /w/

That's part of why we use long URLs for everything but page views, so
that they can be neatly blocked from spiders.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
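P.S. For anyone wondering what a "distinctive UA with contact info"
looks like in practice, here's a minimal sketch. The bot name, URL, and
email address are made up; the point is only that ops can recognize the
one script and block it alone if it misbehaves:

```python
import urllib.request

# Hypothetical name and contact details -- substitute your own. The
# format "Name/version (URL; email)" is just a common convention.
ua = "ExampleStatsBot/1.0 (https://example.org/bot; [email protected])"

# Build (but don't send) a request carrying the custom User-Agent
# instead of urllib's default one.
req = urllib.request.Request(
    "https://en.wikipedia.org/wiki/Main_Page",
    headers={"User-Agent": ua},
)
print(req.get_header("User-agent"))
```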

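P.P.S. On Robert's guess: a Squid-level block of that shape would only
take a few ACL lines. This is a sketch with invented regexes, not
Wikimedia's actual squid.conf, which I haven't seen:

```
# Sketch only -- patterns invented for illustration. 'browser' matches
# against the User-Agent header; requests denied by http_access show up
# in the access log as TCP_DENIED/403.
acl bad_ua browser ^Wget ^libwww
acl edit_url urlpath_regex action=edit
http_access deny bad_ua
http_access deny edit_url
```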