On Wed, 19 Jul 2006, Patrick R. Michaud wrote:

On Wed, Jul 19, 2006 at 09:34:44PM +0200, [EMAIL PROTECTED] wrote:
On Wed, 19 Jul 2006, Patrick R. Michaud wrote:
On Wed, Jul 19, 2006 at 11:36:53AM -0500, JB wrote:
PM,

Can I please get a copy of your robots.txt file?

Also, for anyone who is interested, here are the relevant
sections of my root .htaccess file, which deny certain
user agents at the webserver level instead of waiting
for PmWiki to do it:

  # HTTrack and MSIECrawler are just plain annoying
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
  RewriteCond %{HTTP_USER_AGENT} MSIECrawler
  RewriteRule ^wiki/ - [F,L]

  # block most ?action= requests for these spiders
  # (action=[^rb] matches any action NOT beginning with 'r' or 'b',
  #  so actions such as browse and rss remain crawlable)
  RewriteCond %{QUERY_STRING} action=[^rb]
  RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
  RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
  RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
  RewriteCond %{HTTP_USER_AGENT} Teoma [OR]
  RewriteCond %{HTTP_USER_AGENT} ia_archive
  RewriteRule .* - [F,L]
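Since robots.txt also came up in this thread: for crawlers that honor
it, a robots.txt can keep them away from the wiki script entirely,
instead of (or in addition to) the rewrite rules above.  A minimal
sketch -- the paths here are hypothetical, so adjust them to your
installation's URL layout:

  # Hypothetical robots.txt -- adapt the Disallow paths to your site
  User-agent: HTTrack
  Disallow: /

  User-agent: MSIECrawler
  Disallow: /

  # keep all crawlers out of the wiki script and its ?action= URLs
  User-agent: *
  Disallow: /pmwiki.php

Note that robots.txt is purely advisory; misbehaving agents like
HTTrack are exactly why the .htaccess rules above deny them at the
webserver level.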

The obvious solution: Add this to some PmWiki page?  Perhaps something
about administrative tasks? Or something related to robots.txt?

It probably belongs in Cookbook.ControllingWebRobots, which also needs
to be rewritten to be up-to-date with PmWiki 2.1.  There also needs
to be a link in the administrative tasks section, or at least a
FAQ question.

I'm going through old posts. Should I place the above on Cookbook.ControllingWebRobots? (I wonder whether there's a problem with putting it in an official place - any security risks?)

/C

--
Christian Ridderström, +46-8-768 39 44               http://www.md.kth.se/~chr
_______________________________________________
pmwiki-users mailing list
[email protected]
http://www.pmichaud.com/mailman/listinfo/pmwiki-users