Hi,
I see that the spec text you quoted applies to the value of a User-agent line.
There is also a mention of the field name being case insensitive.  I don't see
anything saying whether the value of a Disallow line is case sensitive or
insensitive.  I guess I assumed it would be treated the same as the others.
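
For reference, the recommendation quoted from the spec ("a case insensitive
substring match of the name without version information") can be sketched
roughly as follows. This is only an illustration, not how any particular robot
is implemented. The function name agent_matches is made up, and the sample
strings are abbreviated from the log entries in this message with the address
part left out.

```python
def agent_matches(record_name: str, user_agent_header: str) -> bool:
    """Case-insensitive substring match of a robots.txt User-agent value
    against the User-Agent header a robot sends (hypothetical helper)."""
    name = record_name.strip().lower()
    if name == "*":
        # "*" is the default record and applies to every robot.
        return True
    # An empty name matches nothing; otherwise look for the name anywhere
    # in the header, ignoring case.
    return name != "" and name in user_agent_header.lower()


# Abbreviated versions of the two logged headers (address part omitted).
logged = [
    "mozilla/5.0+(slurp/si;+http://www.inktomi.com/slurp.html)",
    "Mozilla/5.0+(Slurp/cat;+http://www.inktomi.com/slurp.html)",
]

for header in logged:
    print(agent_matches("Slurp", header))
```

With a match like this, a single "User-agent: slurp" line would cover both
spellings, so the mixed-case lines should only matter for robots that do not
follow the recommendation.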

I am not sure where this case issue comes into play.  Is it how the name is
stored on the drive?  Or is it from the request itself?

Here is one of the biggest offenders:

I trigger on the word "slurp" [case insensitive] to see this.
Info from my logs:
cs(User-Agent) =
mozilla/5.0+(slurp/si;[EMAIL PROTECTED];+http://www.inktomi.com/slurp.html)
cs-host = www.coseco.com
cs-uri-stem = /robots.txt
cs-uri-query =
cs-method = GET
sc-status = 200

I assume they read the robots.txt file OK, since the status is 200.

cs(User-Agent) =
Mozilla/5.0+(Slurp/cat;[EMAIL PROTECTED];+http://www.inktomi.com/slurp.html)
cs-host = www.coseco.com
cs-uri-stem = /heartnart/shoppingCart.asp
cs-uri-query = action=add&iNum=N020313-19.5
cs-method = GET
sc-status = 403


I have just noticed that this one is slightly different case mix than what I
had in robots.txt.  The file used to be "ShoppingCart.asp".  I did not
notice it had changed.  I have made a change to robots.txt and will see if
this helps.
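
A minimal sketch of why the case mix could matter, assuming the robot compares
the requested path against each Disallow value character for character (the
usual prefix match). The function name is_disallowed and its ignore_case flag
are made up for illustration, and the old rule shown is my assumption about
what the robots.txt line looked like.

```python
def is_disallowed(path, disallow_values, ignore_case=False):
    """Return True if path starts with any Disallow value (prefix match).

    ignore_case is a hypothetical switch showing what a case-insensitive
    robot would do; by default the comparison is exact.
    """
    for value in disallow_values:
        if not value:
            # An empty Disallow value disallows nothing.
            continue
        if ignore_case:
            if path.lower().startswith(value.lower()):
                return True
        elif path.startswith(value):
            return True
    return False


old_rules = ["/heartnart/ShoppingCart.asp"]  # assumed old robots.txt entry
requested = "/heartnart/shoppingCart.asp"    # cs-uri-stem from the log

print(is_disallowed(requested, old_rules))                    # exact case
print(is_disallowed(requested, old_rules, ignore_case=True))  # case ignored
print(is_disallowed(requested, ["/heartnart/shoppingCart.asp"]))
```

Under that assumption, the old rule never matches the request as logged, so
even a robot that obeys robots.txt would still fetch the page until the rule
uses the same case as the link.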

I will keep a lookout for the others (haven't seen them today) and see if
they are in the same category.  I hope this solves the problem.

Thanks,
Paul Coleman

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Nick Arnett
Sent: Monday, December 01, 2003 10:33 AM
To: Internet robots, spiders, web-walkers, etc.
Subject: Re: [Robots] robots.txt questions


VBCoder wrote:

> Hi,
> Every place I have read about robots.txt rules state that it is supposed to
> be case insensitive.

The spec says "A case insensitive substring match of the name without
version information is recommended."  This is up to the robots, not you.
You probably are getting hit by robots that don't do it.

> You seem to be suggesting that this is wrong.  I have
> added lines that include the exact case of the offender, but this does not
> seem to stop them.  The mixed case lines are an experiment, the all lower
> case lines should be enough to stop them from what I have read.  Are you
> suggesting that robots.txt needs to be case sensitive?

This should only be necessary as a work-around for robots that aren't
following the above recommendation.  Do you have user-agent names for
robots that seem to download and not follow the directive?  It would be
interesting to see if they're using a third-party library to interpret
robots.txt.

> The domain name heartnart.com does a redirect to www.coseco.com/heartnart.
> I would think that the case should not matter to a search engine as it
> doesn't to the web in general.  I mix them so that it can be more easily
> read by humans.  Are you suggesting that a search engine would think that
> www.coseco.com/heartnart is a different place than
> www.coseco.com/HeartnArt?
> I am more confused than before.

Windows-based web servers are the only ones that ignore case, generally
speaking.  And they make up a relatively small portion of servers out there.

Nick

--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots