It was thus said that the Great [EMAIL PROTECTED] once stated:
>
> It would be great to know how to ask that http service for the list of
> "default or index file" names so the agents could verify what file name was
> indeed associated with the "/" slash. We could then put the file name on the
> URL to completely qualify that URL path. Anyone?
No. In some pathological cases there *isn't* a file associated with what
looks like a directory. Case in point---both:
http://boston.conman.org/2001/11/10
and
http://boston.conman.org/2001/11/10/
return the same document (or rather, they should---there appears to be a
bug in the code 8-)
A URL like
http://boston.conman.org/2001/index.html
is invalid and does not exist. In fact, the ``directory'' 2001/ does not
exist either. Once you get past the hostname, any path of the form
[0-9]+/[0-9][0-9]/[0-9][0-9] is more or less a database query (it's
a bit more convoluted than that, but that's not important right now). So
for this portion of the webspace, such a query simply doesn't apply.
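To make the point concrete, here is a minimal sketch (hypothetical, not the actual server code) of how such date-style paths can be recognized as queries rather than files:

```python
import re

# Hypothetical sketch: paths of the form year/month/day are database
# queries, not directories, so asking the server for a "default index
# file" name for them makes no sense.
DATE_PATH = re.compile(r'^[0-9]+/[0-9][0-9]/[0-9][0-9]/?$')

def is_date_query(path):
    """Return True if the path names a date-based query, not a file."""
    return DATE_PATH.match(path.lstrip('/')) is not None

print(is_date_query('/2001/11/10'))       # True
print(is_date_query('/2001/11/10/'))      # True -- same document either way
print(is_date_query('/2001/index.html'))  # False
```

Note the trailing slash is optional in the pattern, matching the behavior described above where both forms of the URL return the same document.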
> Crazy thought...
>
> This is where the robots.txt file could be used to hold that information for
> the robot agents that need to know the operational order of the "/" defaults
> names used on that service.
>
> User-agent: *
> Slash: default.htm, default.html, index.htm, index.html, welcome.html,
> sitemap.html
>
> The above is just for consideration if the robots.txt is ever updated so the
> robots could be informed of this little detail.
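For what it's worth, if such a Slash: line ever existed, a robot could parse it along these lines (a sketch only; the directive is hypothetical and no robots.txt parser supports it):

```python
# Sketch of parsing the proposed (hypothetical) "Slash:" directive from
# a robots.txt body.  No robots.txt standard actually defines this field.
def parse_slash_directive(robots_txt):
    """Return the list of default index-file names, in order, or []."""
    names = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(':')
        if field.strip().lower() == 'slash':
            names.extend(n.strip() for n in value.split(',') if n.strip())
    return names

example = """User-agent: *
Slash: default.htm, default.html, index.htm, index.html
"""
print(parse_slash_directive(example))
# ['default.htm', 'default.html', 'index.htm', 'index.html']
```

Even with such a directive, the robot would still face the problem described above: on some sites there is no file behind the slash at all.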
There was a push in '96 or '97 to update the robots.txt standard, and I
wrote a proposal back then (http://www.conman.org/people/spc/robots2.html).
While I still get the occasional email about it, to my knowledge no
robot has implemented it (some portions perhaps, but not everything). I
only mention this because it has been attempted before.
-spc (For an interesting discussion, tell me how a robot should handle
a site like http://bible.conman.org/ or http://boston.conman.org
where you really have multiple views into ostensibly a single
document)
--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message
to "[EMAIL PROTECTED]".