<grin> http://boston.conman.org/2001/11/10 ... that was a great case in point.
If a URL is a query, then is the diagnosis on a slash up to a special pathology unit? Well(ness aside), I agree with your point. We cannot assume that what looks like a directory "/" query always redirects to a more qualified URL. Such queries may resolve with or without redirection, and may involve other network or application pre-op patchwork.

In my own "pathological" way, I would like to query the web server for its default page name, or the list of names it checks when resolving the slash query. Why? So I can at least TRY harder to identify the true meaning of a slash in the custom-configured "mind" of the web server. This led to the crazy thought of adding it to robots.txt (or a robots-slash.txt), which would be nice -- but no, I do not actually wish for one.

What I would like from web servers is simply the real meaning of the slash in the returned Location. For a slash query, the 302 Found (or 301, if the server is configured to return that) could give us the real URL to store, taken from the Location header. Yes, no? So on a query of

  http://slashdot.org/

return

  Location: http://slashdot.org/index.html

instead of

  Location: http://slashdot.org/

when this condition is true on a slash query. It may start to heal the slash(ed t)issues.

-Thomas Kay

---------- Original Text ----------
From: "Sean 'Captain Napalm' Conner" <[EMAIL PROTECTED]>, on 21/11/2001 11:07 PM:

It was thus said that the Great [EMAIL PROTECTED] once stated:
>
> It would be great to know how to ask that http service for the list of
> "default or index file" names so the agents could verify what file name was
> indeed associated with the "/" slash. We could then put the file name on the
> URL to completely qualify that URL path. Anyone?

No. In some pathological cases there *isn't* a file associated with what
looks like a directory.
Case in point---both:

  http://boston.conman.org/2001/11/10

and

  http://boston.conman.org/2001/11/10/

return the same document (or rather, they should---there appears to be a bug
in the code 8-)  A URL like

  http://boston.conman.org/2001/index.html

is invalid and does not exist. In fact, the ``directory'' 2001/ does not
exist. Once you get past the hostname, any page in the form of

  [0-9]+/[0-9][0-9]/[0-9][0-9]

is more or less a database query (in fact, it's a bit more convoluted than
that, but that's not important right now). So for this portion of the
webspace, such a query just doesn't apply.

> Crazy thought...
>
> This is where the robots.txt file could be used to hold that information for
> the robot agents that need to know the operational order of the "/" default
> names used on that service.
>
> User-agent: *
> Slash: default.htm, default.html, index.htm, index.html, welcome.html,
> sitemap.html
>
> The above is just for consideration if the robots.txt is ever updated so the
> robots could be informed of this little detail.

There was a push in '96 or '97 to update the robots.txt standard and I wrote
a proposal back then (http://www.conman.org/people/spc/robots2.html), and
while I still get the occasional email about it, to my knowledge no robot
has implemented it (some portions perhaps, but not everything). I only
mention this because it was attempted before.

  -spc (For an interesting discussion, tell me how a robot should handle a
  site like http://bible.conman.org/ or http://boston.conman.org where you
  really have multiple views into ostensibly a single document)

--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body of a
message to "[EMAIL PROTECTED]".
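[For what it's worth, the "Slash:" field quoted above would be simple for a robot to consume if it ever existed. Here is a minimal sketch, in Python, of how an agent might parse it -- purely illustrative, since the "Slash:" field is hypothetical and appears in no robots.txt standard:]

```python
def parse_slash_directive(robots_txt):
    """Return the ordered list of default-document names from the
    hypothetical ``Slash:`` field of a robots.txt body, or [] if absent.

    Note: ``Slash:`` is NOT part of any robots.txt standard; it is the
    proposal quoted in the message above.
    """
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "slash":
            # Names are comma-separated; preserve the stated lookup order.
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


example = """\
User-agent: *
Slash: default.htm, default.html, index.htm, index.html, welcome.html, sitemap.html
"""
print(parse_slash_directive(example))
```

[An agent could then try appending each name in order to a slash-terminated URL until one matches the document the slash query returned -- subject to all the pathological cases Sean describes, where no such file exists at all.]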