<grin> http://boston.conman.org/2001/11/10 ... that was a great case in point.
If a URL is a query, then is the diagnosis on a slash up to a special pathology unit? Well(ness aside), I agree with your point. We cannot assume that what looks like a directory "/" query always redirects to a more qualified URL. Such queries may resolve with or without redirection, and may involve other network or application pre-op patchwork.

In my own "pathological" way, I would like to query the web server for its default page name, or the list of names it checks when resolving the slash query. Why? So I can at least TRY harder to identify the true meaning of a slash in the custom-configured "mind" of the web server. This led to the crazy thought of adding it to robots.txt (or a robots-slash.txt), which would be nice -- but no, I do not actually wish for one.

What I would like from web servers is simply the real meaning of the slash in the returned Location. For a slash query, the 302 Found (or 301, if the server is configured to return that) could give us the real URL to store, taken from the Location header. Yes, no? So on a query of

  http://slashdot.org/

return

  Location: http://slashdot.org/index.html

instead of

  Location: http://slashdot.org/

when this condition is true on a slash query. It may start to heal the slash(ed t)issues.

-Thomas Kay

---------- Original Text ----------
From: "Sean 'Captain Napalm' Conner" <[EMAIL PROTECTED]>, on 21/11/2001 11:07 PM:

It was thus said that the Great [EMAIL PROTECTED] once stated:
>
> It would be great to know how to ask that http service for the list of
> "default or index file" names so the agents could verify what file name was
> indeed associated with the "/" slash. We could then put the file name on the
> URL to completely qualify that URL path. Anyone?

No. In some pathological cases there *isn't* a file associated with what
looks like a directory.
Case in point---both:

  http://boston.conman.org/2001/11/10

and

  http://boston.conman.org/2001/11/10/

return the same document (or rather, they should---there appears to be a bug
in the code 8-)  A URL like

  http://boston.conman.org/2001/index.html

is invalid and does not exist. In fact, the ``directory'' 2001/ does not
exist. Once you get past the hostname, any page in the form of

  [0-9]+/[0-9][0-9]/[0-9][0-9]

is more or less a database query (in fact, it's a bit more convoluted than
that, but that's not important right now). So for this portion of the
webspace, such a query just doesn't apply.

> Crazy thought...
>
> This is where the robots.txt file could be used to hold that information for
> the robot agents that need to know the operational order of the "/" default
> names used on that service.
>
> User-agent: *
> Slash: default.htm, default.html, index.htm, index.html, welcome.html,
> sitemap.html
>
> The above is just for consideration if the robots.txt is ever updated so the
> robots could be informed of this little detail.

There was a push in '96 or '97 to update the robots.txt standard and I wrote
a proposal back then (http://www.conman.org/people/spc/robots2.html), and
while I still get the occasional email about it, to my knowledge no robot
has implemented it (some portions perhaps, but not everything). I only
mention this because it was attempted before.

  -spc (For an interesting discussion, tell me how a robot should handle a
  site like http://bible.conman.org/ or http://boston.conman.org where you
  really have multiple views into ostensibly a single document)

--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body of a
message to "[EMAIL PROTECTED]".
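[For what it's worth, the "Slash:" field quoted above would be simple for a robot to consume if it ever existed. Here is a minimal sketch, in Python, of how an agent might parse it -- purely illustrative, since the "Slash:" field is hypothetical and appears in no robots.txt standard:]

```python
def parse_slash_directive(robots_txt):
    """Return the ordered list of default-document names from the
    hypothetical ``Slash:`` field of a robots.txt body, or [] if absent.

    Note: ``Slash:`` is NOT part of any robots.txt standard; it is the
    proposal quoted in the message above.
    """
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "slash":
            # Names are comma-separated; preserve the stated lookup order.
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


example = """\
User-agent: *
Slash: default.htm, default.html, index.htm, index.html, welcome.html, sitemap.html
"""
print(parse_slash_directive(example))
```

[An agent could then try appending each name in order to a slash-terminated URL until one matches the document the slash query returned -- subject to all the pathological cases Sean describes, where no such file exists at all.]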