Re: [Robots] robots.txt questions

2003-12-02 Thread Andrew Daviel
1996 or so) I think it was suggested that GET be used to retrieve information and POST be used to make things happen, such as debiting accounts or ordering things. Robots, generally speaking, don't do POST. No doubt you have a good reason for using GET, such as being able to use the Back button
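
GET and HEAD are "safe" HTTP methods, meant only to retrieve information, which is why a robot can follow GET links freely while POST is left for actions. A minimal sketch of a crawler policy along those lines; `should_robot_request` is a hypothetical helper, not taken from any particular robot:

```python
# GET and HEAD are among the methods HTTP defines as "safe": they are
# meant only to retrieve information, never to change server state.
ROBOT_METHODS = frozenset({"GET", "HEAD"})


def should_robot_request(method: str) -> bool:
    """Hypothetical crawler policy: issue only GET/HEAD, never POST.

    A form that POSTs (e.g. one that debits an account or places an
    order) is therefore never submitted by the robot.
    """
    return method.upper() in ROBOT_METHODS


# (method, URL) pairs a robot might find on a page: links and form actions.
candidates = [("GET", "/catalog?page=2"), ("POST", "/order"), ("head", "/")]
allowed = [url for method, url in candidates if should_robot_request(method)]
print(allowed)  # ['/catalog?page=2', '/']
```

The flip side is the point made above: anything exposed through a plain GET link is fair game for a robot, even if it "makes things happen".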

RE: [Robots] Hit Rate - testing is this mailing list alive?

2003-11-04 Thread Andrew Daviel
might be. However, I guess that the people running robots also have finite storage and have to pay for bandwidth, so perhaps this is a non-problem except where there is a serious asymmetry between source and destination. -- Andrew Daviel, TRIUMF, Canada Tel. +1 (604) 222-7376 [EMAIL PROTECTED]
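
On the robot side, hit rate is commonly controlled by spacing out requests to each host. A minimal sketch, assuming a fixed per-host minimum interval; the `HostRateLimiter` class and its default of one second are hypothetical, not something proposed in this thread:

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Enforce a minimum interval (in seconds) between hits on one host.

    min_interval is an assumed politeness setting, not a recommended value.
    """

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_hit = {}  # host -> clock value at the last request

    def wait_time(self, url: str) -> float:
        """Seconds to wait before the next request to url's host."""
        host = urlsplit(url).hostname or ""
        last = self._last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (self.clock() - last))

    def record(self, url: str) -> None:
        """Note that a request to url's host has just been made."""
        host = urlsplit(url).hostname or ""
        self._last_hit[host] = self.clock()
```

A fetch loop would call `time.sleep(limiter.wait_time(url))`, fetch the page, then call `limiter.record(url)`. The clock is injectable so the logic can be checked without actually sleeping.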

Re: [Robots] Post

2002-11-08 Thread Andrew Daviel
ect; it might be the author's, or where he went to school. It allegedly works quite well for some "accommodation" sites, though.) One idea is for geo-enabled wireless PDAs or laptops, but it has got hung up in the IETF geopriv (Geographic Location/Privacy) working group.

[Robots] Re: robots in the courts

2002-03-22 Thread Andrew Daviel
ticket information aggregation site, for linking to various pages deep within Plaintiff's Web site. E-Bay: Defendant would regularly spider plaintiff's site (along with many other auction sites) to extract information about items being auctioned and related prices, organize that information

[Robots] Re: robots follow redirect scripts?

2001-12-10 Thread Andrew Daviel
happens:

    > telnet my.host.org 80
    connected
    GET /dir/page.htm HTTP/1.0
    host: my.host.org

    HTTP/1.0 302 Found
    Date: Mon, 10 Dec 2001 08:12:28 GMT
    Server: Apache/1.1.1
    Location: http://my.host.org/some/other/page
    Content-type: text/html
    etc.

regards -- Andrew Daviel, TRIUMF, Canada
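
The telnet session shows the raw 302 response a robot receives. A minimal sketch of how a robot might pull the status code and `Location` header out of such a response; `parse_response_head` and `redirect_target` are hypothetical helpers, and resolving a relative `Location` with `urljoin` is an assumption about lenient handling:

```python
from urllib.parse import urljoin


def parse_response_head(raw: str):
    """Split a raw HTTP response head into (status_code, headers).

    Header names are lower-cased because HTTP header names are
    case-insensitive.
    """
    lines = raw.strip().splitlines()
    # Status line, e.g. "HTTP/1.0 302 Found"; the reason phrase is optional.
    code = int(lines[0].split(None, 2)[1])
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return code, headers


def redirect_target(request_url: str, raw: str):
    """Return the absolute URL a robot should follow next, or None."""
    code, headers = parse_response_head(raw)
    if 300 <= code < 400 and "location" in headers:
        return urljoin(request_url, headers["location"])
    return None
```

Fed the response above with a request URL of `http://my.host.org/dir/page.htm`, `redirect_target` yields `http://my.host.org/some/other/page`.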

[Robots] Re: robots follow redirect scripts?

2001-12-08 Thread Andrew Daviel
ent to the robot and another to a human (whose browser jumps immediately to the next page). I am not sure if this would also affect the harvesting of links. In theory it should not, but in practice it may. -- Andrew Daviel, TRIUMF, Canada
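
If the immediate jump for humans is done with a `<meta http-equiv="refresh">` tag (an assumption; the snippet does not say how), a robot can still harvest the target URL from the tag. A minimal sketch using Python's standard `html.parser`; `MetaRefreshFinder` and `refresh_target` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Record the target of <meta http-equiv="refresh" content="N; url=...">."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, not values.
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content looks like "0; url=http://example.org/next.html"
        _delay, _, rest = (attrs.get("content") or "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def refresh_target(page_url: str, html: str):
    """Absolute URL of the page's meta refresh target, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.target) if finder.target else None
```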

[Robots] Re: Correct URL, slash at the end?

2001-11-23 Thread Andrew Daviel
to select a redirect status, so I've tried a 301 permanent redirect from http://lin00.triumf.ca to http://www.triumf.ca (I thought I'd done this before, but apparently not). In theory, robots should save the redirected URL, not the original, in this case. Andrew Daviel
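
The rule in the last sentence can be written as a small decision function. A sketch, assuming the robot keeps the original URL for temporary redirects (302, 303, 307), which is common practice but not stated in the message; `url_to_index` is hypothetical, and including 308 (a permanent redirect standardized later, in RFC 7538) is an assumption:

```python
from urllib.parse import urljoin

# Permanent redirects: the robot should store the new location.
# Temporary ones (302, 303, 307) leave the original URL as the one to index.
PERMANENT = frozenset({301, 308})


def url_to_index(original_url: str, status: int, location: str) -> str:
    """Pick the URL a robot should store after following a redirect."""
    if status in PERMANENT and location:
        return urljoin(original_url, location)
    return original_url
```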

[Robots] Re: Correct URL, slash at the end?

2001-11-21 Thread Andrew Daviel
cted location, not the original URL, or perhaps, on indexing a page http://some.site/dir/blah/some.htm[l], purge any existing entries for http://some.site/dir/blah/ or http://some.site/dir/blah with identical checksum and modification time. -- Andrew Daviel, TRIUMF, Canada
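
The purge alternative can be sketched as follows, assuming an in-memory index that maps each URL to a (checksum, modification time) pair. The data structure, the use of MD5 as the checksum, and the helper names are all assumptions for illustration:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def directory_forms(url: str):
    """For http://h/dir/blah/some.html return http://h/dir/blah/ and
    http://h/dir/blah, the two directory URLs that may serve the same page."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]  # e.g. "/dir/blah"
    return [
        urlunsplit((parts.scheme, parts.netloc, directory + "/", "", "")),
        urlunsplit((parts.scheme, parts.netloc, directory, "", "")),
    ]


def index_page(index: dict, url: str, body: bytes, mtime: int) -> None:
    """Store url in the (hypothetical) index, dropping directory aliases."""
    checksum = hashlib.md5(body).hexdigest()
    # Purge an alias only when BOTH checksum and modification time match.
    for alias in directory_forms(url):
        if index.get(alias) == (checksum, mtime):
            del index[alias]
    index[url] = (checksum, mtime)
```

Requiring both values to match keeps a directory URL that merely happens to share a checksum with a page from an older fetch.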

[Robots] Re: Anti-thesaurus proposal

2001-11-21 Thread Andrew Daviel
e to be able to say don't index this page except this bit; otherwise the tag could possibly be simplified yet further to, e.g., don't index this (we'd just have to get it into the DTD). (Hmm, maybe we still want to distinguish "index" from "follow"...) (I don't really
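
The tags in this message did not survive the archive, so the following only illustrates the general idea of excluding part of a page from indexing. A minimal sketch with Python's standard `html.parser`, where `noindex` is a hypothetical stand-in element name, not the tag actually proposed:

```python
from html.parser import HTMLParser

# Hypothetical element name standing in for the lost proposed tag.
EXCLUDE_TAG = "noindex"


class IndexableText(HTMLParser):
    """Collect page words, skipping anything inside <noindex>...</noindex>."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # nesting level of excluded sections
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == EXCLUDE_TAG:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == EXCLUDE_TAG and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.words.extend(data.split())


def indexable_words(html: str):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return parser.words
```

Links inside an excluded section could still be followed; keeping "index" and "follow" separate, as the message suggests, would need a second flag.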

Re: [Robots] Re: Security News Robot

2001-07-10 Thread Andrew Daviel
al.doc Looks like your report got trashed by the mailing list software. Mailing lists tend to be old-school: they don't like attachments and prefer plain-text messages, not HTML or PDF, and not proprietary word-processor formats such as Word. I suggest you resend it as text if it's short, or else put it on the Web

[Robots] Re: How to handle image maps & javascript

2001-06-27 Thread Andrew Daviel
list of choices instead of the usual dumb "upgrade your browser" text that must bug the h*ll out of blind users. -- Andrew Daviel, TRIUMF, Canada -- This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
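
For the image-map half of the question: a client-side image map's links sit in ordinary `<area href>` elements, so a robot can harvest them with the same parser it uses for `<a href>`, and a plain list of links (e.g. in `<noscript>`) gives it something to follow where JavaScript would otherwise hide the links. A minimal sketch with Python's standard `html.parser`; `LinkHarvester` is a hypothetical name:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect href targets from <a> links and client-side image map <area>s."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            href = dict(attrs).get("href")
            # Skip javascript: pseudo-URLs, which this robot cannot run.
            if href and not href.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, href))
```

Text inside `<noscript>` is parsed like any other markup by `html.parser`, so the fallback list is picked up automatically.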

Re: Indexing cgi programs?

2000-04-12 Thread Andrew Daviel
have some scripts returning status 200 on error (when parsing the path, not the query), which I should fix. A quick check suggests that the major search engines return status 200 on empty content. Netscape seems to display the page content for most status codes except 201-204. Andrew Daviel TRIUMF & Vancouver Webpages
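
The fix for a script that returns 200 on error is to send an error status along with the error page. A minimal sketch written as a Python WSGI application rather than a raw CGI script, assuming a hypothetical lookup table in place of the real path parsing:

```python
# Hypothetical table of pages the script can serve, keyed by PATH_INFO.
PAGES = {"/report/2000": "Annual report 2000"}


def app(environ, start_response):
    """WSGI app that reports errors with a real status code, not 200."""
    path = environ.get("PATH_INFO", "")
    body = PAGES.get(path)
    if body is None:
        # Unknown or unparseable path: say so with 404, so robots drop
        # the URL instead of indexing an error page.
        status, body = "404 Not Found", "No such page: " + path
    else:
        status = "200 OK"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

For a quick local check it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.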

Re: modified since/accept

2000-04-05 Thread Andrew Daviel
the original poster perhaps meant "Accept", not "Allow", which, if set to "text/html" and working properly, would reject pages of other types such as audio/mpeg or application/msword. Andrew Daviel TRIUMF, Vancouver Webpages etc.
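
A robot can combine both headers from the thread subject: `Accept` to ask for HTML only and `If-Modified-Since` to skip unchanged pages. A minimal sketch with Python's standard library; `conditional_request` and `is_wanted` are hypothetical helpers, and storing the last fetch time as a Unix timestamp is an assumption:

```python
import email.utils
import urllib.request


def conditional_request(url: str, last_fetched: float) -> urllib.request.Request:
    """Build a request asking only for HTML changed since last_fetched.

    last_fetched is a Unix timestamp (assumed crawl-history format).
    """
    return urllib.request.Request(url, headers={
        "Accept": "text/html",
        # HTTP-date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
        "If-Modified-Since": email.utils.formatdate(last_fetched, usegmt=True),
    })


def is_wanted(content_type: str) -> bool:
    """Servers may ignore Accept, so check the Content-Type that comes back."""
    return content_type.split(";")[0].strip().lower() == "text/html"
```

Sent with `urllib.request.urlopen`, an unchanged page surfaces as a `urllib.error.HTTPError` with code 304, which the robot can treat as "keep the copy already indexed".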

Re: URLs with "?"s in them

2000-03-10 Thread Andrew Daviel
rs give out status 200 for "page not found". Some Novell product comes to mind. Things were a bit simpler a few years back when almost all pages were static and people put /cgi-bin in robots.txt. Andrew Daviel TRIUMF & Vancouver Webpages
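
Python's standard `urllib.robotparser` module implements the robots.txt check. A short sketch of the classic `/cgi-bin` rule mentioned above, with `example.org` and the `ExampleBot` agent name as placeholders:

```python
from urllib.robotparser import RobotFileParser

# The classic robots.txt from the static-page days: keep robots out of
# every script under /cgi-bin.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.org/cgi-bin/search?q=robots",
            "http://example.org/docs/index.html"):
    print(url, parser.can_fetch("ExampleBot", url))
```

Once scripts live outside `/cgi-bin`, this prefix rule no longer catches them, which is part of why dynamic URLs with "?" became harder for robots to handle.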

Re: Date of html file

2000-03-10 Thread Andrew Daviel
me stored in the file header. Andrew Daviel TRIUMF

robots.txt a security hole??

2000-03-09 Thread Andrew Daviel
ve. Risk factor : Medium" Andrew Daviel TRIUMF