The only thing I don't like about this is that normal HTTP mirroring is very insecure: a client has no way to check that a mirror is serving the same content as the master.
Our work with the "Content-Addressable Web" uses secure checksums and some HTTP extensions to provide an alternate way of solving the mirror problem. Our paper on "HTTP Extensions for a Content-Addressable Web" can be found at (http://open-content.net/specs/draft-jchapweske-caw-03.html).
You may also be interested in its companion specification, "The Tree Hash EXchange format (THEX)" at (http://open-content.net/specs/draft-jchapweske-thex-02.html).
We also have a very basic XML-RPC protocol for lease-based mirror advertisement at (http://open-content.net/specs/).
Also, there is a functioning Content-Addressable Web header proxy that you can feel free to play around with. It's currently used for the Open Content Network and can be used as follows:
bash$ HEAD http://gw1.open-content.net:8080/gateway/head?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn
200 OK
Date: Sat, 18 Jan 2003 00:15:05 GMT
Accept-Ranges: bytes
Server: TornadoGateway/1.0 (http://onionnetworks.com/; i386; Linux)
Content-Length: 78691253
Content-MD5: 84lI1a9IFPJq7jb3YG3m9Q==
Content-Type: audio/shn
ETag: "3360009-4b0bbb5-3d8b447e"
Last-Modified: Fri, 20 Sep 2002 15:53:34 GMT
Client-Date: Mon, 31 Mar 2003 18:50:22 GMT
Client-Peer: 209.237.232.89:8080
X-Content-URN: urn:md5:6OEURVNPJAKPE2XOG33WA3PG6U
X-Content-URN: urn:sha1:VTHQINIP3JUPJIMMC5RLVZSEFKMQ5KLX
X-Content-URN: urn:tree:tiger:S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
X-First-Bytes: 616a6b6702fbb17009f9255952a4d1a8dc48766a1157a0d5a8b66b6dd241108040201018040a0144d64020110d8c0a0104804420164b0dd2c3a08766a11ec0070000000b8000622efb1fb66659b36d85b45d6d77d3f081756652a563b41c94dbc24ce97b31fd3e4094415a862558d6756102e987170e9c591f2a5428dcc84bc43b21554c1444fe9a306fa2e9450125e78931c15f346cc6597762d6557c68623bc99254bdeaaf470888a9e104d631cca938cf0132314e7547b94069c86106060ea012e8d9c4c3211e99b4d3618070e33359a76670f85cc449e08468ec15ecf4e64e03d3dfb976c324444a9cf31ec599682060769e4e23bf9fce1ad3ffef94be4b
X-Observed-IP: 24.118.168.169
X-Thex-URI: http://gw1.open-content.net:8080/gateway/thex?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn;S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
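As a rough illustration of how a client might use these headers to check a file fetched from an untrusted mirror, here is a minimal Python sketch. It assumes the urn:md5 and urn:sha1 values are the raw digests in unpadded Base32, which is what the header values above appear to be (the Content-MD5 and urn:md5 values decode to the same bytes). The function names are made up for this example and are not part of the specification.

```python
import base64
import hashlib


def content_md5_to_urn(content_md5: str) -> str:
    """Re-encode a base64 Content-MD5 value as an unpadded Base32 urn:md5."""
    digest = base64.b64decode(content_md5)
    return "urn:md5:" + base64.b32encode(digest).decode("ascii").rstrip("=")


def sha1_urn(data: bytes) -> str:
    """Compute a urn:sha1 value (Base32 of the 20-byte SHA-1 digest)."""
    digest = hashlib.sha1(data).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def matches_sha1_urn(data: bytes, expected_urn: str) -> bool:
    """Return True if the downloaded bytes hash to the advertised URN."""
    return sha1_urn(data) == expected_urn.strip()
```

A client would take the X-Content-URN value from a source it trusts (the master or the gateway), download the body from any mirror, and reject the download if matches_sha1_urn() returns False. For large files you would feed the hash incrementally rather than hold the whole body in memory.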
Andre John Mas wrote:
Hi,
Mirroring a web site or FTP site is a great way of reducing load and improving access times. The catch is that there is no method for telling a web browser to go to a mirror automatically. For this reason I have been thinking that a 'mirrors.txt' file might be of use at the root of a web site that is either the master or a mirror, in the same way that a robots.txt file is made available.
The following is an example of what such a file would contain:
----start of example
#this is a comment
title: Project Gutenberg
description: Project Gutenberg is the Internet's oldest producer of FREE electronic books (eBooks or eTexts).
master: http://gutenberg.net/
search: master

mirror.name: University of North Carolina - HTTP
mirror.city: Chapel Hill
mirror.state: North Carolina
mirror.country: USA
mirror.gridref:
mirror.url: http://www.ibiblio.org/gutenberg/
mirror.update.freq: daily
mirror.comment: Main Project Gutenberg Collection Site

mirror.name: University of North Carolina - FTP
mirror.city: Chapel Hill
mirror.state: North Carolina
mirror.country: USA
mirror.gridref: 0/+1000,-1000
mirror.url: ftp://ibiblio.org/pub/docs/books/gutenberg/
mirror.update.freq: daily
mirror.comment: Main Project Gutenberg FTP Site -- If it doesn't allow access, please try the corresponding HTTP site above
----end of example
Most of the fields should be self-explanatory, though for the less obvious:
- search: values would be mirror or master. This is important if only the master offers a search facility.
- mirror.gridref: the grid coordinates of the mirror. The slash is there for future use, such as defining a planet ID as a prefix. The grid ref would always be the last child. I know this is overkill, and probably no one will take this seriously, but I would like to make this future-proof, if there is no extra cost.
- mirror.update.freq: how often the mirror is updated (should this be a numerical value, a textual value, or both?)
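To make the proposed format a little more concrete, here is a minimal Python sketch of a parser for it. It rests on assumptions the proposal does not spell out: one "key: value" field per line, lines starting with "#" are comments, and a blank line ends a mirror record. The function name parse_mirrors_txt is invented for this example.

```python
def parse_mirrors_txt(text):
    """Parse a mirrors.txt body into (site_fields, list_of_mirror_records)."""
    site = {}
    mirrors = []
    current = None  # the mirror record being filled in, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # assumed: a blank line ends the current record
            continue
        if line.startswith("#"):
            continue  # comment line
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # ignore anything that is not "key: value"
        key, value = key.strip(), value.strip()
        if key.startswith("mirror."):
            if current is None:
                current = {}
                mirrors.append(current)
            current[key[len("mirror."):]] = value
        else:
            site[key] = value
    return site, mirrors
```

Splitting on the first colon only keeps URL values such as http://gutenberg.net/ intact, and an empty field such as "mirror.gridref:" simply yields an empty string.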
Some sites mirror several others, so such a site would probably need more than one mirrors file. Two suggestions are to give the additional files a numeric suffix, e.g. mirrors.txt, mirrors2.txt, etc., or to have a mirrors.txt file that refers to the other mirrors files.
Also, search engines, such as Google, could make use of this information to tie together mirrors under one link, to make for smarter navigation. Something such as:
PROJECT GUTENBERG - Project Gutenberg is the Internet's oldest producer of FREE electronic books (eBooks or eTexts).
gutenberg.org/ - 18k - Master - Closest Mirror - Other Mirrors
This is a first stab at something that could well be of use, so I would certainly appreciate your comments, including on whether this is something that could be added as a web standard.
regards
Andre
P.S. I am not associated with Project Gutenberg; I am just using it as a useful example of a real site that could benefit from such a solution.
-- Justin Chapweske, Onion Networks http://onionnetworks.com/
