The only thing I don't like about this is that normal HTTP mirroring is very insecure: the client has no way to check that a mirror is serving the same content as the master.


Our work with the "Content-Addressable Web" uses secure checksums and some HTTP extensions to provide an alternate way of solving the mirror problem. Our paper on "HTTP Extensions for a Content-Addressable Web" can be found at (http://open-content.net/specs/draft-jchapweske-caw-03.html).

You may also be interested in its companion specification, "The Tree Hash EXchange format (THEX)" at (http://open-content.net/specs/draft-jchapweske-thex-02.html).

We also have a very basic XML-RPC protocol for lease-based mirror advertisement at (http://open-content.net/specs/).

There is also a functioning Content-Addressable Web header proxy that you are welcome to experiment with. It's currently used for the Open Content Network and can be used as follows:

bash$ HEAD http://gw1.open-content.net:8080/gateway/head?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn

200 OK
Date: Sat, 18 Jan 2003 00:15:05 GMT
Accept-Ranges: bytes
Server: TornadoGateway/1.0 (http://onionnetworks.com/; i386; Linux)
Content-Length: 78691253
Content-MD5: 84lI1a9IFPJq7jb3YG3m9Q==
Content-Type: audio/shn
ETag: "3360009-4b0bbb5-3d8b447e"
Last-Modified: Fri, 20 Sep 2002 15:53:34 GMT
Client-Date: Mon, 31 Mar 2003 18:50:22 GMT
Client-Peer: 209.237.232.89:8080
X-Content-URN: urn:md5:6OEURVNPJAKPE2XOG33WA3PG6U
X-Content-URN: urn:sha1:VTHQINIP3JUPJIMMC5RLVZSEFKMQ5KLX
X-Content-URN: urn:tree:tiger:S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
X-First-Bytes: 616a6b6702fbb17009f9255952a4d1a8dc48766a1157a0d5a8b66b6dd241108040201018040a0144d64020110d8c0a0104804420164b0dd2c3a08766a11ec0070000000b8000622efb1fb66659b36d85b45d6d77d3f081756652a563b41c94dbc24ce97b31fd3e4094415a862558d6756102e987170e9c591f2a5428dcc84bc43b21554c1444fe9a306fa2e9450125e78931c15f346cc6597762d6557c68623bc99254bdeaaf470888a9e104d631cca938cf0132314e7547b94069c86106060ea012e8d9c4c3211e99b4d3618070e33359a76670f85cc449e08468ec15ecf4e64e03d3dfb976c324444a9cf31ec599682060769e4e23bf9fce1ad3ffef94be4b
X-Observed-IP: 24.118.168.169
X-Thex-URI: http://gw1.open-content.net:8080/gateway/thex?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn;S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
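
To illustrate how a client might use these headers, here is a minimal Python sketch that checks downloaded bytes against an X-Content-URN value. It assumes, judging from the sample output above, that the urn:md5 and urn:sha1 values are the raw digest encoded in Base32 with the padding stripped; the drafts linked above are the authority on the real format. The function names are hypothetical, and urn:tree:tiger is skipped because Python's hashlib has no Tiger implementation.

```python
import base64
import hashlib

SUPPORTED = ("md5", "sha1")  # tree:tiger is deliberately left out of this sketch


def urn_to_digest(urn: str) -> bytes:
    """Decode 'urn:md5:...' or 'urn:sha1:...' into raw digest bytes."""
    scheme, algo, encoded = urn.split(":", 2)
    if scheme != "urn" or algo not in SUPPORTED:
        raise ValueError(f"unsupported URN: {urn}")
    # Assumption: the header drops Base32 padding, so restore it first.
    padded = encoded + "=" * (-len(encoded) % 8)
    return base64.b32decode(padded)


def digest_to_urn(algo: str, data: bytes) -> str:
    """Build a URN in the same style as the headers shown above."""
    digest = hashlib.new(algo, data).digest()
    return f"urn:{algo}:" + base64.b32encode(digest).decode("ascii").rstrip("=")


def verify(data: bytes, urn: str) -> bool:
    """Return True if the bytes hash to the digest named by the URN."""
    algo = urn.split(":")[1]
    return hashlib.new(algo, data).digest() == urn_to_digest(urn)


if __name__ == "__main__":
    # The urn:md5 value and the Content-MD5 header (Base64) in the sample
    # response above decode to the same 16 bytes.
    md5_urn = "urn:md5:6OEURVNPJAKPE2XOG33WA3PG6U"
    print(urn_to_digest(md5_urn) == base64.b64decode("84lI1a9IFPJq7jb3YG3m9Q=="))
```

A client that fetched the file from an untrusted mirror would call verify() on the received bytes with a URN obtained from a source it trusts, and discard the download on a mismatch.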




Andre John Mas wrote:

Hi,


Mirroring a web site or ftp site is a great way of reducing load
and improving access times. The only problem is that there is
no method for telling a web browser to automatically go to a mirror.
For this reason I have been thinking that a 'mirrors.txt' file might
be of use at the root of a web site that is either the master or a
mirror, in the same way that a robots.txt file is made available.

The following is an example of what such a file would contain:

----start of example
#this is a comment

title:   Project Gutenberg
description: Project Gutenberg is the Internet's oldest producer of FREE
  electronic books (eBooks or eTexts).
master:  http://gutenberg.net/
search:  master

mirror.name: University of North Carolina - HTTP
mirror.city: Chapel Hill
mirror.state: North Carolina
mirror.country: USA
mirror.gridref:
mirror.url: http://www.ibiblio.org/gutenberg/
mirror.update.freq: daily
mirror.comment: Main Project Gutenberg Collection Site

mirror.name: University of North Carolina - FTP
mirror.city: Chapel Hill
mirror.state: North Carolina
mirror.country: USA
mirror.gridref: 0/+1000,-1000
mirror.url: ftp://ibiblio.org/pub/docs/books/gutenberg/
mirror.update.freq: daily
mirror.comment: Main Project Gutenberg FTP Site -- If it doesn't allow
  access, please try the corresponding HTTP site above

----end of example

Most of the fields should be self-explanatory, though for the less
obvious ones:
 - search: values would be mirror or master. This is important if
   only the master offers a search facility
 - mirror.gridref: the grid coordinates of the mirror. The slash
   is there for a future use, such as defining planet ID as prefix.
   The grid ref would always be the last child. I know this is
   overkill, and probably no one will take this seriously, but I
   would like to make this future proof, if there is no extra cost.
 - mirror.update.freq: how often the mirror is updated (should this
   be a numerical value, a textual value, or both?)
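
As a rough illustration of how the example file could be read, here is a small Python sketch of a parser. It rests on assumptions that the proposal does not spell out: lines starting with "#" are comments, an indented line continues the previous value, and each "mirror.name" line starts a new mirror record. The function name parse_mirrors is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SAMPLE = """#this is a comment

title:   Project Gutenberg
description: Project Gutenberg is the Internet's oldest producer of FREE
  electronic books (eBooks or eTexts).
master:  http://gutenberg.net/
search:  master

mirror.name: University of North Carolina - HTTP
mirror.gridref:
mirror.url: http://www.ibiblio.org/gutenberg/
mirror.update.freq: daily

mirror.name: University of North Carolina - FTP
mirror.gridref: 0/+1000,-1000
mirror.url: ftp://ibiblio.org/pub/docs/books/gutenberg/
"""


def parse_mirrors(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (site-level fields, list of mirror records)."""
    site: Dict[str, str] = {}
    mirrors: List[Dict[str, str]] = []
    target = site                      # dict that received the last key
    last_key: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue                   # skip blank lines and comments
        if raw[0].isspace() and last_key is not None:
            # Assumption: an indented line continues the previous value.
            target[last_key] += " " + raw.strip()
            continue
        # Split on the first colon only, since URLs contain colons too.
        key, sep, value = raw.partition(":")
        if not sep:
            continue                   # not "key: value"; ignored here
        key = key.strip()
        if key.startswith("mirror."):
            # Assumption: "mirror.name" opens a new mirror record.
            if key == "mirror.name" or not mirrors:
                mirrors.append({})
            target = mirrors[-1]
            key = key[len("mirror."):]
        else:
            target = site
        target[key] = value.strip()
        last_key = key
    return site, mirrors


if __name__ == "__main__":
    site, mirrors = parse_mirrors(SAMPLE)
    print(site["master"], [m["url"] for m in mirrors])
```

Grouping on "mirror.name" is only one possible rule; an explicit record separator in the format would make the parser less dependent on field order.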

Some sites mirror several others, so the site would probably need more
than one mirror file. Two suggestions are to have the additional mirror
files have a numeric suffix, e.g. mirrors.txt, mirrors2.txt, etc. or
to have a mirrors.txt file that refers to the other mirrors.txt files.

Also, search engines, such as Google, could make use of this information
to tie together mirrors under one link, to make for smarter navigation.
Something such as:

  PROJECT GUTENBERG -
  Project Gutenberg is the Internet's oldest producer of FREE
  electronic books (eBooks or eTexts).
  gutenberg.org/ - 18k - Master - Closest Mirror - Other Mirrors

This is a first jab at something that could well be of use, so I would
certainly appreciate your comments and whether this is something that
could be added as a web standard?

regards

Andre

P.S. I am not associated with Project Gutenberg, I am just using it as
a useful example of a real site that could benefit from such a solution.




--
Justin Chapweske, Onion Networks
http://onionnetworks.com/


