On Thu, 16 Dec 2004, Rod Walker wrote:
I'm hoping to use Squid for the retrieval and caching of large data files (~1GB) for High Energy Physics applications. One of the considerations is the file transfer rate relative to gridFTP, which can use multiple parallel streams to increase the transfer rate.
On googling around a little I found several multi-streamed 'wget-like' HTTP clients, e.g. aget and prozilla, that achieve transfer speeds comparable to gridFTP. These do not respect the http_proxy environment variable and do not use the Squid cache, probably for the very good reason that splitting a file into several chunks for transfer makes it very hard to cache.
It is in theory not that hard to cache, but Squid still lacks some of the needed capabilities. What is needed to deal with this properly is the ability to cache partial objects. There are also some minor implications relating to the ETag header, but these are pretty minimal for this specific question.
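To make "caching partial objects" concrete, here is a minimal Python sketch (not Squid code; all names are hypothetical) of the bookkeeping involved: tracking which byte ranges of an object are present, keyed by URL plus ETag so that ranges from different versions of the object are never mixed.

```python
# Sketch only: a real cache would store the bytes on disk; here we track
# just which byte ranges are present, merging overlaps as they arrive.

class PartialObjectCache:
    def __init__(self):
        # (url, etag) -> sorted list of non-overlapping (start, end) ranges,
        # end exclusive
        self.ranges = {}

    def add_range(self, url, etag, start, end):
        key = (url, etag)
        spans = self.ranges.setdefault(key, [])
        spans.append((start, end))
        spans.sort()
        merged = [spans[0]]
        for s, e in spans[1:]:
            ls, le = merged[-1]
            if s <= le:  # overlaps or touches the previous span: coalesce
                merged[-1] = (ls, max(le, e))
            else:
                merged.append((s, e))
        self.ranges[key] = merged

    def has_range(self, url, etag, start, end):
        # True only if the whole requested range is already cached
        for s, e in self.ranges.get((url, etag), []):
            if s <= start and end <= e:
                return True
        return False

cache = PartialObjectCache()
cache.add_range("http://example.org/big.dat", '"v1"', 0, 500)
cache.add_range("http://example.org/big.dat", '"v1"', 500, 1000)
print(cache.has_range("http://example.org/big.dat", '"v1"', 0, 1000))  # True
print(cache.has_range("http://example.org/big.dat", '"v2"', 0, 1000))  # False: different ETag
```

The ETag key is what makes the ETag implications mentioned above matter: without it, chunks fetched before and after the origin object changed could be stitched into a corrupt file.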
Doing the stream splitting/merging within Squid in response to a single request is not that feasible. The problem is that the proxy would need to hold back possibly huge amounts of data destined for the client until all the data preceding it has been received and sent. This causes a number of problems with timeouts, buffering, bandwidth usage, etc. It is by far best if the client initiates the multi-stream transfer.
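For illustration, the client side of such a transfer boils down to two pieces of pure logic: computing the byte ranges (the Range: bytes=start-end headers each stream would request) and reassembling the chunks in order. A hedged Python sketch, with illustrative names not taken from any of the tools mentioned above:

```python
# Sketch of a multi-stream client's splitting/merging logic, with the
# actual network I/O omitted for clarity.

def split_ranges(length, streams):
    """Return (start, end) pairs, end inclusive, covering bytes 0..length-1."""
    chunk = length // streams
    ranges = []
    for i in range(streams):
        start = i * chunk
        # the last stream takes any remainder
        end = length - 1 if i == streams - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges

def reassemble(chunks):
    """chunks: dict mapping (start, end) -> bytes, filled as streams finish."""
    return b"".join(chunks[r] for r in sorted(chunks))

data = bytes(range(10)) * 100  # 1000-byte stand-in for the file
ranges = split_ranges(len(data), 4)
print(ranges)  # [(0, 249), (250, 499), (500, 749), (750, 999)]
chunks = {(s, e): data[s:e + 1] for (s, e) in ranges}
print(reassemble(chunks) == data)  # True
```

Because the client holds all the chunks itself, none of the proxy-side buffering and timeout problems described above arise; each range request is an ordinary cacheable HTTP request from Squid's point of view.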
Regards Henrik
