Re: next version of content-encoding / gzip design doc
Here's yet another design version, following many helpful suggestions from Henrik.

Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

The content-encoding protocol is described in [...]

Header field cases from client:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it contains
    a Content-Encoding field with encodings that are a subset of what
    the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
            Else (server/cache response doesn't have Content-Encoding)
                Then encode the response
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave
        unchanged with respect to this protocol.

There will be no explicit links between objects that are different codings of the same object. Instead, the StoreKeys of coded objects will be computed as MD5(OriginalStoreKey,Content-Encoding). This allows one to derive the StoreKeys of all possible encodings, including the original, knowing only the original StoreKey and not the requested URL.

Searching for an uncoded version of an object is done by generating the uncoded StoreKey and looking for an object with that key. This is needed upon a cache miss (see protocol above).

Upon update or PURGE of the original or of an encoded object, delete all the possible encoding variants.
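To make the key scheme concrete, here is a rough sketch of deriving variant keys from the original key alone. It is only an illustration under assumptions: std::hash stands in for MD5 so the example compiles without a crypto library, and the names StoreKey (as a typedef), kLocalCodings, originalKey(), variantKey() and allVariantKeys() are hypothetical, not existing Squid code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for Squid's MD5-based StoreKey. std::hash is used only so the
// sketch builds without a crypto library; it is NOT MD5.
using StoreKey = std::size_t;

// Hypothetical list of codings the proxy can apply locally.
// The empty string denotes the uncoded (original) variant.
static const std::vector<std::string> kLocalCodings = {"", "gzip"};

// Key of the original object, derived from the URL (simplified).
StoreKey originalKey(const std::string &url) {
    return std::hash<std::string>{}(url);
}

// Key of a coded variant: hash(OriginalStoreKey, Content-Encoding).
// The uncoded variant keeps the original key.
StoreKey variantKey(StoreKey original, const std::string &coding) {
    if (coding.empty())
        return original;
    return std::hash<std::string>{}(std::to_string(original) + "," + coding);
}

// Every key that would have to be deleted on update or PURGE,
// derived from the original key alone (no URL needed).
std::vector<StoreKey> allVariantKeys(StoreKey original) {
    std::vector<StoreKey> keys;
    for (const std::string &coding : kLocalCodings)
        keys.push_back(variantKey(original, coding));
    return keys;
}
```

Because the set of local codings is fixed, the purge loop is bounded by the size of that list.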
As the encodings are applied locally, the possible combinations are known and finite, so there is no problem purging them all at once. If the number of encodings grows nontrivially, we may need an additional mechanism to keep that check under control. Original-update deletion will be triggered on swapout of a new original object (when it gets a public key).

Etags: Encoded objects will be given unique new entity tags.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

New HttpHdrContCode module, which parses the related HTTP headers and arranges for encoding or decoding as appropriate. It includes the following functions:

codeParseRequest():
    Called from client_side:parseHttpRequest() after the clientStreamInit() call. Checks for and parses Accept-Encoding headers. Instantiates content_coding appropriately, and calls codeClientStreamInit().

codeClientStreamInit():
    Adds a new node to clientStream with codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.

codeStreamCallback():
    Sets up encoding/decoding state depending on the combination of Content-Encoding and Accept-Encoding fields seen.

codeStreamRead():
    Calls the HttpContentCoder transformation functions as appropriate.

codeStreamStatus():
    Reports status to the stream.

New HttpContentCoder abstract type, with functions:

    encodeStart()
    encodeEnd()
    encodeChunk()
    decodeStart()
    decodeEnd()
    decodeChunk()

New per-coded-object ContentCoderState, to handle coding state. It will be referenced from the clientStream, and include fields:

    HttpContentCoder *coder
    off_t codedOffset

Objects will be stored both in unencoded and encoded formats. An object will stay in the format in which Squid receives it until it is requested by a client asking for a different Content-Encoding which Squid supports (this could be immediate). Once this happens, the object will be streamed coded into a different StoreEntry and on to the client.

Other changes needed:

- Add new content_coding field to HttpReply.
- New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.
- A new configuration flag to turn content-encoding off, if desired.
- A new object flag, encoded. Whenever an encoded or decoded object is created, it is tagged as encoded, so a locally recoded object can be recognized as such.
- A new store.cc function, storeDeleteCodedCopies(), will do the deletion of all (un)coded copies described above.

Gzip

A new GzipContentCoder module, which will be an instance of HttpContentCoder. Data encoding will be handled by the gzip.org zlib library (http://www.gzip.org/zlib/).

Functions:

gzEncodeStart: call
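For illustration, the HttpContentCoder abstract type and ContentCoderState from the design above might look roughly like this. The design gives only the function and field names, so every signature here is an assumption. IdentityCoder and codeChunk() are hypothetical helpers that just exercise the interface; a real GzipContentCoder would call zlib where the identity coder passes data through.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>  // off_t

// Abstract coder with the six entry points named in the design.
// Parameter and return types are assumptions.
class HttpContentCoder {
public:
    virtual ~HttpContentCoder() = default;
    virtual void encodeStart() = 0;
    virtual std::string encodeChunk(const std::string &in) = 0;
    virtual std::string encodeEnd() = 0;
    virtual void decodeStart() = 0;
    virtual std::string decodeChunk(const std::string &in) = 0;
    virtual std::string decodeEnd() = 0;
};

// Per-coded-object state, with the two fields listed in the design.
struct ContentCoderState {
    HttpContentCoder *coder;
    off_t codedOffset;
};

// Toy coder that passes data through unchanged (hypothetical).
class IdentityCoder : public HttpContentCoder {
public:
    void encodeStart() override {}
    std::string encodeChunk(const std::string &in) override { return in; }
    std::string encodeEnd() override { return std::string(); }
    void decodeStart() override {}
    std::string decodeChunk(const std::string &in) override { return in; }
    std::string decodeEnd() override { return std::string(); }
};

// Feed one chunk through the coder and advance the coded offset by the
// number of bytes produced (hypothetical helper).
std::string codeChunk(ContentCoderState &state, const std::string &chunk) {
    std::string out = state.coder->encodeChunk(chunk);
    state.codedOffset += static_cast<off_t>(out.size());
    return out;
}
```

Keeping the offset in the per-object state rather than in the coder lets one coder type serve many concurrent streams, which seems to be the intent of splitting the two.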
Re: next version of content-encoding / gzip design doc
Henrik Nordstrom wrote:
> On Fri, 5 Mar 2004, Jon Kay wrote:
>
>> If Accept-Encoding field is present in client request
>>     If server or cache response contains Content-Encoding field
>>     with encodings that are a subset of what the client accepts
>
> This must be relaxed to just "contains a Content-Encoding field",
> ignoring whether it is acceptable to the client. If not, you run into
> ugly corner cases if the server ignores what the client accepts.

OOPS. I misstated this test. It SHOULD be:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it contains
    a Content-Encoding field with encodings that are a subset of what
    the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        ...

otherwise the same.
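The corrected test (the cached response's codings are a subset of what the client accepts) could be sketched as below. This is a simplification under assumptions: parseCodings() and cachedCodingAcceptable() are hypothetical names, q-values are stripped rather than honoured (so "gzip;q=0" is wrongly treated as acceptable), and the "*" wildcard is not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Split a comma-separated header value into lowercase tokens,
// dropping whitespace and any ";q=..." parameters.
std::set<std::string> parseCodings(const std::string &value) {
    std::set<std::string> out;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        item = item.substr(0, item.find(';'));
        std::string token;
        for (unsigned char c : item)
            if (!std::isspace(c))
                token += static_cast<char>(std::tolower(c));
        if (!token.empty())
            out.insert(token);
    }
    return out;
}

// True when the cached response carries a Content-Encoding and every
// coding in it was listed by the client in Accept-Encoding.
bool cachedCodingAcceptable(const std::string &contentEncoding,
                            const std::string &acceptEncoding) {
    const std::set<std::string> applied = parseCodings(contentEncoding);
    const std::set<std::string> accepted = parseCodings(acceptEncoding);
    if (applied.empty())
        return false;  // uncoded response: handled by the other branches
    // std::set iterates in sorted order, as std::includes requires.
    return std::includes(accepted.begin(), accepted.end(),
                         applied.begin(), applied.end());
}
```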
Re: next version of content-encoding / gzip design doc
Henrik Nordstrom wrote:

On Wed, 3 Mar 2004, Jon Kay wrote:

> Because current browser implementations treat Content-Encoding much as
> though it was Transfer-Encoding, we will implement Content-Encoding and
> Accept-Encoding as though they were actually the Transfer-Encoding and
> TE described in the HTTP specifications.

This part I do not understand. Content-Encoding and Transfer-Encoding are fundamentally different in their operation, far beyond the hop-by-hop vs end-to-end difference. You cannot interchange one for the other. It is not safe to assume a client accepts gzip TE only because it accepts gzip content-encoding. For one thing, the message format is completely different.

> Etags of replies encoded by Squid will be modified to turn them into
> weak tags if they are not already so.

Why do you oppose creating new unique ETags?

> There will be a configuration option to turn off content-encoding.

Granted, and this will default to off in the standard distribution, like any other option which violates the semantically transparent HTTP proxy requirements.

> Content-Encoding Implementation

No comments there.

> Objects will be stored both in unencoded and encoded formats. An object
> will stay in the format in which Squid receives it until requested by a
> client requesting a different Content-Encoding which Squid supports
> (this could be immediate). Once this happens, the object will be
> streamed coded into a different StoreEntry and on to the client.

Ok.

> A new store_dup module will be created to manage dup store_entries and
> make sure duplicate entries are invalidated when a new version of an
> object is read. It consists of a circular list of StoreEntry pointers
> named dupnext and dupprev. When a new duplicate encoding (or decoding)
> of an object is created, it's added to the list. When any StoreEntry is
> invalidated or updated, all dups are invalidated.

Looks a little too complex to me. Wouldn't something simpler like the following work:

- Modify the store key to account for content encoding.
- Add an internal meta object listing the known content encodings of a given object. When a new encoding is added, rewrite this object to add the new encoding name.
- On cache hits, iterate over the known acceptable encodings until a match is found in the cache.
- In recoded objects, include a meta header indicating the identity of the original object, and disregard the recoded object on a cache hit if it no longer matches the original.

From what I can tell the above would also work for adding server-driven Content-Encoding negotiation to the proxy to complement the use of Vary (which most mod_gzip servers do not support btw).

Regards
Henrik
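As a rough illustration of the simpler scheme suggested above, the sketch below keeps a per-URL meta list of known encodings and a store keyed by (URL, coding), and rejects recoded objects whose recorded original no longer matches. All names (CachedVariant, VariantCache, setOriginal, addVariant, lookup) are hypothetical, and in-memory maps stand in for Squid's store purely to show the lookup order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct CachedVariant {
    std::string body;
    std::string originalId;  // identity of the original it was recoded from
};

class VariantCache {
public:
    // Record the identity (for example an ETag) of the current original.
    void setOriginal(const std::string &url, const std::string &id) {
        currentOriginalId_[url] = id;
    }

    // Store a variant and add its coding to the meta object if new.
    void addVariant(const std::string &url, const std::string &coding,
                    const CachedVariant &variant) {
        store_[std::make_pair(url, coding)] = variant;
        std::vector<std::string> &known = knownCodings_[url];
        bool listed = false;
        for (const std::string &c : known)
            if (c == coding)
                listed = true;
        if (!listed)
            known.push_back(coding);
    }

    // Iterate over known codings the client accepts; return the first
    // variant that still matches the current original, else nullptr.
    const CachedVariant *lookup(const std::string &url,
                                const std::vector<std::string> &acceptable) const {
        auto meta = knownCodings_.find(url);
        auto orig = currentOriginalId_.find(url);
        if (meta == knownCodings_.end() || orig == currentOriginalId_.end())
            return nullptr;
        for (const std::string &coding : meta->second) {
            bool accepted = false;
            for (const std::string &a : acceptable)
                if (a == coding)
                    accepted = true;
            if (!accepted)
                continue;
            auto hit = store_.find(std::make_pair(url, coding));
            if (hit != store_.end() && hit->second.originalId == orig->second)
                return &hit->second;
        }
        return nullptr;
    }

private:
    std::map<std::string, std::vector<std::string>> knownCodings_;
    std::map<std::pair<std::string, std::string>, CachedVariant> store_;
    std::map<std::string, std::string> currentOriginalId_;
};
```

Stale variants are skipped at lookup time rather than deleted eagerly, which is what removes the need for the dup list.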
Re: next version of content-encoding / gzip design doc
> Content-Encoding and Transfer-Encoding are fundamentally different in
> their operation, far beyond the hop-by-hop vs end-to-end difference.
> You cannot interchange one for the other. It is not safe to assume a
> client accepts gzip TE only because it accepts gzip content-encoding.
> For one thing, the message format is completely different.

Yes. I'm going to try a different tack on the explanation / underpinnings, and outline it by case analysis:

Protocol:

Header field cases from client:

If Accept-Encoding field is present in client request
    If server or cache response contains Content-Encoding field with
    encodings that are a subset of what the client accepts
        Then forward response to client unchanged
    Else (no helpful content-encoding field)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
                Add this response to duplicate list for the object
            Else (server/cache response doesn't have Content-Encoding)
                Then encode the response
                Add encoded response to duplicate list for the object
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Add encoded response to duplicate list for the object
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Add decoded response to duplicate list for the object
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave
        unchanged with respect to this protocol.
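The case analysis above can be read as a small decision function. The sketch below is only one way to write it down: the Action and CacheState names are hypothetical, the duplicate-list bookkeeping is left out, and the fetch branch is collapsed into a single action because whether to forward or encode depends on a response that has not arrived yet.

```cpp
// Possible outcomes of the case analysis (hypothetical names).
enum class Action {
    ForwardCached,             // acceptable coded response is cached
    FetchThenForwardOrEncode,  // nothing usable cached; ask server/cache
    EncodeUncoded,             // uncoded copy cached; encode it
    ForwardUncoded,            // client wants uncoded; uncoded copy cached
    DecodeCoded,               // client wants uncoded; only coded copy cached
    FetchUnchanged             // nothing cached; behave as before
};

// What is known when the request arrives (hypothetical fields).
struct CacheState {
    bool clientSentAcceptEncoding;
    bool acceptableCodedCached;  // coded copy whose codings the client accepts
    bool uncodedCached;
    bool codedCached;            // some coded copy, acceptable or not
};

// Walk the cases in the same order as the outline.
Action decide(const CacheState &s) {
    if (s.clientSentAcceptEncoding) {
        if (s.acceptableCodedCached)
            return Action::ForwardCached;
        if (!s.uncodedCached)
            return Action::FetchThenForwardOrEncode;
        return Action::EncodeUncoded;
    }
    if (s.uncodedCached)
        return Action::ForwardUncoded;
    if (s.codedCached)
        return Action::DecodeCoded;
    return Action::FetchUnchanged;
}
```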
Re: next version of content-encoding / gzip design doc
Hi there,

I'm back with this task (again).

Jon: you are far more advanced than I am in understanding squid. I can start helping with content compression by writing GzipCoder, if you want.

(Already discussed) About TE/Transfer-Encoding vs Accept-Encoding/Content-Encoding: a Content-Encoding implementation (even if it bends standards) seems to be the reasonable choice, since TE/Transfer-Encoding is not available in most common browsers.

If implemented as Content-Encoding, the following headers should be altered before encoding:

- Content-Length: deleted (it could be updated to the actual gzipped content length, but that is too much trouble, I guess).
- ETag: modified (appending CEgz, for instance?)
- Vary: append Accept-Encoding (if not already there)
- Connection: if the client is 1.1, could be set to keep-alive, but Transfer-Encoding chunked should be added/checked.

Hope this sheds some light on possible encoding options.

Regards,
--
Gonzalo Arana
Ingenieria UOLSinectis
Florida 537 Piso 6, Buenos Aires, Argentina
+54-11-4321-9110 ext 2543
http://www.uolsinectis.com.ar/
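The header changes listed above might be sketched as follows. This is an illustration under assumptions: a plain std::map stands in for Squid's header structures, header names are matched case-sensitively, and prepareHeadersForGzip() is a hypothetical name. Setting Content-Encoding is not in the list but is implied by the encoding itself. The Connection / Transfer-Encoding handling is left out.

```cpp
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Adjust reply headers before gzip-encoding the body (hypothetical helper).
void prepareHeadersForGzip(Headers &h) {
    // The gzipped length is not known up front, so drop Content-Length.
    h.erase("Content-Length");

    // Implied by the encoding itself (assumption: not in the list above).
    h["Content-Encoding"] = "gzip";

    // Make the entity tag distinct by appending "CEgz", inside the
    // closing quote when the tag is quoted.
    auto etag = h.find("ETag");
    if (etag != h.end()) {
        std::string &tag = etag->second;
        if (!tag.empty() && tag.back() == '"')
            tag.insert(tag.size() - 1, "CEgz");
        else
            tag += "CEgz";
    }

    // Append Accept-Encoding to Vary unless it is already there
    // (simple substring check, case-sensitive).
    auto vary = h.find("Vary");
    if (vary == h.end())
        h["Vary"] = "Accept-Encoding";
    else if (vary->second.find("Accept-Encoding") == std::string::npos)
        vary->second += ", Accept-Encoding";
}
```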