Re: [MERGE] Bug 419: Hop by Hop headers MUST NOT be forwarded (attempt 2)
Fubar. Real bundle coming soon.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE5 or 3.0.STABLE11
  Current Beta Squid 3.1.0.3
[MERGE] Bug 419: Hop by Hop headers MUST NOT be forwarded (attempt 2)
This attempt builds on Henrik's re-work of the client-request to server-request cloning done since the last attempt was made at closing this bug.

It adds all RFC 2616-listed hop-by-hop headers to the clone selection test as 'ignore' cases, unless otherwise handled already. The test for whether they exist in Connection: is moved to the default case as an inline, which reduces the code a fair bit and prevents the side case where a specially handled header gets ignored because the client explicitly added it to Connection: when it did not have to.

This method sets up a background default of not passing the hop-by-hop headers, while allowing any code which explicitly sets or copies the headers across to operate as before without interference.

# Bazaar merge directive format 2 (Bazaar 0.90)
# revision_id: squ...@treenet.co.nz-20090120085104-4jxpmbmnuowss2za
# target_branch: file:///src/squid/bzr/trunk/
# testament_sha1: 206a5d4bc37533ae3fef2dd39d4516202227a931
# timestamp: 2009-01-20 21:52:26 +1300
# base_revision_id: squ...@treenet.co.nz-20090118082924-\
#   c72yl5o47sx7e7fd
#
# Begin patch
=== modified file 'src/HttpHeaderTools.cc'
--- src/HttpHeaderTools.cc	2008-12-24 13:35:21 +
+++ src/HttpHeaderTools.cc	2009-01-20 08:51:04 +
@@ -182,7 +182,7 @@
     return res;
 }

-/* returns true iff m is a member of the list */
+/** returns true iff m is a member of the list */
 int
 strListIsMember(const String * list, const char *m, char del)
 {

=== modified file 'src/HttpRequest.cc'
--- src/HttpRequest.cc	2009-01-08 13:45:29 +
+++ src/HttpRequest.cc	2009-01-20 08:51:04 +
@@ -326,6 +326,7 @@
         header.len + 2;
 }

+#if DEAD_CODE // 2009-01-20: inlined this with its ONLY caller (copyOneHeader...)
 /**
  * Returns true if HTTP allows us to pass this header on. Does not
  * check anonymizer (aka header_access) configuration.
@@ -341,6 +342,7 @@
     return 1;
 }
+#endif

 /* sync this routine when you update HttpRequest struct */
 void

=== modified file 'src/http.cc'
--- src/http.cc	2009-01-13 05:28:23 +
+++ src/http.cc	2009-01-20 08:51:04 +
@@ -72,8 +72,8 @@
 static const char *const crlf = "\r\n";
 static void httpMaybeRemovePublic(StoreEntry *, http_status);
-static void copyOneHeaderFromClientsideRequestToUpstreamRequest(const HttpHeaderEntry *e, String strConnection, HttpRequest * request, HttpRequest * orig_request,
-        HttpHeader * hdr_out, int we_do_ranges, http_state_flags);
+static void copyOneHeaderFromClientsideRequestToUpstreamRequest(const HttpHeaderEntry *e, const String strConnection, HttpRequest * request, const HttpRequest * orig_request,
+        HttpHeader * hdr_out, const int we_do_ranges, const http_state_flags);

 HttpStateData::HttpStateData(FwdState *theFwdState) : AsyncJob("HttpStateData"), ServerStateData(theFwdState),
         lastChunk(0), header_bytes_read(0), reply_bytes_read(0), httpChunkDecoder(NULL)
@@ -1647,20 +1647,22 @@
     strConnection.clean();
 }

+/**
+ * Decides whether a particular header may be cloned from the received client request
+ * to our outgoing fetch request.
+ */
 void
-copyOneHeaderFromClientsideRequestToUpstreamRequest(const HttpHeaderEntry *e, String strConnection, HttpRequest * request, HttpRequest * orig_request, HttpHeader * hdr_out, int we_do_ranges, http_state_flags flags)
+copyOneHeaderFromClientsideRequestToUpstreamRequest(const HttpHeaderEntry *e, const String strConnection, HttpRequest * request, const HttpRequest * orig_request, HttpHeader * hdr_out, const int we_do_ranges, const http_state_flags flags)
 {
     debugs(11, 5, "httpBuildRequestHeader: " << e->name.buf() << ": " << e->value.buf());

-    if (!httpRequestHdrAllowed(e, &strConnection)) {
-        debugs(11, 2, "'" << e->name.buf() << "' header denied by anonymize_headers configuration");
-        return;
-    }
-
     switch (e->id) {

+        /** \title RFC 2616 sect 13.5.1 - Hop-by-Hop headers which Squid should not pass on. */
+
     case HDR_PROXY_AUTHORIZATION:
-        /* Only pass on proxy authentication to peers for which
+        /** \par Proxy-Authorization:
+         * Only pass on proxy authentication to peers for which
          * authentication forwarding is explicitly enabled
          */
@@ -1672,16 +1674,31 @@
         break;

+        /** \title RFC 2616 sect 13.5.1 - Hop-by-Hop headers which Squid does not pass on. */
+
+    case HDR_CONNECTION:          /** \par Connection: */
+    case HDR_TE:                  /** \par TE: */
+    case HDR_KEEP_ALIVE:          /** \par Keep-Alive: */
+    case HDR_PROXY_AUTHENTICATE:  /** \par Proxy-Authenticate: */
+    case HDR_TRAILERS:            /** \par Trailers: */
+    case HDR_UPGRADE:             /** \par Upgrade: */
+    case HDR_TRANSFER_ENCODING:   /** \par Transfer-Encoding: */
+        break;
+
+        /** \title OTHER headers I haven't bothered to track down yet. */
+
     case HDR_AUTHORIZATION:
-        /* Pass on WWW authentication */
+        /** \par WWW-Authorization:
IRC Meetup logs up in the wiki
Hi all,

I've slightly polished the IRC logs for Sunday's IRC meetup and put them on the wiki as a reference for all interested parties:

http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17

--
/kinkie
bzr 1.11 on squid-cache.org
bzr has been upgraded to 1.11 on squid-cache.org. This release reportedly includes support for case-insensitive filesystems (i.e. Windows).

Regards
Henrik
Re: bzr 1.11 on squid-cache.org
Hi,

At 13.12 20/01/2009, Henrik Nordstrom wrote:
> bzr has been upgraded to 1.11 on squid-cache.org. This release
> reportedly includes support for case-insensitive filesystems (i.e. Windows).

Any news on the line ending side?

Regards
Guido

--
Guido Serassio
Acme Consulting S.r.l. - Microsoft Certified Partner
Via Lucia Savarino, 1  10098 - Rivoli (TO) - ITALY
Tel.: +39.011.9530135  Fax: +39.011.9781115
Email: guido.seras...@acmeconsulting.it
WWW: http://www.acmeconsulting.it/
Buffer/String split, take2
Hello,

Kinkie has finished another round of his String NG project. The code is available at https://code.launchpad.net/~kinkie/squid/stringng

During code review and the subsequent IRC discussion archived at http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became apparent that the current design makes all participating developers unhappy (for different reasons). We have to revisit the discussion we had at the beginning of this project[1] and put this issue to rest, at last.

[1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188

There were not enough developers on the IRC to come to a consensus regarding the best direction, but it was clear that the current design is the worst one considered, as it tries to mix at least two incompatible designs together. This email summarizes a few design options we can choose from (none of them matches the current code, for the above-mentioned reasons). Please voice your opinion: which design would be best for Squid 3.2 and the foreseeable future?

* Universal Buffer:

  Blob = low-level raw chunk of RAM, invisible to general code
    - allocates, holds, frees a raw RAM buffer
    - can grow the buffer and write to the buffer
    - the memory allocation strategy can change w/o affecting others
    - does not have a notion of content, just allocated RAM

  Buffer = all-purpose user-level buffer
    - allows users to safely share a Blob instance via COW
    - search, compare, consume, append, truncate, import, export, etc.
    - has (offset, length) to maintain the area of Blob used by this Buffer

This design is very similar to std::string. The code gets a universal buffer that can do everything. This is probably the simplest design possible.

The primary drawback here is that it would be difficult and messy to optimize different buffering needs in a single Buffer class. For example, I/O buffers usually need to track appended/consumed size and want to optimize (or eliminate) copying when it is time to do the next I/O while some strings are pointing to the old buffer content. Adding that tracking logic and those optimizations to a generic Buffer would be wrong because it would pollute Buffers used like strings. Similarly, general strings may want to keep encoding information or perform heavy search optimizations. Adding those to a generic Buffer would be wrong because it would pollute I/O buffer code. Another example is adding simple but efficient vector I/O support. With a single Buffer, it would be difficult to support vectors because it would clash with string-like usage needs.

* Divide and Conquer (DC):

  Blob = low-level raw chunk of RAM, invisible to general code
    - same as Blob in the Universal Buffer approach

  Buffer = shareable Blob
    - allows users to safely share a Blob instance via COW
    - works with the Blob as a whole: not areas (see note below)
    - used as the exchange interface between specialized buffers

  IoBuffer = buffer optimized for I/O needs
    - perhaps should be called IoStream
    - uses Buffer
    - has (appended, consumed) to track I/O progress and area
    - exports available data as a Buffer instance
    - may eventually support vector I/O by using multiple Buffers

  String = buffer optimized for content manipulation
    - uses Buffer
    - has (offset, length) to maintain a Buffer content area
    - search, compare, replace, append, truncate, import, export, etc.
    - may eventually store content encoding information

The killer idea here is that the interpretation of a piece of allocated and shareable RAM (i.e., Buffer) is left to classes that specialize in certain memory manipulations (e.g., I/O or string search). Optimizing or changing one class does not have to affect the other. More specialized classes can be added as needed. Buffer is used to share info between classes. Conversions are explicit and easier to track. We could also add an Area class that makes it possible to store content offset and length when importing or exporting a Buffer.

(note) A possible variation of the same design would be to move area manipulation into Buffer. This would free String from area code but force IoBuffers and others to use the same area model instead of appended/consumed counters or whatever they need. It would probably make migration to vectored I/O more complex, but we can deal with that. If the DC approach is chosen, we will decide where to put area manipulation: Buffer, String, or perhaps a separate Area class.

* Other

There are probably other options.

I still think we should implement one good design, commit it, and work on converting the code to use it, rather than starting by massaging the old code to be easier to convert to something in the future. If you would like to discuss the choice between those two strategies, please start your own thread :-)!

So far, my _personal_ interpretation of votes based on the recent IRC discussions and that earlier squid-dev thread[1] is:

  Universal String: Kinkie, Amos
  Divide and Conquer: Adrian, Henrik, Alex

Do you prefer a Universal
Re: Buffer/String split, take2
2009/1/20 Alex Rousskov rouss...@measurement-factory.com:
> Please voice your opinion: which design would be best for Squid 3.2 and
> the foreseeable future.

[snip]

I'm about 2/3rds of the way along the actual implementation path of this in Cacheboy, so I can provide an opinion based on increasing amounts of experience. :)

[Warning: long, somewhat rambly post follows, from said experience.]

The thing I'm looking at right now is what buffer design is required to adequately handle the problem set. There are a few things which we currently do very stupidly in any Squid-related codebase:

* storeClientCopy() - Squid-2.HEAD and Cacheboy avoid the copy, but that exposes issues (see below);

* storeAppend() - the majority of data coming -into- the cache (ie, anything from an upstream server; very applicable today for forward proxies, not as applicable for high-hit-rate reverse proxies) is still memcpy()'ed, and this can use up a whole lot of bus time;

* creating strings - most strings are created during parsing; few are generated themselves, and those which are, are at least half static data which shouldn't be re-generated over and over and over again;

* duplicating strings - httpHeaderClone() and friends - dup'ing happens quite often, and making it cheap for the read-only copies which are made would be fantastic;

* later on, being able to use it for disk buffers, see below;

* later on, being able to properly use it for the memory cache, again see below.

The biggest problems I've hit thus far stem from the data pipeline: server -> memstore -> store client -> client side. At the moment, the storeClientCopy() call aggregates data across the 4k stmem page size (at least in squid-2/cacheboy; I think it's still 4k in squid-3), and thus if your last access gave you half a page, your next access can get data from both the other half of the page and whatever is in the next buffer.

Just referencing the stmem pages in 2.HEAD/Cacheboy means that you can (and do) end up with a large number of small reads from the memory store. You save on the referencing, but fail on the work chunk size. You end up having to have a sensible reference-counted buffer design -and- a vector list to operate on it with.

The string type right now makes sense if it references a contiguous, linear block of memory (ie, a sub-region of a contiguous buffer). This is how it's treated today. For almost all of the lifting inside Squid proper, that may be enough. There may, however, be a need later on for string-like and buffer-like operations on buffer -vectors-. For example, if you're doing some kind of content scanning over incoming data, you may wish to buffer your incoming data until you have enough to match a string which straddles two buffers - and the current APIs don't support it. Well, nothing in Squid supports it currently, but I think it's worth thinking about for the longer term.

Certainly though, I think that picking a sensible string API with absolutely no direct buffer access outside a few controlled areas (eg, translating a list of strings or list of buffers into an iovec for writev()) is the way to go. That will equip Squid with a decent enough set of tools to start converting everything else which currently uses C strings over to using Squid Strings, and eventually reap the benefits of zero-cost string duplication.

Ok, to summarise, and this may not exactly be liked by the majority of fellow developers: I think that augmenting/fixing the current SquidString API and tidying up all the bad places where it's used right now is going to give you the maximum long-term benefit. There's a lot of legacy code right now which absolutely needs to be trashed and rewritten.

I think the smartest path forward is to ignore 95% of the decision about which buffering method to use for now, fix the current String API and all the code which uses it so it's sensible (fixing the API won't take long; fixing the code which uses it will take longer), and at that point the codebase will be in much better shape to decide which will be the better path forward.

Now, just so people don't think I'm stirring up trouble, I've gone through this myself in both a squid-2 branch and Cacheboy, and here's what I found:

* there's a lot of code which uses C strings created from Strings;

* there's a lot of code which init'ed strings from C strings, where the length was already known and thrown out;

* there's a lot of code which init'ed strings from C strings which were once Strings;

* there's even code which init's strings -from- a String, but only by using strBuf(s) (I'm pointing at the HTTP header related code here, ugh);

* all the stuff which directly accesses the string buffer can and should be tossed, immediately - unfortunately there's a lot of it, the majority being in what I gather is very long-lived code in src/client_side.c (and what it became in squid-3).

So what I'm sort of doing now in Cacheboy-head, combined