On Tue, Oct 02, 2001 at 07:09:57AM -0700, Aaron Bannert wrote:
> On Mon, Oct 01, 2001 at 11:35:10PM -0700, Greg Stein wrote:
>...
> > I believe a header bucket is simply an apr_hash_t of headers. Adding and
> > looking up headers requires a bit of extra glue to lower-case the header
> > name. Using a hash means it is very easy to set aside the headers (waiting
> > for the end-of-header bucket so a header can analyze the full set of hdrs),
> > manipulate them, and to gather them up (we need to add a "merge" function to
> > the hash table, although merging headers requires a bit more than simple
> > overlap).
> 
> I think an alternative solution is to give each header its own bucket.

A filter may need to gather up a bunch of headers until it finds the one it
wants, then modify the headers, then pass them all along. Using a hash makes
it much easier to manipulate the headers once you've got them in hand.

IMO, a header per bucket is fine only for limited header operations.
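
To make that concrete, here is a rough sketch of the lower-casing "glue" and
hash-based manipulation I have in mind. The hdr_set/hdr_get names are just
illustrative; the apr_hash/apr_strings calls are standard:

    #include <apr_hash.h>
    #include <apr_strings.h>
    #include <apr_lib.h>    /* apr_tolower() */

    /* store headers under a lower-cased copy of the name so that
       lookups are case-insensitive, as HTTP requires */
    static void hdr_set(apr_hash_t *hdrs, apr_pool_t *pool,
                        const char *name, const char *value)
    {
        char *key = apr_pstrdup(pool, name);
        char *p;

        for (p = key; *p; ++p)
            *p = apr_tolower(*p);

        apr_hash_set(hdrs, key, APR_HASH_KEY_STRING, value);
    }

    static const char *hdr_get(apr_hash_t *hdrs, apr_pool_t *pool,
                               const char *name)
    {
        char *key = apr_pstrdup(pool, name);
        char *p;

        for (p = key; *p; ++p)
            *p = apr_tolower(*p);

        return apr_hash_get(hdrs, key, APR_HASH_KEY_STRING);
    }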

> This may seem like significant overhead at first, but it actually isn't
> that bad, and it has some good side-effects as well:
> 
>  - we don't have to do memcopys everywhere, we can do some trickery with
>    string pointers to the original buffers (and even counted strings).

But we *will* have to copy it when we need that header name as a
null-terminated string. Think about all the various code that could use the
name (logging, control tests, etc.). We could end up making quite a few
copies.
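
For example, turning a counted-string header name back into something a
logger can use means a pool copy (apr_pstrmemdup is the obvious tool here;
the helper name is just for illustration):

    #include <apr_strings.h>

    /* a counted-string name pointing into the network buffer has to be
       copied before anything expecting a C string can use it */
    static const char *name_cstr(apr_pool_t *pool,
                                 const char *name, apr_size_t len)
    {
        return apr_pstrmemdup(pool, name, len);  /* copies len bytes + NUL */
    }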

>  - we avoid the big constants in dealing with the hash.

We don't know if that is going to be a problem. In fact, the hashes could be
much faster than alternatives.

Note that we could also use a suggestion from Roy a billion years ago --
define a particular (integer) token for "common" or "known" headers. For
example:

#define SERF_TKN_CONTENT_LENGTH  101

Next, we add APIs to access an apr_hash_t with "pre-hashed" keys (which is
a good thing in its own right; an app might have a more optimal hash
function for certain types of keys).

Then, an app can simply do:

    hdr = apr_hash_get_h(hdrs, SERF_TKN_CONTENT_LENGTH)

Internally, that just indexes right into the hash table array (mod the table
size, of course).

[ there is actually quite a bit more gunk that would need to be laid on top
  of the hash table to support this well. most of that becomes obvious once
  you consider that each header value would probably appear in the hash
  table using both a token key and a string key ]
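
Until something like apr_hash_get_h() actually exists, the closest you can
get today is to use the token itself as a fixed-size binary key in a regular
apr_hash_t. That still hashes the key bytes, so it doesn't buy the direct
indexing described above, but it shows the shape of the thing (the names
below are only illustrative):

    #include <apr_hash.h>
    #include <apr_strings.h>

    #define SERF_TKN_CONTENT_LENGTH  101

    static void token_demo(apr_pool_t *pool)
    {
        apr_hash_t *hdrs = apr_hash_make(pool);
        int tkn = SERF_TKN_CONTENT_LENGTH;
        const char *clen;

        /* the key must stay valid for the life of the hash,
           so copy it into the pool */
        apr_hash_set(hdrs, apr_pmemdup(pool, &tkn, sizeof(tkn)),
                     sizeof(tkn), apr_pstrdup(pool, "1234"));

        clen = apr_hash_get(hdrs, &tkn, sizeof(tkn));
        /* clen is now "1234" */
    }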

>  - we allow filters to get at the headers sooner (faster responding
>    client apps).

If we read a big chunk off the network, then we'd process the whole batch at
once, place them into a brigade (in some form, which is what we're
discussing here), and then send it into the filter stack. A filter that
examines the headers will have everything right away. Thus, I don't really
see this point.

>  - not all headers happen at the front of the request/response streams.
>    (think footers, not to mention delayed headers like what Roy was talking
>     about for HTTP/2.0 (aka Wakka))

Agreed. There can also be headers inside a multipart body.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
