> > I think an alternative solution is to give each header its own bucket.
> 
> A filter may need to gather up a bunch of headers until it finds the one it
> wants, then modify the headers, then pass them all along. Using a hash makes
> it much easier to manipulate the headers once you've got them in hand.

I was thinking both models could live in the same chain, before and after
a "collector" that would simply pull in all the headers until it got an
"EOHeaders" bucket, at which time it would pass a hash of all the headers
as a single bucket to filters behind it that want it.

Just an idea; I'm not implying this model would be at all practical. :)
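For what it's worth, here's a rough sketch of what such a collector might
look like. Everything below -- the bucket types, the struct layouts, the
names -- is invented for illustration; none of it is existing serf or
httpd API:

    #include <stdlib.h>

    typedef enum { BKT_HEADER, BKT_EOHEADERS, BKT_HEADER_HASH } bucket_type;

    typedef struct bucket {
        bucket_type type;
        const char *name, *value;   /* set for BKT_HEADER */
        void *data;                 /* the header table, for BKT_HEADER_HASH */
        struct bucket *next;
    } bucket;

    /* Stand-in for the header hash: a simple linked list. */
    typedef struct hdr { const char *name, *value; struct hdr *next; } hdr;

    /* The collector: consume one-header-per-bucket input until the
     * EOHeaders marker, then pass everything along as a single bucket. */
    static bucket *collect_headers(bucket *in)
    {
        hdr *headers = NULL;
        while (in && in->type == BKT_HEADER) {
            hdr *h = malloc(sizeof(*h));
            h->name = in->name;
            h->value = in->value;
            h->next = headers;
            headers = h;
            in = in->next;
        }
        if (in && in->type == BKT_EOHEADERS) {
            bucket *out = malloc(sizeof(*out));
            out->type = BKT_HEADER_HASH;
            out->name = out->value = NULL;
            out->data = headers;
            out->next = in->next;   /* body buckets pass through untouched */
            return out;
        }
        return NULL;   /* headers incomplete; a real filter would wait */
    }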

> IMO, a header per bucket is fine only for limited header operations.
> 
> > This may seem like significant overhead at first, but it actually isn't
> > that bad, and it has some good side-effects as well:
> > 
> >  - we don't have to do memcopys everywhere, we can do some trickery with
> >    string pointers to the original buffers (and even counted strings).
> 
> But we *will* have to copy it when we need that header name as a
> null-terminated string. Think about all the various code that could use a
> name (logging, control tests, etc). We could end up making quite a few
> copies.

You're probably right. And I like this next idea:

> Note that we could also use a suggestion from Roy a billion years ago --
> define a particular (integer) token for "common" or "known" headers. For
> example:
> 
> #define SERF_TKN_CONTENT_LENGTH  101
> 
> Next, we add APIs to access an apr_hash_t with "pre-hashed" keys. (which is
> a good thing in its own right; an app might have a more optimal hash
> function for certain types of keys)
> 
> Then, an app can simply do:
> 
>     hdr = apr_hash_get_h(hdrs, SERF_TKN_CONTENT_LENGTH)
> 
> Internally, that just indexes right into the hash table array (mod the table
> size, of course).
> 
> [ there is actually quite a bit more gunk that would need to be laid on top
>   of the hash table to support this well. most of that becomes obvious once
>   you consider that each header value would probably appear in the hash
>   table using both a token key and a string key ]

Using a predefined header token ID is definitely the way to go. We can use
some neat tricks here to make lookups very fast.
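To make that concrete, something along these lines -- a standalone sketch,
not APR's actual hash API; the token values, table layout, and names are
all made up:

    #include <stdio.h>

    #define TABLE_SIZE 64   /* power of two, so "mod" is just a mask */

    /* Hypothetical tokens for "known" headers; the value doubles as a
     * precomputed hash code, so lookup skips string hashing entirely. */
    #define SERF_TKN_CONTENT_LENGTH 101
    #define SERF_TKN_CONTENT_TYPE   102

    typedef struct entry {
        unsigned int hash;       /* precomputed for token keys */
        const char *name;        /* string key, kept for unknown headers */
        const char *value;
        struct entry *next;      /* collision chain */
    } entry;

    static entry *table[TABLE_SIZE];

    static void put_token(entry *e, unsigned int token,
                          const char *name, const char *value)
    {
        e->hash = token;
        e->name = name;
        e->value = value;
        e->next = table[token & (TABLE_SIZE - 1)];
        table[token & (TABLE_SIZE - 1)] = e;
    }

    /* A token lookup indexes straight into the array (mod the table
     * size), exactly as described above -- no hashing, no strcmp. */
    static const char *get_token(unsigned int token)
    {
        entry *e;
        for (e = table[token & (TABLE_SIZE - 1)]; e; e = e->next)
            if (e->hash == token)
                return e->value;
        return NULL;
    }

    int main(void)
    {
        static entry cl;
        put_token(&cl, SERF_TKN_CONTENT_LENGTH, "Content-Length", "1234");
        printf("%s\n", get_token(SERF_TKN_CONTENT_LENGTH));   /* 1234 */
        return 0;
    }

The "gunk" mentioned above would sit on top of this: each known header
would need to be reachable by both its token and its string name, e.g. by
inserting two entries that share the same value.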

> >  - we allow filters to get at the headers sooner (faster responding
> >    client apps).
> 
> If we read a big chunk off the network, then we'd process the whole batch at
> once, place them into a brigade (in some form, which is what we're
> discussing here), and then send it into the filter stack. A filter that
> examines the headers will have everything right away. Thus, I don't really
> see this point.

Hmmm. A brigade is handy when you have more than one "unit" (aka bucket)
of data to pass down the filter chain, but in my mind that only works for
the push model. In the pull model, the lower filter says "hey, give me
a bucket," so you don't pass it a brigade. Does this make sense?

I've been piecing together a buckets/filter design in my head over the
last few weeks that may materialize Any Day Now. :) My hope is to solve
some of the issues we've been facing over on httpd, where all of our
filters must define their requests by byte size, even if the unit of
data they deal with has nothing to do with bytes (i.e., "give me the
header hash", or "give me the body"). It'll take some more time to sort
out a coherent idea, though.
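As a strawman, the contrast I keep sketching looks something like this
(entirely hypothetical names; nothing here is a real httpd or serf call):

    /* Entirely hypothetical; just contrasting the two request styles. */
    typedef enum { UNIT_HEADER_HASH, UNIT_BODY } unit_type;

    /* Today: every request is phrased as a byte count, no matter what
     * kind of data the filter actually deals in. */
    int pull_bytes(struct filter *next, struct brigade *bb, long nbytes);

    /* What I'm after: ask for a unit, get exactly one of them back. */
    int pull_unit(struct filter *next, unit_type what, struct bucket **out);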

-aaron
