On Wed, Aug 28, 2002 at 05:48:46PM -0700, Greg Stein wrote:
> However, it may be important to note that your "give me FOO" is based on the
> "pull" model. IMO, we should encourage more "push" model development. The
> latter is usually much more efficient.
>
> One comment about the "give me FOO". How does a caller specify FOO, and how
> does it know that the filter stack can respond appropriately? Is it possible
for a stack to return "I don't know what FOO means"? This would be quite
> possible, because I have got to imagine that FOO needs to be dynamic -- that
> we cannot possibly specify all types of FOO-ness in the serf library.
>
> Is passing a request for FOO to the next filter the right behavior, if you
> don't understand it? But that doesn't seem to make much sense. If NEXT
> returns a FOO, then SELF needs to apply its filter operation before
> returning it to the caller. At that point, it would be FOO'. Is that still
> valid for the caller? I would think not. The caller wanted FOO. You may have
altered FOO in inappropriate ways.
>
Thus, I think if a filter F doesn't understand a type-request of FOO,
> then it must bail. Of course, this now means that the app needs to have some
> kind of fallback request types. Where does it end? Is there a minimum set of
> types that a filter must be able to respond to?
>
> Aaron: your idea, you get to solve it :-)
:) Let me take a whack at this. This is what has been floating around
in my brain for the last half-year or so:
1) Let's get away from the notion of Push and Pull, and go to the idea
where the app drives the filter chain, and each filter is given
control of a piece of data for a short period of time.
2) Filters register themselves with all possible* inputs and outputs.
(* See discussion on hierarchical abstract-datatype system below.)
3) The chain of filters now forms into a graph (with help from this registry
or some external mechanism). Since the dataflow is no longer linear through
this structure, we probably shouldn't call it a filter chain, but instead
let's call
it a "filter graph". For any other graph theory geeks out there, this
graph happens to be a directed acyclic graph (DAG).
4) All graphs have by default two filters, a SOURCE and a SINK. The SOURCE
filter produces a single TRANSACTION (for lack of a better term),
and a SINK consumes the same.
5) New filters are added into this graph, specifying all possible inputs
and outputs.
6) The application then uses the filter graph. If there are any unconnected
inputs or outputs, there is an immediate runtime error. The reason this
check is deferred until the filter graph is actually used is so that the
contents of the graph can be modified at any time. (One can imagine wanting
to add or remove certain filters after a particular filter graph has been
used, so why limit ourselves here?)
7) In order to achieve filter implementations that are agnostic of both
push and pull, the filter graph simply takes the data types and applies
them only to the next appropriate filter. That filter then decides one
of four things: 1) to pass the original data, 2) to consume the data
and produce nothing, 3) to consume the data and produce one or more
new data thingies*, or 4) to fail. (A rough C sketch of this contract
follows below.)
(*"data thingies" will not be the final name, but I can't think of
anything better at the moment. :)
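
To make (7) concrete, here's roughly what I imagine the per-filter
contract looking like in C. Every name below is made up on the spot,
so treat it as a sketch and nothing more:

    /* A minimal sketch of the per-filter contract from (7).
     * All names are hypothetical -- none of this exists in serf. */

    typedef struct data_thingy data_thingy;  /* an opaque, typed data item */

    typedef enum {
        FILTER_PASSED,    /* 1) passed the original data through untouched */
        FILTER_CONSUMED,  /* 2) consumed the data, produced nothing */
        FILTER_PRODUCED,  /* 3) consumed the data, produced new items in *out */
        FILTER_FAILED     /* 4) couldn't handle the data; abort the traversal */
    } filter_status;

    /* The graph hands each filter one typed item; on FILTER_PRODUCED the
     * filter fills *out with an array of new items and *out_count with
     * how many it produced. */
    typedef filter_status (*filter_invoke_fn)(void *filter_ctx,
                                              data_thingy *in,
                                              data_thingy ***out,
                                              int *out_count);

Note that the filter never knows whether it is being pushed or pulled;
the graph alone decides when to invoke it.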
Let's take a really simple example: Serving a static file via HTTP req/resp:
- The types of filters I could imagine being involved here would be:
1. SOURCE - consumes nothing, produces TRANSACTION
2. SOCKET_READER - consumes TRANSACTION, produces HEADERS
3. RESPONSE_DISPATCHER - consumes HEADERS, produces a bunch of stuff,
one of which is a RESPONSE_FILE.
4. STATIC_FILE_HANDLER - consumes a bunch of things, one of which is a
RESPONSE_FILE*, produces RESPONSE_HEADERS and
a FILE_DESCRIPTOR.
5. SOCKET_WRITER - consumes all sorts of byte-oriented data types, including
RESPONSE_HEADERS and FILE_DESCRIPTORS (using sendfile
or mmap()+writev() to do the actual response), produces
a TRANSACTION.
6. SINK - consumes TRANSACTION, produces nothing
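
Sticking with the made-up API from the sketch above, assembling this
graph might look something like the following. (TYPES() would be some
varargs macro for listing a set of types; again, all hypothetical.)

    /* Hypothetical assembly of the static-file graph. register_filter()
     * takes the sets of consumed and produced types, and the graph wires
     * producers to consumers by matching those types. */
    filter_graph *g = filter_graph_create();  /* SOURCE and SINK built in */

    register_filter(g, "socket_reader",
                    TYPES(TRANSACTION), TYPES(HEADERS),
                    socket_reader_invoke);
    register_filter(g, "response_dispatcher",
                    TYPES(HEADERS), TYPES(RESPONSE_FILE /*, ... */),
                    response_dispatcher_invoke);
    register_filter(g, "static_file_handler",
                    TYPES(RESPONSE_FILE),
                    TYPES(RESPONSE_HEADERS, FILE_DESCRIPTOR),
                    static_file_handler_invoke);
    register_filter(g, "socket_writer",
                    TYPES(RESPONSE_HEADERS, FILE_DESCRIPTOR),
                    TYPES(TRANSACTION),
                    socket_writer_invoke);

    /* Per (6): unconnected inputs/outputs are only detected here. */
    filter_graph_run(g);

The nice property is that STATIC_FILE_HANDLER never knows or cares how
its RESPONSE_FILE arrived; it just gets handed one.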
Now this is a simple linear example, so it doesn't illustrate the
inherent parallelism in this model, but one could easily imagine at
this point where new types of handlers fit in. The best way to think
about this in my mind is by layers (aka resolution):
low res: SOURCE --> SINK
\ /
2x res: SOCKET_READER --> SOCKET_WRITER
\ /
3x res: RESPONSE_DISPATCHER --> STATIC_FILE_HANDLER
New layers are added in sets of at least 2, typically more. This gives
us multiplicity *and* allows us to reuse high-level concepts at the
right level in our graph.
Type System:
So now we're going to have an explosion of new data types; how do we
manage this? My OOP training would love to be able to use a whole
inheritance system to deal with this, and although we don't have that
explicitly in the C language, I think that's what we're going to end
up having. Imagine something like this:
DATA_THINGY
/\
/ \
METADATA DATA
/\ /\
/ \ / \
{ HTTP, { SOCKET,
FTP, FILE,
... } HEAP,
... }
I think you get the idea.
So a filter could say "I consume DATA, I produce a TRANSACTION". Or
even better, a filter could say "I consume HTTP_HEADERs, and I produce
HTTP_RESPONSEs". A hierarchal type system for abstract data types
is crutial to this whole design.
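
Sketched in C, that hierarchy can be as simple as a parent pointer per
type plus an ancestor walk for matching. Purely illustrative:

    /* A minimal hierarchical type system: each type points at its
     * parent, and "is a" is a walk up the chain. Illustrative only. */
    #include <stdio.h>

    typedef struct data_type {
        const char *name;
        const struct data_type *parent;  /* NULL at the root */
    } data_type;

    static const data_type DATA_THINGY = { "DATA_THINGY", NULL };
    static const data_type METADATA    = { "METADATA", &DATA_THINGY };
    static const data_type DATA        = { "DATA",     &DATA_THINGY };
    static const data_type HTTP        = { "HTTP",     &METADATA };
    static const data_type SOCKET      = { "SOCKET",   &DATA };

    /* Does type t match `want`, either exactly or via an ancestor? */
    static int type_is_a(const data_type *t, const data_type *want)
    {
        for (; t != NULL; t = t->parent)
            if (t == want)
                return 1;
        return 0;
    }

    int main(void)
    {
        /* A filter that consumes DATA_THINGY accepts HTTP items too: */
        printf("%d\n", type_is_a(&HTTP, &DATA_THINGY)); /* prints 1 */
        printf("%d\n", type_is_a(&SOCKET, &METADATA));  /* prints 0 */
        return 0;
    }

A filter's consumed/produced declarations could then name any node in
the tree, and "does this filter accept that item" is just type_is_a().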
Ok, it's time for dinner, enough bloodletting for this evening. I had
hoped to develop this more before ApacheCon, and to try to present on
something similar while at the hackathon, but now is as good a time
as any to try to get some feedback and try to develop some more of
the details.
-aaron