Re: cvs commit: apr-serf/docs notes-filter-chains.txt

Aaron Bannert 29 Aug 2002 01:26:08 -0000

On Wed, Aug 28, 2002 at 05:48:46PM -0700, Greg Stein wrote:
> However, it may be important to note that your "give me FOO" is based on the
> "pull" model. IMO, we should encourage more "push" model development. The
> latter is usually much more efficient.
> 
> One comment about the "give me FOO". How does a caller specify FOO, and how
> does it know that the filter stack can respond appropriately? Is it possible
> for a stack to return "I don't know what FOO" means? This would be quite
> possible, because I have got to imagine that FOO needs to be dynamic -- that
> we cannot possibly specify all types of FOO-ness in the serf library.
> 
> Is passing a request for FOO to the next filter the right behavior, if you
> don't understand it? But that doesn't seem to make much sense. If NEXT
> returns a FOO, then SELF needs to apply its filter operation before
> returning it to the caller. At that point, it would be FOO'. Is that still
> valid for the caller? I would think not. The caller wanted FOO. You may have
> altered FOO in appropriate ways.
> 
> Thus, I think if a filter F doesn't not understand a type-request of FOO,
> then it must bail. Of course, this now means that the app needs to have some
> kind of fallback request types. Where does it end? Is there a minimum set of
> types that a filter must be able to respond to?
> 
> Aaron: your idea, you get to solve it :-)


:) Let me take whack at this. This is what has been floating around
in my brain for the last half-year or so...:


1) Let's get away from the notion of Push and Pull, and go to the idea
  where the app drives the filter chain, and each filter is given
  control of a piece of data for a short period of time.

2) Filters register themselves with all possible* inputs and outputs.
   (* See discussion on hierarchial abstract-datatype system below.)

3) The chain of filters now forms into a graph (with help from this registry
  or some external mechanism). Since the dataflow is no longer linear through
  this graph, we probably shouldn't call it a filter, but instead let's call
  it a "filter graph". For any other graph theory geeks out there, this
  graph happens to be a directed acyclic graph (DAG).

4) All graphs have by default two filters, a SOURCE and a SINK. The SOURCE
  filter produces a single TRANSACTION (for lack of a better term),
  and a SINK consumes the same.

5) New filters are added into this graph, specifying all possible inputs
  and outputs.

6) The application then uses the filter graph. If there are any unused inputs
  or outputs, there is an immediate runtime error. The reason this is deferred
  until the filter graph is actually used is so that the contents of the
  graph can be modified at any time. (One can imagine wanting to add or
  remove certain filters after a particular filter graph has been used, so
  why limit ourselves here?)

7) In order to achieve filter implementations that are agnostic of both
  push and pull, the filter graph simply takes the data types and applies
  them only to the next appropriate filter. That filter then decides one
  of three things: 1) to pass the original data, 2) to consume the data
  and produce nothing, 3) to consume the data and produce one or more
  new data thingies*, or 4) to fail.

  (*"data thingies" will not be the final name, but I can't think of
    anything better at the moment. :)


Let's take a really simple example: Serving a static file via HTTP req/resp:

 - The types of filters I could imagine being involved here would be:

  1. SOURCE - consumes nothing, produces TRANSACTION
  2. SOCKET_READER - consumes TRANSACTION, produces HEADERS
  3. RESPONSE_DISPATCHER - consumes HEADERS, produces a bunch of stuff,
                           one of which is a RESPONSE_FILE.
  4. STATIC_FILE_HANDLER - consumes a bunch of things, one of which is a
                           RESPONSE_FILE*, produces RESPONSE_HEADERS and
                           a FILE_DESCRIPTOR.
  5. SOCKET_WRITER - consumes all sorts of byte-oriented data types, including
                     RESPONSE_HEADERS and FILE_DESCRIPTORS (using sendfile
                     or mmap()+writev() to do the actual response), produces
                     a TRANSACTION.
  6. SINK - consumes TRANSACTION, produces nothing

Now this is a simple linear example, so it doesn't illustrate the
inherent parallelism in this model, but one could easily imagine at
this point where new types of handlers fit in. The best way to think
about this in my mind is by layers (aka resolution):

 low res:    SOURCE                    -->                        SINK
                \                                                 /
  2x res:      SOCKET_READER           -->              SOCKET_WRITER
                    \                                     /
  3x res:          RESPONSE_DISPATCHER --> STATIC_FILE_HANDLER

New layers are added in sets of at least 2, typically more. This gives
us multiplicity *and* allows us to reuse high-level concepts at the
right level in our graph.



Type System:

So now we're going to have an explosion of new data types, how do we
manage this? My OOP training would love to be able to use a whole
inheritance system to deal with this, and although we don't have that
explicitly in the C language, I think that's what we're going to end
up having. Imagine something like this:

                                DATA_THINGY
                                    /\
                                   /  \
                            METADATA  DATA
                              /\       /\
                             /  \     /  \
                         { HTTP,    { SOCKET,
                            FTP,      FILE,
                            ... }     HEAP,
                                      ... }


I think you get the idea.

So a filter could say "I consume DATA, I produce a TRANSACTION". Or
even better, a filter could say "I consume HTTP_HEADERs, and I produce
HTTP_RESPONSEs". A hierarchal type system for abstract data types
is crutial to this whole design.


Ok, it's time for dinner, enough bloodletting for this evening. I had
hoped to develop this more before ApacheCon, and to try to present on
something similiar while at the hackathon, but now is as good a time
as any to try to get some feedback and try to develop some more of
the details.

-aaron

Re: cvs commit: apr-serf/docs notes-filter-chains.txt

Reply via email to