Re: [Snap Framework] Adding File Upload Capability

Gregory Collins Sat, 06 Nov 2010 17:39:17 -0700

On Sat, Nov 6, 2010 at 10:00 PM, Dan Getelman <[email protected]> wrote:


> I've looked through the source where Yesod handles this, and it seems their
> approach is to have, when their requests are parsed, the body goes into an
> "IO thunk", such that any POST requests are processed at-most once, but also
> don't necessarily have to be processed at all. The request body is then
> represented as a tuple, where the first element is [(paramName,
> paramValue)], and the second is [(paramName, fileInfo)], where fileInfo is
> the file name, content type, and content.
> (https://github.com/snoyberg/yesod/blob/master/Yesod/Request.hs)

Hey Dan,

Let me give you some more background on our I/O model because that will help
you to understand how to do this. I won't go much into what the form handling
code should look like at this point, but instead I'll just talk generally about
what happens when Snap handles an HTTP request and how the request body is made
available to handler programs.

HTTP POST requests, as you're undoubtedly aware, look like this (imagine a
CR/LF at the end of every line, including the blank one):

    POST / HTTP/1.1
    Host: foo.com
    Some-Other-Headers: whatever
    Content-Length: 100
    Content-Type: multipart/form-data

    ...100 bytes of stuff...

We use the iteratee data model, as you're also aware. The way that iteratees
work is that Enumerators (functions which produce chunks of data) feed into
Iteratees (functions which consume it). You can think of it like a pipeline. If
you take a look at Snap.Internal.Http.Server.SimpleBackend from snap-server,
you can see the Enumerator that we use to feed data from the socket:

    enumerate :: (MonadIO m) => Connection -> Enumerator m a
    enumerate = loop
      where
        loop conn f = do
            debug $ "Backend.enumerate: reading from socket"
            s <- liftIO $ timeoutRecv conn bLOCKSIZE
            debug $ "Backend.enumerate: got " ++ Prelude.show (B.length s)
                    ++ " bytes from read end"
            sendOne conn f s

        sendOne conn f s = do
            v <- runIter f (if B.null s
                             then EOF Nothing
                             else Chunk $ WrapBS s)
            case v of
              r@(Done _ _)      -> return $ liftI r
              (Cont k Nothing)  -> loop conn k
              (Cont _ (Just e)) -> return $ throwErr e

[[An aside: actually, now that I look at it, we should probably be lifting
"conn" out as a parameter to "enumerate" since it never changes -- will add
that to my todo list]]

To understand this, you need the type of Enumerator (and this is a bit of a
fudge since it's a type alias for a more complicated type in the iteratee
library, but you can think of it this way):

    type Enumerator m a = Iteratee m a -> m (Iteratee m a)

So you hand an enumerator an iteratee, it runs some I/O and gives you an
iteratee back. As you can see, "enumerate" just runs in a loop where it reads a
block of data from the socket (and this is a "read()" call underneath if you're
familiar with Unix socket programming) and passes that block to the wrapped
iteratee. If the iteratee finishes, by yielding a value with "Done", then we
stop sending it data chunks -- and note that "Done" carries around with it the
remainder of the unread data.
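
To make the shape of that loop concrete, here is a minimal sketch that uses
only base. Everything in it (Stream, Iter, enumList, takeChars) is a simplified
stand-in made up for illustration, not the iteratee library's real types: a
list of strings plays the part of the socket, and error handling is left out.

```haskell
-- Toy model of the enumerator/iteratee pipeline (NOT the iteratee library).

data Stream = Chunk String | EOF

-- An iteratee is either finished (a result plus the unread input) or
-- waiting for the next piece of the stream.
data Iter a = Done a Stream
            | Cont (Stream -> Iter a)

-- An enumerator takes an iteratee, does some I/O, and returns the iteratee
-- in its new state.
type Enumerator a = Iter a -> IO (Iter a)

-- Hypothetical enumerator: feeds chunks from a list one at a time and stops
-- as soon as the iteratee is Done, like the socket loop stops reading.
enumList :: [String] -> Enumerator a
enumList _      i@(Done _ _) = return i
enumList []     i            = return i
enumList (c:cs) (Cont k)     = enumList cs (k (Chunk c))

-- Hypothetical iteratee: consumes exactly n characters, possibly spread
-- over several chunks, and leaves the rest of the last chunk unread.
takeChars :: Int -> Iter String
takeChars n0 = go n0 ""
  where
    go n acc
      | n <= 0    = Done acc (Chunk "")
      | otherwise = Cont (step n acc)

    step _ acc EOF       = Done acc EOF
    step n acc (Chunk s) =
        let (a, b) = splitAt n s
        in if length a == n
             then Done (acc ++ a) (Chunk b)
             else go (n - length a) (acc ++ a)

main :: IO ()
main = do
    i <- enumList ["abc", "defgh", "ijk"] (takeChars 5)
    case i of
      Done a (Chunk rest) -> putStrLn (a ++ " | leftover: " ++ rest)
      _                   -> putStrLn "iteratee wants more input"
```

Running "main" prints "abcde | leftover: fgh"; the third chunk is never handed
to the iteratee, because it was already Done after the second one.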

The neat thing about Iteratees is that you can chain them with their Monad
instance; "a >>= b" boils down to "a accepts chunks from the Enumerator until
it produces a value, which gets sent to the b function (along with any unread
input), which accepts chunks from the Enumerator until it produces a
value...". And for the most part this allows you to ignore chunk management and
just deal with handling the input your particular iteratee has to deal with.

For instance, here's an example which shows an iteratee that reads a single
line of input, and a toy program which reads two lines from a stream and
returns them:

------------------------------------------------------------------------------
    import           Control.Monad (replicateM)
    import           Data.ByteString.Char8 (ByteString)
    import qualified Data.ByteString.Char8 as B
    import           Data.Iteratee
    import           Data.Iteratee.WrappedByteString


    type Iteratee m a = IterateeG WrappedByteString Char m a

    readOneLine :: Iteratee IO ByteString
    readOneLine = IterateeG $ f []
      where
        f _ (EOF e@(Just _)) =
            return $ Cont undefined e         -- handle error

        f chunks c@(EOF Nothing) =
            return $ Done (B.concat chunks) c -- eof? give me the line

        f chunks (Chunk (WrapBS s)) = do
            let (a,b) = B.break (=='\n') s
            if B.null b   -- newline in the input? if no, grab more input;
                          -- if yes, return the line up till the '\n'
              then return $ Cont (IterateeG $ f (chunks ++ [s])) Nothing
              else do
                  let line = B.concat $ chunks ++ [a]
                  let rest = Chunk $ WrapBS $ B.drop 1 b
                  return $ Done line rest


    main :: IO ()
    main = do
        let input = WrapBS $ B.pack "line 1\nline 2\nline 3"
        let enum = enumPure1Chunk input
        xs <- enum (replicateM 2 readOneLine) >>= run

        Prelude.mapM_ B.putStrLn xs
------------------------------------------------------------------------------

If you run "main" you get:

    line 1
    line 2

The point I'm trying to make is that an iteratee can read as much input as it
likes, produce a value, and leave the rest of the stream for some other
iteratee to process. Another interesting consequence of this data model is that
you can do things like take N bytes from the stream and pass it into an
iteratee, transparently decode chunked transfer-encodings or gzipped data, etc
-- and the wrapped iteratee only has to deal with the stream of bytes IT cares
about. Hopefully this is clear so far. (!)
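
As a rough illustration of the "take N bytes and pass them to an iteratee"
idea, here is a sketch over a toy model using only base. The names (Iter,
limit, consume, feedAll) are invented for this example and are not the
iteratee library's API; think of "limit 8" as what you would do with a
Content-Length of 8.

```haskell
-- Toy model (NOT the iteratee library): cap how much input a wrapped
-- iteratee is allowed to see.

data Stream = Chunk String | EOF
data Iter a = Done a Stream | Cont (Stream -> Iter a)

-- Feed one piece of the stream; an iteratee that is already Done ignores it.
feed :: Iter a -> Stream -> Iter a
feed (Cont k) s = k s
feed done     _ = done

-- Tell an iteratee the input is over and collect its result, if it has one.
finish :: Iter a -> Maybe a
finish (Done a _) = Just a
finish (Cont k)   = case k EOF of
                      Done a _ -> Just a
                      Cont _   -> Nothing

-- Pass at most n characters to the inner iteratee, then send it EOF.
-- Whatever follows those n characters stays in the outer stream.
limit :: Int -> Iter a -> Iter (Maybe a)
limit n inner
  | n <= 0    = Done (finish inner) (Chunk "")
  | otherwise = Cont step
  where
    step EOF       = Done (finish inner) EOF
    step (Chunk s) =
        let (a, b) = splitAt n s
            inner' = feed inner (Chunk a)
        in if length s >= n
             then Done (finish inner') (Chunk b)
             else limit (n - length s) inner'

-- Inner iteratee: collects everything it is given until EOF.
consume :: Iter String
consume = go ""
  where
    go acc = Cont $ \s -> case s of
               EOF     -> Done acc EOF
               Chunk c -> go (acc ++ c)

-- Pure driver standing in for an enumerator.
feedAll :: [String] -> Iter a -> Iter a
feedAll (c:cs) (Cont k) = feedAll cs (k (Chunk c))
feedAll _      i        = i

main :: IO ()
main =
    case feedAll ["hello ", "world", "!!!"] (limit 8 consume) of
      Done body (Chunk rest) -> print (body, rest)
      _                      -> putStrLn "needs more input"
```

Note that "consume" has no idea a limit exists: it just sees eight characters
followed by EOF, and the outer stream keeps "rld" as unread input.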

Now what happens with Snap and HTTP requests is this: if you take a look at the
definition of the Snap monad itself, you'll see:

    newtype Snap a = Snap {
          unSnap :: StateT SnapState (Iteratee IO) (Maybe (Either Response a))
    }

In other words, a Snap handler, the **whole thing**, is itself an iteratee
underneath. This may be surprising because it isn't really exposed to the user,
but what is happening is this: the snap server runs an iteratee which peels off
the HTTP protocol goop and hands control to your handler at the point
immediately following the blank line which ends the headers. (Of course, if
"Transfer-Encoding" is set to "chunked" then it wraps your handler with a
decoder, but that's a quibbling detail.)

So your handler is in a position to read the body of the request; in fact,
there's probably some data left over from the request parsing iteratee already
ready to be fed to you, but if you get a long POST request then what's
happening is that **you haven't actually read the entire request yet** when
your handler runs -- you might have to read() a few megabytes of data
even. This is also what allows us to stream in O(1) space: we can read
chunks of the input as they come in and dispatch them without holding the whole
thing in memory.

Of course, Snap does some stuff for you to ensure that this is all "safe" with
respect to the protocol. Your handler might not consume ANY bytes from the
input, in which case Snap fully drains the request body after your handler
runs. Snap also makes sure you can't partially read the request or read it
twice; inside the Request object is this ugly bit of hack:

    rqBody           :: IORef SomeEnumerator

The only way we allow you to read the request body is using the
"runRequestBody" function, which looks like this:

------------------------------------------------------------------------------
    -- | Sends the request body through an iteratee (data consumer) and
    -- returns the result.
    runRequestBody :: Iteratee IO a -> Snap a
    runRequestBody iter = do
        req  <- getRequest
        senum <- liftIO $ readIORef $ rqBody req
        let (SomeEnumerator enum) = senum

        -- make sure the iteratee consumes all of the input
        let iter' = iter >>= (\a -> Iter.skipToEof >> return a)

        -- run the iteratee
        result <- liftIter $ Iter.joinIM $ enum iter'

        -- stuff a new dummy enumerator into the request, so you can only try to
        -- read the request body from the socket once
        liftIO $ writeIORef (rqBody req)
                            (SomeEnumerator $ return . Iter.joinI . Iter.take 0)

        return result
------------------------------------------------------------------------------

You give it an iteratee, and it sends the remainder of the request body to it,
and subsequently puts a dummy enumerator into the IORef which doesn't read any
of the input. In other words, if you were to call "runRequestBody" twice, the
second call would immediately send EOF to your iteratee; this makes sense of
course, because reading the request body *doesn't actually happen* until you
call runRequestBody, and once you've read it, it's gone!
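
The read-once trick can be sketched in isolation with only base. This is a
simplified model, not Snap's code: the Request type and runBody here are
hypothetical, and an IO action returning a list of chunks stands in for the
socket enumerator (so it shows the IORef swap, not the streaming).

```haskell
import Data.IORef

-- Hypothetical stand-in for Snap's Request: the body is an IO action that
-- produces the chunks, kept in an IORef so it can be swapped out.
newtype Request = Request { rqBody :: IORef (IO [String]) }

-- Run a consumer over the body, then replace the body with one that yields
-- nothing, so a second call sees an empty (already-consumed) stream.
runBody :: Request -> ([String] -> a) -> IO a
runBody req consumer = do
    readBody <- readIORef (rqBody req)
    chunks   <- readBody
    writeIORef (rqBody req) (return [])
    return (consumer chunks)

main :: IO ()
main = do
    ref <- newIORef (return ["name=dan", "&upload=stuff"])
    let req = Request ref
    first  <- runBody req concat
    second <- runBody req concat
    print (first, second)
```

The first call gets the whole body and the second gets an empty string, which
is the behaviour described above for calling "runRequestBody" twice.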

                                   *   *   *

OK, so we've come this far. Where does that leave us? Well, from a practical
perspective, if you want to handle file uploads, you'll be writing an Iteratee
which you pass into runRequestBody. You can relax and just concentrate on a)
parsing the format of multipart/form-data requests, and b) deciding what to do
with the encoded data. Given that you'll be decoding MIME messages we can
probably re-use some stuff from the snap-server HTTP parser; in fact this might
necessitate us moving Snap.Internal.Http.Parser from snap-server to snap-core.

If you want to parse some stuff from a stream using iteratees, the easiest way
(and the way we do it in the server) is to write an applicative parser using
attoparsec, and turn it into an iteratee using the "attoparsec-iteratee"
package. See Snap.Internal.Http.Parser for an example.
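
To give a flavour of what "applicative parser" means here, this is a small
sketch that parses a single "Name: value" header line. Note the swap: it uses
ReadP from base as a stand-in so that it runs without extra packages; it is
not attoparsec and not the parser in Snap.Internal.Http.Parser, and the name
"header" is made up for the example. An attoparsec version would have the same
overall shape.

```haskell
import Text.ParserCombinators.ReadP

-- Parse one header line of the form "Name: value\r\n" in applicative style:
-- each piece is a small parser, combined with <$>, <*> and <*.
header :: ReadP (String, String)
header = (,) <$> munch1 (`notElem` ":\r\n")   -- header name
             <*  char ':'
             <*  munch (== ' ')               -- optional spaces after colon
             <*> munch (/= '\r')              -- value, up to the line end
             <*  string "\r\n"

main :: IO ()
main = print (readP_to_S header "Content-Type: text/plain\r\nrest")
```

The result pairs the parsed header with the unconsumed input ("rest"), which
is the same leftover-input idea the iteratees above rely on.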

This has been quite a long-winded post -- let me know if something isn't clear
or if you have further questions.

Cheers,

G

-- 
Gregory Collins <[email protected]>
_______________________________________________
Snap mailing list
[email protected]
http://mailman-mail5.webfaction.com/listinfo/snap
