On Sat, Nov 6, 2010 at 10:00 PM, Dan Getelman <[email protected]> wrote:
> I've looked through the source where Yesod handles this, and it seems their
> approach is to have, when their requests are parsed, the body goes into an
> "IO thunk", such that any POST requests are processed at-most once, but also
> don't necessarily have to be processed at all. The request body is then
> represented as a tuple, where the first element is [(paramName,
> paramValue)], and the second is [(paramName, fileInfo)], where fileInfo is
> the file name, content type, and content.
> (https://github.com/snoyberg/yesod/blob/master/Yesod/Request.hs)

Hey Dan,

Let me give you some more background on our I/O model, because that will
help you understand how to do this. I won't go much into what the form
handling code should look like at this point; instead I'll just talk
generally about what happens when Snap handles an HTTP request and how
the request body is made available to handler programs.

HTTP POST requests, as you're undoubtedly aware, look like this (imagine
a CR/LF at the end of every header line, including the blank one):

    POST / HTTP/1.1
    Host: foo.com
    Some-Other-Headers: whatever
    Content-Length: 100
    Content-Type: multipart/form-data

    ...100 bytes of stuff...

We use the iteratee data model, as you're also aware. The way iteratees
work is that Enumerators (functions which produce chunks of data) feed
into Iteratees (functions which consume them). You can think of it like
a pipeline.
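To make the pipeline picture concrete, here's a toy, pure model of it that
needs nothing beyond base. Everything in it (Stream, Iter, enumChunks,
countBytes, run) is a simplified stand-in I made up for illustration; these
are NOT the types from the iteratee library or from Snap, which also thread
a monad and error information through.

```haskell
-- Toy model only: hypothetical names, not the iteratee library's API.

-- What an enumerator hands to an iteratee: a chunk of data, or end of input.
data Stream = Chunk String | EOF

-- An iteratee is either finished (with a result and the unread remainder)
-- or waiting for the next piece of the stream.
data Iter a
  = Done a Stream
  | Cont (Stream -> Iter a)

-- A toy enumerator: feed a list of chunks to an iteratee, stopping early
-- as soon as the iteratee says it is done.
enumChunks :: [String] -> Iter a -> Iter a
enumChunks _      i@(Done _ _) = i
enumChunks []     i            = i
enumChunks (c:cs) (Cont k)     = enumChunks cs (k (Chunk c))

-- A toy iteratee: consume the whole stream and count the bytes seen.
countBytes :: Iter Int
countBytes = go 0
  where
    go n = Cont (step n)
    step n (Chunk s) = go (n + length s)
    step n EOF       = Done n EOF

-- Signal end of input (if the iteratee still wants data) and extract the
-- result; Nothing means the iteratee refused to finish even at EOF.
run :: Iter a -> Maybe a
run (Done a _) = Just a
run (Cont k)   = case k EOF of
                   Done a _ -> Just a
                   Cont _   -> Nothing
```

In GHCi, run (enumChunks ["POST / ", "HTTP/1.1"] countBytes) evaluates to
Just 15. The thing to notice is the shape: the enumerator takes an iteratee
and gives you an iteratee back.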
If you take a look at Snap.Internal.Http.Server.SimpleBackend from
snap-server, you can see the Enumerator that we use to feed data from
the socket:

    enumerate :: (MonadIO m) => Connection -> Enumerator m a
    enumerate = loop
      where
        loop conn f = do
            debug $ "Backend.enumerate: reading from socket"
            s <- liftIO $ timeoutRecv conn bLOCKSIZE
            debug $ "Backend.enumerate: got " ++ Prelude.show (B.length s)
                    ++ " bytes from read end"
            sendOne conn f s

        sendOne conn f s = do
            v <- runIter f (if B.null s
                             then EOF Nothing
                             else Chunk $ WrapBS s)
            case v of
              r@(Done _ _)      -> return $ liftI r
              (Cont k Nothing)  -> loop conn k
              (Cont _ (Just e)) -> return $ throwErr e

[[An aside: actually, now that I look at it, we should probably be
lifting "conn" out as a parameter to "enumerate" since it never changes
-- will add that to my todo list]]

To understand this, you need the type of Enumerator (and this is a bit
of a fudge since it's a type alias for a more complicated type in the
iteratee library, but you can think of it this way):

    type Enumerator m a = Iteratee m a -> m (Iteratee m a)

So you hand an enumerator an iteratee, it runs some I/O and gives you an
iteratee back. As you can see, "enumerate" just runs in a loop where it
reads a block of data from the socket (this is a "read()" call
underneath, if you're familiar with Unix socket programming) and passes
that block to the wrapped iteratee. If the iteratee finishes, by
yielding a value with "Done", then we stop sending it data chunks -- and
note that "Done" carries around with it the remainder of the unread
data.

The neat thing about Iteratees is that you can chain them with their
Monad instance; "a >>= b" boils down to "a accepts chunks from the
Enumerator until it produces a value, which gets sent to the b function
(along with any unread input), which accepts chunks from the Enumerator
until it produces a value...".
And for the most part this allows you to ignore chunk management and
just deal with handling the input your particular iteratee has to deal
with. For instance, here's an example which shows an iteratee that reads
a single line of input, and a toy program which reads two lines from a
stream and returns them:

------------------------------------------------------------------------------
import           Control.Monad (replicateM)
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import           Data.Iteratee
import           Data.Iteratee.WrappedByteString

type Iteratee m a = IterateeG WrappedByteString Char m a

readOneLine :: Iteratee IO ByteString
readOneLine = IterateeG $ f []
  where
    f _ (EOF e@(Just _)) = return $ Cont undefined e  -- handle error

    f chunks c@(EOF Nothing) =
        return $ Done (B.concat chunks) c  -- eof? give me the line

    f chunks (Chunk (WrapBS s)) = do
        let (a,b) = B.break (=='\n') s
        if B.null b
          -- newline in the input? if no, grab more input, if yes,
          -- return the line up till the '\n'
          then return $ Cont (IterateeG $ f (chunks ++ [s])) Nothing
          else do
              let line = B.concat $ chunks ++ [a]
              let rest = Chunk $ WrapBS $ B.drop 1 b
              return $ Done line rest

main :: IO ()
main = do
    let input = WrapBS $ B.pack "line 1\nline 2\nline 3"
    let enum  = enumPure1Chunk input
    xs <- enum (replicateM 2 readOneLine) >>= run
    Prelude.mapM_ B.putStrLn xs
------------------------------------------------------------------------------

If you run "main" you get:

    line 1
    line 2

The point I'm trying to make is that an iteratee can read as much input
as it likes, produce a value, and leave the rest of the stream for some
other iteratee to process. Another interesting consequence of this data
model is that you can do things like take N bytes from the stream and
pass it into an iteratee, transparently decode chunked
transfer-encodings or gzipped data, etc -- and the wrapped iteratee only
has to deal with the stream of bytes IT cares about.

Hopefully this is clear so far. (!)
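In case the "take N bytes and pass them to an iteratee" idea is still a bit
abstract, here's a rough sketch of it in a toy, pure model that needs only
base. All the names here (Stream, Iter, feed, finish, takeBytes, collect,
runOn) are invented for this illustration and are not the iteratee
library's API; the real thing also deals with monadic effects and errors.

```haskell
-- Toy model only: hypothetical names, not the iteratee library's API.
data Stream = Chunk String | EOF

data Iter a
  = Done a Stream            -- result plus unread remainder
  | Cont (Stream -> Iter a)  -- waiting for more input

-- Give one piece of the stream to an iteratee (a no-op if it is finished).
feed :: Iter a -> Stream -> Iter a
feed (Cont k) s = k s
feed done     _ = done

-- Tell an iteratee the input is over and pull out its result, if any.
finish :: Iter a -> Maybe a
finish i = case feed i EOF of
  Done a _ -> Just a
  Cont _   -> Nothing

-- Show the inner iteratee at most n bytes, then send it EOF. Whatever
-- follows those n bytes is left in the stream for the next consumer.
takeBytes :: Int -> Iter a -> Iter (Maybe a)
takeBytes n inner
  | n <= 0    = Done (finish inner) (Chunk "")
  | otherwise = Cont step
  where
    step EOF = Done (finish inner) EOF
    step (Chunk s) =
      let (mine, rest) = splitAt n s
          inner'       = feed inner (Chunk mine)
          left         = n - length mine
      in if left == 0
           then Done (finish inner') (Chunk rest)
           else takeBytes left inner'

-- An inner iteratee that just collects everything it is shown.
collect :: Iter String
collect = go []
  where
    go acc = Cont (step acc)
    step acc (Chunk s) = go (s : acc)
    step acc EOF       = Done (concat (reverse acc)) EOF

-- Feed a list of chunks; return the result and whatever was left unread.
runOn :: [String] -> Iter a -> Maybe (a, String)
runOn cs     (Done a rest) = Just (a, leftover rest ++ concat cs)
  where
    leftover (Chunk s) = s
    leftover EOF       = ""
runOn []     (Cont k)      = case k EOF of
                               Done a _ -> Just (a, "")
                               Cont _   -> Nothing
runOn (c:cs) (Cont k)      = runOn cs (k (Chunk c))
```

With this, runOn ["hel", "lo wor", "ld"] (takeBytes 5 collect) gives
Just (Just "hello", " world"): the inner iteratee saw exactly five bytes
even though they arrived split across two chunks, and the rest of the
stream is still there for whoever comes next.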
Now what happens with Snap and HTTP requests is this: if you take a look
at the definition of the Snap monad itself, you'll see:

    newtype Snap a = Snap {
          unSnap :: StateT SnapState (Iteratee IO) (Maybe (Either Response a))
    }

In other words, a Snap handler, the **whole thing**, is itself an
iteratee underneath. This may be surprising because it isn't really
exposed to the user, but what is happening is this: the snap server runs
an iteratee which peels off the HTTP protocol goop and hands control to
your handler at the point immediately following the blank line which
ends the headers. (Of course, if "Transfer-Encoding" is set to "chunked"
then it wraps your handler with a decoder, but that's a quibbling
detail.)

So your handler is in a position to read the body of the request; in
fact, there's probably some data left over from the request parsing
iteratee already ready to be fed to you, but if you get a long POST
request then what's happening is that **you haven't actually read the
entire request yet** when your handler runs -- you might have to read()
a few megabytes of data even. This is also what allows us to stream in
O(1) space: we can read chunks of the input as they come in and dispatch
them without holding the whole thing in memory.

Of course, Snap does some stuff for you to ensure that this is all
"safe" with respect to the protocol. Your handler might not consume ANY
bytes from the input; in that case, after your handler runs, Snap fully
drains the request body. Snap also makes sure you can't partially read
the request or read it twice; inside the Request object is this ugly bit
of hack:

    rqBody :: IORef SomeEnumerator

The only way we allow you to read the request body is using the
"runRequestBody" function, which looks like this:

------------------------------------------------------------------------------
-- | Sends the request body through an iteratee (data consumer) and
-- returns the result.
runRequestBody :: Iteratee IO a -> Snap a
runRequestBody iter = do
    req   <- getRequest
    senum <- liftIO $ readIORef $ rqBody req
    let (SomeEnumerator enum) = senum

    -- make sure the iteratee consumes all of the output
    let iter' = iter >>= (\a -> Iter.skipToEof >> return a)

    -- run the iteratee
    result <- liftIter $ Iter.joinIM $ enum iter'

    -- stuff a new dummy enumerator into the request, so you can only try to
    -- read the request body from the socket once
    liftIO $ writeIORef (rqBody req)
                        (SomeEnumerator $ return . Iter.joinI . Iter.take 0 )

    return result
------------------------------------------------------------------------------

You give it an iteratee, and it sends the remainder of the request body
to it, and subsequently puts a dummy enumerator into the IORef which
doesn't read any of the input. In other words, if you were to call
"runRequestBody" twice, the second call would immediately send EOF to
your iteratee; this makes sense of course, because reading the request
body *doesn't actually happen* until you call runRequestBody, and once
you've read it, it's gone!

* * *

OK, so we've come this far. Where does that leave us? Well, from a
practical perspective, if you want to handle file uploads, you'll be
writing an Iteratee which you pass into runRequestBody. You can relax
and just concentrate on a) parsing the format of multipart/form-data
requests, and b) deciding what to do with the encoded data.

Given that you'll be decoding MIME messages we can probably re-use some
stuff from the snap-server HTTP parser; in fact this might necessitate
us moving Snap.Internal.Http.Parser from snap-server to snap-core. If
you want to parse some stuff from a stream using iteratees, the easiest
way (and the way we do it in the server) is to write an applicative
parser using attoparsec, and turn it into an iteratee using the
"attoparsec-iteratee" package. See Snap.Internal.Http.Parser for an
example.
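Coming back to the read-once behaviour of runRequestBody for a moment,
since it's the part your upload code will lean on: here's the same trick
boiled down to a toy that uses only Data.IORef from base. Body, newBody
and runBody are hypothetical names for this sketch, and a plain list of
chunks stands in for the socket enumerator; Snap's actual code is the
listing above, not this.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Toy model only: a "request body" is a mutable cell holding the chunks
-- that have not been read yet. Hypothetical names throughout.
newtype Body = Body (IORef [String])

-- Wrap a list of pending chunks as a body.
newBody :: [String] -> IO Body
newBody chunks = Body <$> newIORef chunks

-- Hand every pending chunk to the consumer, and swap an empty source
-- into the cell in the same step, so that any later call finds nothing
-- to read (the analogue of the dummy enumerator above).
runBody :: Body -> ([String] -> a) -> IO a
runBody (Body ref) consume = do
    chunks <- atomicModifyIORef' ref (\pending -> ([], pending))
    return (consume chunks)
```

If you build a body with newBody ["a=1&", "b=2"] and call runBody on it
twice with concat as the consumer, the first call returns "a=1&b=2" and
the second returns "", which is the toy equivalent of your iteratee
getting EOF straight away.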
This has been quite a long-winded post -- let me know if something isn't
clear or if you have further questions.

Cheers,
G
--
Gregory Collins <[email protected]>
