Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 24/08/10 03:47, John Millikin wrote:
[...]
> I would like to avoid hard-coding the error type to SomeException, because
> it forces libraries to use unsafe/unportable language features (dynamic
> typing and casting). However, given the apparent practical requirement that
> all iteratees have the same error type, it seems like there's no other
> choice.

I haven't worked enough with iteratees to have an informed opinion on this, but I wonder what the pros and cons are of having an error state in the iteratees at all. In other words, why would this

    data Step a m b
        = Continue (Stream a -> Iteratee a m b)
        | Yield b (Stream a)
        | Error E.SomeException

be preferred over this

    data Step a m b
        = Continue (Stream a -> Iteratee a m b)
        | Yield b (Stream a)

(Maybe with the restriction that m is a MonadError.)

/M

--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
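To make the second alternative concrete, here is a minimal sketch of an iteratee type with no Error constructor, where failure is reported through the base monad instead. The types are simplified stand-ins, not the definitions from the enumerator package, and `sumNonNegative` and `feed` are hypothetical names made up for illustration. `Either String` plays the role of the MonadError instance.

```haskell
-- Simplified, hypothetical types; not the enumerator package's own.
data Stream a = Chunks [a] | EOF

data Step a m b
  = Continue (Stream a -> Iteratee a m b)
  | Yield b (Stream a)

newtype Iteratee a m b = Iteratee { runIteratee :: m (Step a m b) }

-- Sums its input; a negative number aborts through the base monad
-- (Left) rather than through a dedicated Error constructor.
sumNonNegative :: Iteratee Int (Either String) Int
sumNonNegative = go 0
  where
    go acc = Iteratee (Right (Continue (step acc)))
    step acc (Chunks xs)
      | any (< 0) xs = Iteratee (Left "negative input")
      | otherwise    = go (acc + sum xs)
    step acc EOF = Iteratee (Right (Yield acc EOF))

-- Feeds each chunk in turn, then EOF. Returns Nothing if the
-- iteratee still wants input after EOF.
feed :: Monad m => [[a]] -> Iteratee a m b -> m (Maybe b)
feed chunks it = do
  s <- runIteratee it
  case (s, chunks) of
    (Yield b _, _)       -> return (Just b)
    (Continue k, c : cs) -> feed cs (k (Chunks c))
    (Continue k, [])     -> do
      s' <- runIteratee (k EOF)
      case s' of
        Yield b _  -> return (Just b)
        Continue _ -> return Nothing
```

One visible trade-off in this sketch: once the base monad fails, the continuation and any unconsumed input are gone, whereas an explicit Error step lets an enumerator inspect the failure and decide what to do with the stream.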
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
It's not released yet, but persistent 0.2 is going to be using enumerator. I personally don't mind SomeException as a hard-coded error type, but go ahead and do whatever you think is best for the API. Michael On Tue, Aug 24, 2010 at 5:47 AM, John Millikin wrote: > After fielding some more questions regarding error handling, it turns > out that my earlier mail was in error (hah) -- error handling is much > more complicated than I thought. > > When I gave each iteratee its own error type, I was expecting that > each pipeline would have only one or two sources of errors -- for > example. a parser, or a file reader. However, in reality, it's likely > that every single element in a pipeline can produce an error. For > example, in a JSON/XML/etc reformatter (enumFile, parseEvents, > formatEvents, iterFile), errors could be SomeException, ParseError, or > FormatError. > > Futhermore, while it's easy to change an iteratee's error type with > just (e1 -> e2), changing an enumerator or enumeratee *also* requires > (e2 -> e1). In other words, to avoid loss of error information, the > two types have to be basically the same thing anyway. > > I would like to avoid hard-coding the error type to SomeException, > because it forces libraries to use unsafe/unportable language features > (dynamic typing and casting). However, given the apparent practical > requirement that all iteratees have the same error type, it seems like > there's no other choice. > > So, my questions: > > 1. Has anybody here successfully created / used / heard of an iteratee > implementation with independent error types? > 2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc) > support DeriveDataTypeable? If not, is there any more portable way to > define exceptions? > 3. Has anybody actually written any libraries which use the existing > "enumerator" error handling API? 
> I don't mind rewriting my own uploads, since this whole mess is my own damn fault, but I don't want to inconvenience anybody else.
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
After fielding some more questions regarding error handling, it turns out that my earlier mail was in error (hah) -- error handling is much more complicated than I thought.

When I gave each iteratee its own error type, I was expecting that each pipeline would have only one or two sources of errors -- for example, a parser or a file reader. However, in reality, it's likely that every single element in a pipeline can produce an error. For example, in a JSON/XML/etc reformatter (enumFile, parseEvents, formatEvents, iterFile), errors could be SomeException, ParseError, or FormatError.

Furthermore, while it's easy to change an iteratee's error type with just (e1 -> e2), changing an enumerator or enumeratee *also* requires (e2 -> e1). In other words, to avoid loss of error information, the two types have to be basically the same thing anyway.

I would like to avoid hard-coding the error type to SomeException, because it forces libraries to use unsafe/unportable language features (dynamic typing and casting). However, given the apparent practical requirement that all iteratees have the same error type, it seems like there's no other choice.

So, my questions:

1. Has anybody here successfully created / used / heard of an iteratee implementation with independent error types?
2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc) support DeriveDataTypeable? If not, is there any more portable way to define exceptions?
3. Has anybody actually written any libraries which use the existing "enumerator" error handling API? I don't mind rewriting my own uploads, since this whole mess is my own damn fault, but I don't want to inconvenience anybody else.
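The asymmetry between (e1 -> e2) and (e2 -> e1) can be seen in a small sketch. It assumes a simplified model in which an enumerator is a function from a step to an iteratee, so the error type appears both in its argument and in its result. All names here (`mapStepError`, `mapIterError`, `mapEnumError`, `failingEnum`, `demo`) are hypothetical and only illustrate the shape of the problem.

```haskell
import Data.Functor.Identity (Identity (..))

-- Simplified, hypothetical types with an explicit error parameter e.
data Stream a = Chunks [a] | EOF

data Step e a m b
  = Continue (Stream a -> Iteratee e a m b)
  | Yield b (Stream a)
  | Error e

newtype Iteratee e a m b = Iteratee { runIteratee :: m (Step e a m b) }

type Enumerator e a m b = Step e a m b -> Iteratee e a m b

-- The error type only occurs in result positions of a step or an
-- iteratee, so one function e1 -> e2 is enough.
mapStepError :: Functor m => (e1 -> e2) -> Step e1 a m b -> Step e2 a m b
mapStepError f (Continue k) = Continue (mapIterError f . k)
mapStepError _ (Yield b s)  = Yield b s
mapStepError f (Error e)    = Error (f e)

mapIterError :: Functor m => (e1 -> e2) -> Iteratee e1 a m b -> Iteratee e2 a m b
mapIterError f = Iteratee . fmap (mapStepError f) . runIteratee

-- An enumerator also consumes a step, so the incoming step has to be
-- converted backwards (e2 -> e1) before the wrapped enumerator sees it.
mapEnumError
  :: Functor m
  => (e1 -> e2) -> (e2 -> e1) -> Enumerator e1 a m b -> Enumerator e2 a m b
mapEnumError fwd back enum step = mapIterError fwd (enum (mapStepError back step))

-- An enumerator that always fails with an Int error code.
failingEnum :: Enumerator Int a Identity b
failingEnum _ = Iteratee (Identity (Error 42))

-- Re-typing its errors as String needs both show and read.
demo :: String
demo =
  case runIdentity (runIteratee (mapEnumError show read failingEnum start)) of
    Error msg -> msg
    _         -> "no error"
  where
    start :: Step String Char Identity ()
    start = Yield () EOF
```

In the demo the `read` direction is never forced, but the type still demands it, which is the practical reason the two error types end up having to be interconvertible.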
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, 21 Aug 2010 13:36:08 -0700, John Millikin wrote:
> On Sat, Aug 21, 2010 at 12:44, Magnus Therning wrote:
> > As an aside, has anyone written the code necessary to convert a parser, such
> > as e.g. attoparsec, into an enumerator-iteratee[1]?
>
> This sort of conversion is trivial. For an example, I've uploaded the
> attoparsec-enumerator package at
> <http://hackage.haskell.org/package/attoparsec-enumerator> --
> iterParser is about 20 lines, excluding the module header and imports.

    A.Done extra a -> E.yield a (E.Chunks [extra])

Maybe it would be better to check if extra is empty to produce an empty list of chunks?

--
Nicolas Pouillard
http://nicolaspouillard.fr
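The suggested check could look roughly like the sketch below. `Stream` is a local stand-in for the enumerator package's stream type and `leftover` is a hypothetical helper name; neither is taken from attoparsec-enumerator itself.

```haskell
import qualified Data.ByteString as B

-- Local stand-in for the library's stream type.
data Stream a = Chunks [a] | EOF
  deriving (Eq, Show)

-- Wrap the parser's unconsumed input, yielding no chunk at all when
-- nothing is left over instead of a single empty chunk.
leftover :: B.ByteString -> Stream B.ByteString
leftover extra
  | B.null extra = Chunks []
  | otherwise    = Chunks [extra]
```

With such a helper the Done case would yield `leftover extra` in place of `E.Chunks [extra]`, so downstream iteratees never receive an empty ByteString as if it were data.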
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Aug 21, 2010, at 4:12 AM, John Millikin wrote: > This thought occurred to me, but really, how often are you going to > have a 10 GiB **text** file with no newlines? When you have a file developed on a system that follows a different new-line convention. I haven't seen a file that big, but I'm sadly used to seeing humanly large files display as single lines. Of course if getLine/hGetLine accept *any* of CR, LF, CR+LF as end-of-line (as opposed to using the platform native convention), there's no problem. That's a darned good idea anyway. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
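As an illustration of accepting any of CR, LF, or CR+LF as end-of-line, here is a small pure sketch over String. `linesAnyEOL` is a hypothetical name; it is not how getLine/hGetLine are implemented, only a picture of the convention being proposed.

```haskell
-- Split on CR, LF, or CR+LF. Like Prelude.lines, a terminator at the
-- very end of the input does not produce a trailing empty line.
linesAnyEOL :: String -> [String]
linesAnyEOL "" = []
linesAnyEOL s =
  let (line, rest) = break (\c -> c == '\r' || c == '\n') s
  in line : case rest of
       '\r' : '\n' : rest' -> linesAnyEOL rest'  -- CR+LF counts once
       _ : rest'           -> linesAnyEOL rest'  -- lone CR or lone LF
       []                  -> []
```

For example, `linesAnyEOL "a\r\nb\rc\nd"` evaluates to `["a","b","c","d"]`, whichever platform produced the file.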
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 23:14, Paulo Tanimoto wrote:
> One question: enumFile has type
>
> enumFile :: FilePath -> Enumerator SomeException ByteString IO b
>
> and iterParser has type
>
> iterParser :: Monad m => Parser a -> Iteratee ParseError ByteString m a
>
> How do we use both together? Something in these lines won't type-check
>
> E.run (E.enumFile "file" E.$$ (E.iterParser p))
>
> because the error types are different.

Forgot to mention that -- use the "mapError" function from enumerator-0.2.1 thusly:

http://ianen.org/haskell/enumerator/api-docs/Data-Enumerator.html#v%3AmapError

    parser :: Parser Foo

    toExc :: Show a => a -> E.SomeException
    toExc = E.SomeException . E.ErrorCall . show

    main :: IO ()
    main = do
        run (enumFile "parsetest.txt" $$ mapError toExc $$ iterParser parser) >>= print

You don't have to map to SomeException -- any type will do. For example, in a complex pipeline with real error handling at the other end, you might want a custom error type so you'll know at what stage the error occurred.
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 15:35, Paulo Tanimoto wrote: > Apologies if I'm asking you to repeat yourself, but I couldn't find > the explanation. What was the reason why you went with IterateeM > instead of IterateeMCPS? Simplicity? Iteratees are difficult enough to understand already -- requiring prospective users to learn and understand CPS would just be another roadblock. The CPS implementation is also slower -- I performed some basic benchmarking of IterateeM.hs and IterateeMCPS.hs, and CPS is only faster without optimizations. At -O, they are equal, and at -O2, IterateeM is faster. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
John, On Sat, Aug 21, 2010 at 5:06 PM, John Millikin wrote: > > I think the API is pretty stable. Most of the significant research > into iteratee-based APIs has already been performed by users of the > "iteratee" library, and by Oleg. There might be a few > backwards-compatible changes (new modules, new exports, etc). I'm not > planning to make any large changes, such as Mr. Lato's transition to > CPS-based iteratees. > Apologies if I'm asking you to repeat yourself, but I couldn't find the explanation. What was the reason why you went with IterateeM instead of IterateeMCPS? Simplicity? Thanks, Paulo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 14:41, Michael Snoyman wrote: > Hey John, > As I mentioned, I'm considering having persistent depend upon enumerator. Do > you think it's too early in enumerator's life to do so and I should wait > till the API stabilizes a bit more? Also, two other packages I would think > to bring into the enumerator family would be: > * yaml > * wai-extra, providing an enumerator layer for more easily dealing with the > Source and Enumerator datatypes in wai. I might just release a > wai-enumerator package instead. > Thanks again for your work on this, > Michael I think the API is pretty stable. Most of the significant research into iteratee-based APIs has already been performed by users of the "iteratee" library, and by Oleg. There might be a few backwards-compatible changes (new modules, new exports, etc). I'm not planning to make any large changes, such as Mr. Lato's transition to CPS-based iteratees. As long as you import the enumerator modules with "qualified" (to avoid Prelude name clashes), it should be safe to start porting libraries. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 14:17, Paulo Tanimoto wrote: > Cool, but is there a reason it won't work with version 0.2 you just released? > > build-depends: > [...] > , enumerator >= 0.1 && < 0.2 > > I noticed that when installing it. Hah ... forgot to save the vim buffer. Corrected version uploaded. Sorry about that. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 3:36 PM, John Millikin wrote: > > This sort of conversion is trivial. For an example, I've uploaded the > attoparsec-enumerator package at < > http://hackage.haskell.org/package/attoparsec-enumerator > -- > iterParser is about 20 lines, excluding the module header and imports. Cool, but is there a reason it won't work with version 0.2 you just released? build-depends: [...] , enumerator >= 0.1 && < 0.2 I noticed that when installing it. Paulo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 12:44, Magnus Therning wrote: > As an aside, has anyone written the code necessary to convert a parser, such > as e.g. attoparsec, into an enumerator-iteratee[1]? This sort of conversion is trivial. For an example, I've uploaded the attoparsec-enumerator package at < http://hackage.haskell.org/package/attoparsec-enumerator > -- iterParser is about 20 lines, excluding the module header and imports. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 21/08/10 18:58, John Millikin wrote: > I think the docs are wrong, or perhaps we're misunderstanding them. > Magnus is correct. > > Attached is a test program which listens on two ports, 42000 (blocking > IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send > it data. The behavior is as Magnus describes: bytes from > hGetNonBlocking are available immediately, while hGet waits for a full > buffer (or EOF) before returning. > > This behavior obviously makes hGet unsuitable for enumHandle; my > apologies for not understanding the problem sooner. Thanks, but I suspect that it was my bad description of the issue that made understanding the issue more problematic. Anyway it's good we now understand each other, and even better that we agree :-) As an aside, has anyone written the code necessary to convert a parser, such as e.g. attoparsec, into an enumerator-iteratee[1]? /M [1] Similar to how attoparsec-iteratee does it for iteratee-iteratee. -- Magnus Therning(OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe signature.asc Description: OpenPGP digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 11:58, Judah Jacobson wrote:
> You should note that in ghc>=6.12, hWaitForInput tries to decode the
> next character of input based on the Handle's encoding. As a
> result, it will block if the next multibyte sequence is incomplete,
> and it will throw an error if a multibyte sequence gets split between
> two chunks.
>
> I worked around this problem in Haskeline by temporarily setting stdin
> to BinaryMode; you may want to do something similar.
>
> Also, this issue caused a bug in bytestring with ghc-6.12:
> http://hackage.haskell.org/trac/ghc/ticket/3808
> which will be resolved by the new function 'hGetBufSome' (in ghc-6.14)
> that blocks only when there's no data to read:
> http://hackage.haskell.org/trac/ghc/ticket/4046
> That function might be useful for your package, though not portable to
> other implementations or older GHC versions.

You should not be reading bytestrings from text-mode handles.

The more I think about it, the more having a single Handle type for both text and binary data causes problems. There should be some separation so users don't accidentally use a text handle with binary functions, and vice-versa:

    openFile :: FilePath -> IOMode -> IO TextHandle
    openBinaryFile :: FilePath -> IOMode -> IO BinaryHandle
    hGetBuf :: BinaryHandle -> Ptr a -> Int -> IO Int
    Data.ByteString.hGet :: BinaryHandle -> Int -> IO ByteString
    -- etc

then the enumerators would simply require the correct handle type:

    Data.Enumerator.IO.enumHandle :: BinaryHandle -> Enumerator SomeException ByteString IO b
    Data.Enumerator.Text.enumHandle :: TextHandle -> Enumerator SomeException Text IO b

I suppose the enumerators could verify the handle mode and throw an exception if it's incorrect -- at least that way, it will fail consistently rather than only on rare occasions.
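The separation proposed above can be approximated today with newtype wrappers, as in the following sketch. Every name in it (`TextHandle`, `BinaryHandle`, `openText`, `openBinary`, `hGetBytes`, `hGetTextLine`, `requireBinary`) is hypothetical. The runtime check relies on `hGetEncoding`, which System.IO documents as returning Nothing for a handle in binary mode.

```haskell
import qualified Data.ByteString as B
import qualified System.IO as IO

-- Hypothetical wrappers; base itself has only the single Handle type.
newtype TextHandle = TextHandle IO.Handle
newtype BinaryHandle = BinaryHandle IO.Handle

openText :: FilePath -> IO.IOMode -> IO TextHandle
openText path mode = TextHandle <$> IO.openFile path mode

openBinary :: FilePath -> IO.IOMode -> IO BinaryHandle
openBinary path mode = BinaryHandle <$> IO.openBinaryFile path mode

-- Byte-oriented reads accept only binary handles ...
hGetBytes :: BinaryHandle -> Int -> IO B.ByteString
hGetBytes (BinaryHandle h) = B.hGet h

-- ... and character-oriented reads accept only text handles.
hGetTextLine :: TextHandle -> IO String
hGetTextLine (TextHandle h) = IO.hGetLine h

-- Runtime fallback for a plain Handle obtained elsewhere: fail
-- consistently if it is not in binary mode.
requireBinary :: IO.Handle -> IO BinaryHandle
requireBinary h = do
  enc <- IO.hGetEncoding h
  case enc of
    Nothing -> return (BinaryHandle h)
    Just _  -> ioError (userError "expected a binary-mode handle")
```

The newtypes give the compile-time guarantee only if the constructors stay private to the defining module; `requireBinary` covers handles that arrive from outside it.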
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 11:35, Gregory Collins wrote: > John Millikin writes: > >> I think the docs are wrong, or perhaps we're misunderstanding them. >> Magnus is correct. >> >> Attached is a test program which listens on two ports, 42000 (blocking >> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send >> it data. The behavior is as Magnus describes: bytes from >> hGetNonBlocking are available immediately, while hGet waits for a full >> buffer (or EOF) before returning. > > "hSetBuffering handle NoBuffering"? > > The implementation as it is is fine IMO. Disabling buffering doesn't change the behavior -- hGet h 20 still doesn't return until the handle has at least 20 bytes of input available. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 10:58 AM, John Millikin wrote:
> I think the docs are wrong, or perhaps we're misunderstanding them.
> Magnus is correct.
>
> Attached is a test program which listens on two ports, 42000 (blocking
> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
> it data. The behavior is as Magnus describes: bytes from
> hGetNonBlocking are available immediately, while hGet waits for a full
> buffer (or EOF) before returning.
>
> This behavior obviously makes hGet unsuitable for enumHandle; my
> apologies for not understanding the problem sooner.

You should note that in ghc>=6.12, hWaitForInput tries to decode the next character of input based on the Handle's encoding. As a result, it will block if the next multibyte sequence is incomplete, and it will throw an error if a multibyte sequence gets split between two chunks.

I worked around this problem in Haskeline by temporarily setting stdin to BinaryMode; you may want to do something similar.

Also, this issue caused a bug in bytestring with ghc-6.12:
http://hackage.haskell.org/trac/ghc/ticket/3808
which will be resolved by the new function 'hGetBufSome' (in ghc-6.14) that blocks only when there's no data to read:
http://hackage.haskell.org/trac/ghc/ticket/4046
That function might be useful for your package, though not portable to other implementations or older GHC versions.

Best,
-Judah
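A workaround of the "temporarily binary" kind might be sketched as follows. This is not Haskeline's actual code; `withBinaryMode` is a hypothetical helper built from `hGetEncoding`, `hSetBinaryMode`, and `hSetEncoding` in System.IO plus `bracket_` from Control.Exception.

```haskell
import Control.Exception (bracket_)
import System.IO (Handle, hGetEncoding, hSetBinaryMode, hSetEncoding, stdin)

-- Run an action with the handle switched to binary mode, then put the
-- previous text encoding back, even if the action throws.
-- Caveat: leaving binary mode resets newline translation to the
-- platform default, which may differ from a custom setting made earlier.
withBinaryMode :: Handle -> IO a -> IO a
withBinaryMode h action = do
  saved <- hGetEncoding h
  bracket_ (hSetBinaryMode h True) (restore saved) action
  where
    restore Nothing    = return ()  -- handle was already binary
    restore (Just enc) = hSetBinaryMode h False >> hSetEncoding h enc
```

A caller could then wrap the wait, for instance `withBinaryMode stdin (hWaitForInput stdin 100)`, so that no character decoding is attempted while waiting.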
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
John Millikin writes: > I think the docs are wrong, or perhaps we're misunderstanding them. > Magnus is correct. > > Attached is a test program which listens on two ports, 42000 (blocking > IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send > it data. The behavior is as Magnus describes: bytes from > hGetNonBlocking are available immediately, while hGet waits for a full > buffer (or EOF) before returning. "hSetBuffering handle NoBuffering"? The implementation as it is is fine IMO. G -- Gregory Collins ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
I think the docs are wrong, or perhaps we're misunderstanding them. Magnus is correct.

Attached is a test program which listens on two ports, 42000 (blocking IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send it data. The behavior is as Magnus describes: bytes from hGetNonBlocking are available immediately, while hGet waits for a full buffer (or EOF) before returning.

This behavior obviously makes hGet unsuitable for enumHandle; my apologies for not understanding the problem sooner.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Monad (forever, unless)
    import Control.Monad.Fix (fix)
    import qualified Data.ByteString as B
    import Network
    import System.IO

    main :: IO ()
    main = do
        blockingSock <- listenOn (PortNumber 42000)
        nonblockingSock <- listenOn (PortNumber 42001)
        forkIO $ acceptLoop B.hGet blockingSock "Blocking"
        forkIO $ acceptLoop nonblockingGet nonblockingSock "Non-blocking"
        forever $ threadDelay 100

    nonblockingGet :: Handle -> Int -> IO B.ByteString
    nonblockingGet h n = do
        hasInput <- catch (hWaitForInput h (-1)) (\_ -> return False)
        if hasInput
            then B.hGetNonBlocking h n
            else return B.empty

    acceptLoop :: (Handle -> Int -> IO B.ByteString) -> Socket -> String -> IO ()
    acceptLoop get sock label = fix $ \loop -> do
        (h, _, _) <- accept sock
        putStrLn $ label ++ " client connected"
        bytesLoop (get h)
        putStrLn $ label ++ " EOF"
        loop

    bytesLoop :: (Int -> IO B.ByteString) -> IO ()
    bytesLoop get = fix $ \loop -> do
        bytes <- get 20
        unless (B.null bytes) $ do
            putStrLn $ "bytes = " ++ show bytes
            loop
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Sat, Aug 21, 2010 at 5:40 AM, Magnus Therning wrote:
> It changes the timing. The iteratee will receive the data sooner (when it's
> available rather than when the buffer is full). This means it can fail
> *sooner*, in wall-clock time.

I still fail to see how this works. So I went to see the sources. In [1] we can see how hGet and hGetNonBlocking are defined. The only difference is that the former uses hGetBuf, and the latter uses hGetBufNonBlocking.

[1] http://hackage.haskell.org/packages/archive/bytestring/0.9.1.7/doc/html/src/Data-ByteString.html#line-1908

hGetBuf's main loop is bufRead [2], while hGetBufNonBlocking's main loop is bufReadNonBlocking [3]. Both are very similar. The main differences are RawIO.read vs RawIO.readNonBlocking [4], and Buffered.fillReadBuffer vs Buffered.fillReadBuffer0 [5].

Reading RawIO's documentation [4], we see that RawIO.read blocks only if there is no data available. So it doesn't wait for the buffer to be fully filled, it just "returns the available data". Unfortunately, BufferedIO's documentation [5] doesn't specify if Buffered.fillReadBuffer should return the available data without blocking. However, it does specify that it should be "blocking if the are no bytes available".

[2] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#line-820
[3] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#bufReadNonBlocking
[4] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Device.html#RawIO
[5] http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-BufferedIO.html#BufferedIO

So, assuming that the semantics of BufferedIO are the same as RawIO's, *both* are non-blocking whenever data is already available. Neither of them waits until the buffer is full. The difference lies in whether they block if there is no data available. However, when there isn't data the enumerator *always* wants to block. So using non-blocking IO doesn't gain anything, it only complicates the code.

Am I misreading the docs/source somewhere? =)

Cheers!

--
Felipe.
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 20/08/10 23:12, John Millikin wrote:
> On Fri, Aug 20, 2010 at 14:58, Magnus Therning wrote:
>> Indeed.
>>
>> In many protocols it would force the attacker to send well-formed requests
>> though. I think this is true for many text-based protocols like
>> HTTP.
>>
>> The looping can be handled effectively through hWaitForInput.
>>
>> There are also other reasons for doing non-blocking IO, not least that it
>> makes developing and manual testing a lot nicer.
>
> I think I'm failing to understand something.
>
> Using a non-blocking read doesn't change how the iteratees react to
> well- or mal-formed requests. All it does is change the failure
> condition from "blocked indefinitely" to "looping indefinitely".

It changes the timing. The iteratee will receive the data sooner (when it's available rather than when the buffer is full). This means it can fail *sooner*, in wall-clock time.

> Replacing the hGet with a combination of hWaitForInput /
> hGetNonBlocking would cause a third failure condition, "looping
> indefinitely with periodic blocks". This doesn't seem to be an
> improvement over simply blocking.

It is an improvement when data is trickling in. In other cases it's no improvement (besides that it'd be possible to have time-outs on a "lower level").

> Do you have any example code which works well using a non-blocking
> enumerator, but fails with a blocking one?

It's not about failing vs non-failing, it's about time of failure. An example would be failing after reading a few bytes (the verb of an HTTP request) vs failing after either reading 4k (which is the buffer size in iteratee, IIRC) or when the client hangs up.

/M

--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 14:58, Magnus Therning wrote: > Indeed. > > In many protocols it would force the attacker to send well-formed requests > though. I think this is true for many text-based protocols like > HTTP. > > The looping can be handled effectively through hWaitForInput. > > There are also other reasons for doing non-blocking IO, not least that it > makes developing and manual testing a lot nicer. I think I'm failing to understand something. Using a non-blocking read doesn't change how the iteratees react to well- or mal-formed requests. All it does is change the failure condition from "blocked indefinitely" to "looping indefinitely". Replacing the hGet with a combination of hWaitForInput / hGetNonBlocking would cause a third failure condition, "looping indefinitely with periodic blocks". This doesn't seem to be an improvement over simply blocking. Do you have any example code which works well using a non-blocking enumerator, but fails with a blocking one? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 20/08/10 22:32, John Millikin wrote: > On Fri, Aug 20, 2010 at 12:52, Magnus Therning wrote: >> You don't need to send that much data, the current implementation of >> Enumerator uses hGet, which blocks, so just send the server a few bytes and >> it'll be sitting there waiting for input until it times out (if ever). >> Open a few hundred of those connections and you're likely to cause the >> server to run out of FDs. Of course this is already coded up in tools like >> slowloris[1] :-) > > Correct me if I'm wrong, but I'm pretty sure changing the implementation to > something non-blocking like hGetNonBlocking will not fix this. Hooking up an > iteratee to an enumerator which doesn't block will cause it to loop forever, > which is arguably worse than simply blocking. > > The best way I can think of to defeat a handle-exhaustion attack is to > enforce a timeout on HTTP header parsing, using something like > System.Timeout. This protects against slowloris, since requiring the > entire header to be parsed within some fixed small period of time > prevents the socket from being held open via slowly-trickled headers. Indeed. In many protocols it would force the attacker to send well-formed requests though. I think this is true for many text-based protocols like HTTP. The looping can be handled effectively through hWaitForInput. There are also other reasons for doing non-blocking IO, not least that it makes developing and manual testing a lot nicer. /M -- Magnus Therning(OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe signature.asc Description: OpenPGP digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 12:52, Magnus Therning wrote: > You don't need to send that much data, the current implementation of > Enumerator uses hGet, which blocks, so just send the server a few bytes and > it'll be sitting there waiting for input until it times out (if ever). > Open a > few hundred of those connections and you're likely to cause the server > to run > out of FDs. Of course this is already coded up in tools like > slowloris[1] :-) Correct me if I'm wrong, but I'm pretty sure changing the implementation to something non-blocking like hGetNonBlocking will not fix this. Hooking up an iteratee to an enumerator which doesn't block will cause it to loop forever, which is arguably worse than simply blocking. The best way I can think of to defeat a handle-exhaustion attack is to enforce a timeout on HTTP header parsing, using something like System.Timeout. This protects against slowloris, since requiring the entire header to be parsed within some fixed small period of time prevents the socket from being held open via slowly-trickled headers. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
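The timeout idea can be sketched with `timeout` from System.Timeout, which takes a limit in microseconds and returns Nothing when the action does not finish in time. `ParseOutcome`, `withinMicros`, and `slowClient` are hypothetical names; a real server would pass its header-parsing action instead of the stand-in used here.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Outcome of running a header parser under a deadline.
data ParseOutcome a = Parsed a | TimedOut
  deriving (Eq, Show)

-- Run the action, giving up once the limit (in microseconds) has
-- passed; the action is interrupted if the deadline expires.
withinMicros :: Int -> IO a -> IO (ParseOutcome a)
withinMicros limit action = maybe TimedOut Parsed <$> timeout limit action

-- Stand-in for a client that trickles its request in slowly.
slowClient :: IO String
slowClient = threadDelay 2000000 >> return "GET / HTTP/1.1"
```

With a 50 ms budget, `withinMicros 50000 slowClient` gives up with `TimedOut`, after which the server can close the socket rather than hold the descriptor open indefinitely.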
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 20/08/10 17:30, Felipe Lessa wrote: > On Fri, Aug 20, 2010 at 1:12 PM, John Millikin wrote: >> This thought occurred to me, but really, how often are you going to >> have a 10 GiB **text** file with no newlines? Remember, this is for >> text (log files, INI-style configs, plain .txt), not binary (HTML, >> XML, JSON). Off the top of my head, I can't think of any case where >> you'd expect to see 10 GiB in a single line. >> >> In the worst case, you can just use "decode" to process bytes coming >> from the ByteString-based enumHandle, which should give nicely chunked >> text. > > I was thinking about an attacker, not a use case. Think of a web > server accepting queries using iteratees internally. This may open > door to at least DoS attacks. You don't need to send that much data, the current implementation of Enumerator uses hGet, which blocks, so just send the server a few bytes and it'll be sitting there waiting for input until it times out (if ever). Open a few hundred of those connections and you're likely to cause the server to run out of FDs. Of course this is already coded up in tools like slowloris[1] :-) /M [1] http://ha.ckers.org/slowloris/ -- Magnus Therning(OpenPGP: 0xAB4DFBA4) magnus@therning.org Jabber: magnus@therning.org http://therning.org/magnus identi.ca|twitter: magthe signature.asc Description: OpenPGP digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 09:30, Felipe Lessa wrote:
> I was thinking about an attacker, not a use case. Think of a web
> server accepting queries using iteratees internally. This may open
> door to at least DoS attacks.

Web servers parse/generate HTTP, which is byte-based. They should be using the bytes-based handle enumerator.

> And then, we use iteratees because we don't like the unpredictability
> of lazy IO. Why should iteratees be unpredictable when dealing with
> Text? Besides the memory consumption problem, there may be
> performance problems if the lines are too short.

If you don't want unpredictable performance, use bytes-based IO and decode it with "decode utf8" or something similar. Text-based IO merely exists to solve the most common case, which is a small file in local encoding with relatively short (< 200 char) lines. If you need to handle more complicated cases, such as:

* Files in fixed or self-described encodings (JSON, XML)
* Files with unknown encodings (HTML, RSS)
* Files with content in multiple encodings (EMail)
* Files containing potentially malicious input (such as public server log files)

Then you need to read them as bytes and decide yourself which decoding is necessary.
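Reading bytes and decoding them yourself could look like the sketch below. It does not use the enumerator package's "decode" function; it assumes the text package's `streamDecodeUtf8` from Data.Text.Encoding, which carries an incomplete multibyte sequence over to the next chunk. `decodeChunks` is a hypothetical helper name.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Decode a sequence of byte chunks as UTF-8, one Text per input chunk.
-- A character split across two chunks is held back and emitted with
-- the later chunk. Bytes still incomplete after the last chunk are
-- dropped in this sketch, and invalid input raises an exception.
decodeChunks :: [B.ByteString] -> [T.Text]
decodeChunks = go TE.streamDecodeUtf8
  where
    go _ [] = []
    go dec (c : cs) =
      case dec c of
        TE.Some txt _undecoded next -> txt : go next cs
```

Chunk sizes stay under the reader's control here, because they come from the byte-level read size rather than from wherever the next newline happens to be.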
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 1:12 PM, John Millikin wrote:
> This thought occurred to me, but really, how often are you going to
> have a 10 GiB **text** file with no newlines? Remember, this is for
> text (log files, INI-style configs, plain .txt), not binary (HTML,
> XML, JSON). Off the top of my head, I can't think of any case where
> you'd expect to see 10 GiB in a single line.
>
> In the worst case, you can just use "decode" to process bytes coming
> from the ByteString-based enumHandle, which should give nicely chunked
> text.

I was thinking about an attacker, not a use case. Think of a web
server accepting queries using iteratees internally. This may open
door to at least DoS attacks.

And then, we use iteratees because we don't like the unpredictability
of lazy IO. Why should iteratees be unpredictable when dealing with
Text? Besides the memory consumption problem, there may be
performance problems if the lines are too short.

Cheers! =)

--
Felipe.
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 08:59, Felipe Lessa wrote:
> On Fri, Aug 20, 2010 at 12:51 PM, John Millikin wrote:
>> Currently, I'm planning on the following type signatures for D.E.Text.
>> 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
>> any text-based equivalent to ByteString's 'hGet'.
>
> CC'ing text's maintainer. Using 'hGetLine' will cause baaad surprises
> when you process a 10 GiB file with no '\n' in sight.

This thought occurred to me, but really, how often are you going to
have a 10 GiB **text** file with no newlines? Remember, this is for
text (log files, INI-style configs, plain .txt), not binary (HTML,
XML, JSON). Off the top of my head, I can't think of any case where
you'd expect to see 10 GiB in a single line.

In the worst case, you can just use "decode" to process bytes coming
from the ByteString-based enumHandle, which should give nicely chunked
text.
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 12:51 PM, John Millikin wrote:
> Currently, I'm planning on the following type signatures for D.E.Text.
> 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
> any text-based equivalent to ByteString's 'hGet'.

CC'ing text's maintainer. Using 'hGetLine' will cause baaad surprises
when you process a 10 GiB file with no '\n' in sight.

Cheers! =)

--
Felipe.
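To make the concern concrete, here is a minimal sketch of reading text in buffer-sized pieces instead of whole lines, so memory use does not depend on where (or whether) a '\n' appears. It assumes a version of the text package that provides `Data.Text.IO.hGetChunk`, which returns whatever is currently buffered and an empty Text at EOF. `foldChunks` and `countChars` are hypothetical names used only for illustration; they are not part of enumerator or text.

```haskell
import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import           System.IO    (Handle, IOMode (ReadMode), withFile)

-- | Hypothetical helper: feed each chunk read from the handle to @step@,
-- threading an accumulator, until 'hGetChunk' signals EOF with an empty Text.
foldChunks :: (a -> Text -> IO a) -> a -> Handle -> IO a
foldChunks step = loop
  where
    loop acc h = do
      chunk <- TIO.hGetChunk h
      if T.null chunk
        then return acc
        else do
          acc' <- step acc chunk
          -- Force the accumulator so thunks do not pile up across chunks.
          acc' `seq` loop acc' h

-- | Hypothetical usage: count the characters in a file without ever
-- holding more than one chunk in memory, newlines or not.
countChars :: FilePath -> IO Int
countChars path =
  withFile path ReadMode (foldChunks (\n t -> return (n + T.length t)) 0)
```

The chunk size here follows the handle's buffer, so a file with no newline at all is still consumed in bounded pieces.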
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On Fri, Aug 20, 2010 at 04:01, Simon Marlow wrote:
> Handle IO is also doing Unicode encoding/decoding, which iteratees bypass.
> Have you thought about how to incorporate encoding/decoding?

Yes; there will be a module Data.Enumerator.Text which contains
locale-based IO, enumeratee-based encoding/decoding, and so forth.
Since "iteratee" doesn't have any text-based IO, I figured it wasn't
necessary for a first release; getting feedback on the basic soundness
of the package was more important.

Currently, I'm planning on the following type signatures for D.E.Text.
'enumHandle' will use Text's hGetLine, since there doesn't seem to be
any text-based equivalent to ByteString's 'hGet'.

    enumHandle :: Handle -> Enumerator SomeException Text IO b

    enumFile :: FilePath -> Enumerator SomeException Text IO b

    data Codec = Codec
        { codecName :: Text
        , codecEncode :: Text -> Either SomeException ByteString
        , codecDecode :: ByteString -> Either SomeException (Text, ByteString)
        }

    encode :: Codec -> Enumeratee SomeException Text ByteString m b

    decode :: Codec -> Enumeratee SomeException ByteString Text m b

    utf8 :: Codec
    utf16le :: Codec
    utf16be :: Codec
    utf32le :: Codec
    utf32be :: Codec
    ascii :: Codec
    iso8859_1 :: Codec
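As a rough illustration of why codecDecode returns a (Text, ByteString) pair, here is one possible shape for a UTF-8 codec with that signature. It is a sketch under assumptions, not the package's implementation: the helper `splitIncomplete` is hypothetical, and the decoding itself is delegated to `decodeUtf8'` from Data.Text.Encoding. The second component of the pair carries the bytes of a multi-byte character cut off at a chunk boundary, so the caller can prepend them to the next chunk.

```haskell
import           Control.Exception  (SomeException, toException)
import           Data.Bits          ((.&.))
import           Data.ByteString    (ByteString)
import qualified Data.ByteString    as B
import           Data.Text          (Text)
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE
import           Data.Word          (Word8)

-- Record layout copied from the signatures proposed in the message above.
data Codec = Codec
  { codecName   :: Text
  , codecEncode :: Text -> Either SomeException ByteString
  , codecDecode :: ByteString -> Either SomeException (Text, ByteString)
  }

-- | Sketch of a UTF-8 codec; invalid input is reported as 'Left'.
utf8 :: Codec
utf8 = Codec
  { codecName   = T.pack "UTF-8"
  , codecEncode = Right . TE.encodeUtf8
  , codecDecode = \bytes ->
      let (complete, rest) = splitIncomplete bytes
      in case TE.decodeUtf8' complete of
           Left err   -> Left (toException err)
           Right text -> Right (text, rest)
  }

-- | Hypothetical helper: split off a trailing multi-byte sequence whose
-- remaining bytes have not arrived yet. Anything else is left for the
-- decoder to accept or reject.
splitIncomplete :: ByteString -> (ByteString, ByteString)
splitIncomplete bs = go 1
  where
    len = B.length bs
    go back
      | back > min 4 len = (bs, B.empty)            -- no lead byte in range
      | isContinuation w = go (back + 1)            -- keep scanning backwards
      | expectedLength w > back = B.splitAt (len - back) bs
      | otherwise = (bs, B.empty)                   -- last character is complete
      where w = B.index bs (len - back)

    isContinuation :: Word8 -> Bool
    isContinuation w = w .&. 0xC0 == 0x80

    -- Sequence length announced by a lead byte; invalid leads count as 1.
    expectedLength :: Word8 -> Int
    expectedLength w
      | w < 0x80           = 1
      | w .&. 0xE0 == 0xC0 = 2
      | w .&. 0xF0 == 0xE0 = 3
      | w .&. 0xF8 == 0xF0 = 4
      | otherwise          = 1
```

With this shape, a "decode" enumeratee would only need to carry the leftover bytes from one chunk to the next, and would report an error if leftovers remain at end of input.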
Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package
On 19/08/2010 18:21, John Millikin wrote:
> On Wed, Aug 18, 2010 at 23:33, Jason Dagit wrote:
>> The main reason I would use iteratees is for performance reasons. To
>> help me, as a potential consumer of your library, could you please
>> provide benchmarks for comparing the performance of enumerator with
>> say, a) iteratee, b) lazy/strict bytestring, and c) Prelude functions?
>>
>> I'm interested in both max memory consumption and run-times. Using
>> criterion and/or progression to get the run-times would be icing on an
>> already delicious cake!
>
> Oleg has some benchmarks of his implementation at
> <http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt>, which
> clock iteratees at about twice as fast as lazy IO. He also compares
> them to a native "wc", but his comparison is flawed, because he's
> comparing a String iteratee vs byte-based wc.

Handle IO is also doing Unicode encoding/decoding, which iteratees
bypass. Have you thought about how to incorporate encoding/decoding?

Cheers,
Simon