Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Magnus Therning

On 24/08/10 03:47, John Millikin wrote:
[...]
> I would like to avoid hard-coding the error type to SomeException, because
> it forces libraries to use unsafe/unportable language features (dynamic
> typing and casting). However, given the apparent practical requirement that
> all iteratees have the same error type, it seems like there's no other
> choice.

I haven't worked enough with iteratees to have an informed opinion on this,
but I wonder what the pros and cons are of having an error state in the
iteratees at all.  In other words, why would this

  data Step a m b
  = Continue (Stream a -> Iteratee a m b)
  | Yield b (Stream a)
  | Error E.SomeException

be preferred over this

  data Step a m b
  = Continue (Stream a -> Iteratee a m b)
  | Yield b (Stream a)

(Maybe with the restriction that m is a MonadError.)
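
A minimal sketch of the second variant, assuming the error lives in the underlying monad. Plain `Either String` stands in for a MonadError instance here so that only base is needed; the names `sumNonNeg` and `feed` are hypothetical and not part of the enumerator package.

```haskell
-- Input stream: a batch of elements, or end of input.
data Stream a = Chunks [a] | EOF

-- An iteratee is a monadic action producing the next step.
newtype Iteratee a m b = Iteratee { runIteratee :: m (Step a m b) }

-- Step without an Error constructor; failures are reported through m.
data Step a m b
  = Continue (Stream a -> Iteratee a m b)
  | Yield b (Stream a)

-- Sums its input; a negative element is an error, raised via Left.
sumNonNeg :: Int -> Iteratee Int (Either String) Int
sumNonNeg acc = Iteratee (Right (Continue go))
  where
    go EOF = Iteratee (Right (Yield acc EOF))
    go (Chunks xs)
      | any (< 0) xs = Iteratee (Left "negative input")
      | otherwise = sumNonNeg (acc + sum xs)

-- Feeds a fixed list of inputs; Nothing means the iteratee wanted more.
feed :: Monad m => [Stream a] -> Iteratee a m b -> m (Maybe b)
feed inputs it = do
  step <- runIteratee it
  case (step, inputs) of
    (Yield b _, _) -> return (Just b)
    (Continue _, []) -> return Nothing
    (Continue k, s : rest) -> feed rest (k s)
```

In this shape the error type is chosen by whoever picks m, so the Step type itself never has to mention it.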

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus＠therning．org   Jabber: magnus＠therning．org
http://therning.org/magnus identi.ca|twitter: magthe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Michael Snoyman

It's not released yet, but persistent 0.2 is going to be using enumerator. I
personally don't mind SomeException as a hard-coded error type, but go ahead
and do whatever you think is best for the API.

Michael

On Tue, Aug 24, 2010 at 5:47 AM, John Millikin  wrote:

> After fielding some more questions regarding error handling, it turns
> out that my earlier mail was in error (hah) -- error handling is much
> more complicated than I thought.
>
> When I gave each iteratee its own error type, I was expecting that
> each pipeline would have only one or two sources of errors -- for
> example, a parser, or a file reader. However, in reality, it's likely
> that every single element in a pipeline can produce an error. For
> example, in a JSON/XML/etc reformatter (enumFile, parseEvents,
> formatEvents, iterFile), errors could be SomeException, ParseError, or
> FormatError.
>
> Furthermore, while it's easy to change an iteratee's error type with
> just (e1 -> e2), changing an enumerator or enumeratee *also* requires
> (e2 -> e1). In other words, to avoid loss of error information, the
> two types have to be basically the same thing anyway.
>
> I would like to avoid hard-coding the error type to SomeException,
> because it forces libraries to use unsafe/unportable language features
> (dynamic typing and casting). However, given the apparent practical
> requirement that all iteratees have the same error type, it seems like
> there's no other choice.
>
> So, my questions:
>
> 1. Has anybody here successfully created / used / heard of an iteratee
> implementation with independent error types?
> 2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc)
> support DeriveDataTypeable? If not, is there any more portable way to
> define exceptions?
> 3. Has anybody actually written any libraries which use the existing
> "enumerator" error handling API? I don't mind rewriting my own
> uploads, since this whole mess is my own damn fault, but I don't want
> to inconvenience anybody else.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread John Millikin

After fielding some more questions regarding error handling, it turns
out that my earlier mail was in error (hah) -- error handling is much
more complicated than I thought.

When I gave each iteratee its own error type, I was expecting that
each pipeline would have only one or two sources of errors -- for
example, a parser, or a file reader. However, in reality, it's likely
that every single element in a pipeline can produce an error. For
example, in a JSON/XML/etc reformatter (enumFile, parseEvents,
formatEvents, iterFile), errors could be SomeException, ParseError, or
FormatError.

Furthermore, while it's easy to change an iteratee's error type with
just (e1 -> e2), changing an enumerator or enumeratee *also* requires
(e2 -> e1). In other words, to avoid loss of error information, the
two types have to be basically the same thing anyway.
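
The asymmetry can be sketched with a simplified, pure model (no monad layer), which is an assumption made for brevity and not the package's actual types. Here `mapEnumError` and `enumList` are hypothetical names.

```haskell
data Stream a = Chunks [a] | EOF

-- Simplified step type with an explicit error parameter e.
data Step e a b
  = Continue (Stream a -> Step e a b)
  | Yield b (Stream a)
  | Error e

-- Changing an iteratee's error type needs only one direction.
mapError :: (e1 -> e2) -> Step e1 a b -> Step e2 a b
mapError f (Continue k) = Continue (mapError f . k)
mapError _ (Yield b s) = Yield b s
mapError f (Error e) = Error (f e)

-- An enumerator transforms a step by feeding it input.
type Enumerator e a b = Step e a b -> Step e a b

-- The step is both argument and result, so both directions are needed.
mapEnumError :: (e1 -> e2) -> (e2 -> e1) -> Enumerator e1 a b -> Enumerator e2 a b
mapEnumError to from enum = mapError to . enum . mapError from

-- Feeds one chunk to a step that is still waiting for input.
enumList :: [a] -> Enumerator e a b
enumList xs (Continue k) = k (Chunks xs)
enumList _ step = step
```

Because the wrapped enumerator consumes a step in the old error type and produces one in the new, the conversion only round-trips without loss when the two functions are inverses of each other.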

I would like to avoid hard-coding the error type to SomeException,
because it forces libraries to use unsafe/unportable language features
(dynamic typing and casting). However, given the apparent practical
requirement that all iteratees have the same error type, it seems like
there's no other choice.

So, my questions:

1. Has anybody here successfully created / used / heard of an iteratee
implementation with independent error types?
2. Do alternative Haskell implementations (JHC, UHC, Hugs, etc)
support DeriveDataTypeable? If not, is there any more portable way to
define exceptions?
3. Has anybody actually written any libraries which use the existing
"enumerator" error handling API? I don't mind rewriting my own
uploads, since this whole mess is my own damn fault, but I don't want
to inconvenience anybody else.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-23 Thread Nicolas Pouillard

On Sat, 21 Aug 2010 13:36:08 -0700, John Millikin  wrote:
> On Sat, Aug 21, 2010 at 12:44, Magnus Therning  wrote:
> > As an aside, has anyone written the code necessary to convert a parser, such
> > as e.g.  attoparsec, into an enumerator-iteratee[1]?
> 
> This sort of conversion is trivial. For an example, I've uploaded the
> attoparsec-enumerator package at <
> http://hackage.haskell.org/package/attoparsec-enumerator > --
> iterParser is about 20 lines, excluding the module header and imports.

< A.Done extra a -> E.yield a (E.Chunks [extra])

Maybe it would be better to check if extra is empty to produce
an empty list of chunks?
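
A small sketch of that check, assuming a local stand-in for the Stream type; `leftover` is a hypothetical helper name and not something from attoparsec-enumerator.

```haskell
import qualified Data.ByteString as B

-- Local stand-in for the package's stream type.
data Stream a = Chunks [a] | EOF
  deriving (Eq, Show)

-- Wraps unconsumed parser input, avoiding a chunk list that
-- contains a single empty ByteString.
leftover :: B.ByteString -> Stream B.ByteString
leftover extra
  | B.null extra = Chunks []
  | otherwise = Chunks [extra]
```

With such a helper the Done case would read `E.yield a (leftover extra)`, assuming the real Stream type is substituted.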

-- 
Nicolas Pouillard
http://nicolaspouillard.fr

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-22 Thread Richard O'Keefe

On Aug 21, 2010, at 4:12 AM, John Millikin wrote:
> This thought occurred to me, but really, how often are you going to
> have a 10 GiB **text** file with no newlines?

When you have a file developed on a system that follows a
different new-line convention.  I haven't seen a file that
big, but I'm sadly used to seeing humanly large files
display as single lines.

Of course if getLine/hGetLine accept *any* of CR, LF, CR+LF
as end-of-line (as opposed to using the platform native
convention), there's no problem.  That's a darned good idea
anyway.
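
A minimal sketch of such a convention-agnostic splitter, written over String for simplicity; `splitAnyEol` is a hypothetical name. It assumes the whole input is in hand, so a CR+LF pair split across two chunks of a stream is not handled here.

```haskell
-- Splits on CR, LF, or CR+LF, whichever appears.
splitAnyEol :: String -> [String]
splitAnyEol "" = []
splitAnyEol s =
  case break (\c -> c == '\r' || c == '\n') s of
    (line, "") -> [line]
    -- CR immediately followed by LF counts as one terminator.
    (line, '\r' : '\n' : rest) -> line : splitAnyEol rest
    -- A lone CR or a lone LF.
    (line, _ : rest) -> line : splitAnyEol rest
```

Like `lines`, it does not produce a trailing empty string when the input ends with a terminator.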


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-22 Thread John Millikin

On Sat, Aug 21, 2010 at 23:14, Paulo Tanimoto  wrote:
> One question: enumFile has type
>
>    enumFile :: FilePath -> Enumerator SomeException ByteString IO b
>
> and iterParser has type
>
>    iterParser :: Monad m => Parser a -> Iteratee ParseError ByteString m a
>
> How do we use both together?  Something in these lines won't type-check
>
>    E.run (E.enumFile "file" E.$$ (E.iterParser p))
>
> because the error types are different.

Forgot to mention that -- use the "mapError" function from
enumerator-0.2.1 thusly:

http://ianen.org/haskell/enumerator/api-docs/Data-Enumerator.html#v%3AmapError

parser :: Parser Foo

toExc :: Show a => a -> E.SomeException
toExc = E.SomeException . E.ErrorCall . show

main :: IO ()
main = do
    run (enumFile "parsetest.txt" $$ mapError toExc $$ iterParser parser)
        >>= print

You don't have to map to SomeException -- any type will do. For
example, in a complex pipeline with real error handling at the other
end, you might want a custom error type so you'll know at what stage
the error occurred.
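
As a rough illustration of a stage-tagging error type, here is a sketch that uses plain Either instead of iteratees, with `first` from Data.Bifunctor playing the role that mapError plays above. The type `PipelineError` and both stage functions are hypothetical.

```haskell
import Data.Bifunctor (first)

-- One constructor per pipeline stage, so the handler can tell
-- where a failure came from.
data PipelineError
  = ReadFailed String
  | ParseFailed String
  deriving (Eq, Show)

-- Hypothetical stand-in for a reading stage.
readStage :: String -> Either String String
readStage "" = Left "empty input"
readStage s = Right s

-- Hypothetical stand-in for a parsing stage.
parseStage :: String -> Either String Int
parseStage s =
  case reads s of
    [(n, "")] -> Right n
    _ -> Left ("not a number: " ++ s)

-- Each stage's error is mapped into the shared type before composing.
pipeline :: String -> Either PipelineError Int
pipeline src = do
  raw <- first ReadFailed (readStage src)
  first ParseFailed (parseStage raw)
```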

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 15:35, Paulo Tanimoto  wrote:
> Apologies if I'm asking you to repeat yourself, but I couldn't find
> the explanation.  What was the reason why you went with IterateeM
> instead of IterateeMCPS?  Simplicity?

Iteratees are difficult enough to understand already -- requiring
prospective users to learn and understand CPS would just be another
roadblock. The CPS implementation is also slower -- I performed some
basic benchmarking of IterateeM.hs and IterateeMCPS.hs, and CPS is
only faster without optimizations. At -O, they are equal, and at -O2,
IterateeM is faster.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Paulo Tanimoto

John,

On Sat, Aug 21, 2010 at 5:06 PM, John Millikin  wrote:
>
> I think the API is pretty stable. Most of the significant research
> into iteratee-based APIs has already been performed by users of the
> "iteratee" library, and by Oleg. There might be a few
> backwards-compatible changes (new modules, new exports, etc). I'm not
> planning to make any large changes, such as Mr. Lato's transition to
> CPS-based iteratees.
>

Apologies if I'm asking you to repeat yourself, but I couldn't find
the explanation.  What was the reason why you went with IterateeM
instead of IterateeMCPS?  Simplicity?

Thanks,

Paulo

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 14:41, Michael Snoyman  wrote:
> Hey John,
> As I mentioned, I'm considering having persistent depend upon enumerator. Do
> you think it's too early in enumerator's life to do so and I should wait
> till the API stabilizes a bit more? Also, two other packages I would think
> to bring into the enumerator family would be:
> * yaml
> * wai-extra, providing an enumerator layer for more easily dealing with the
> Source and Enumerator datatypes in wai. I might just release a
> wai-enumerator package instead.
> Thanks again for your work on this,
> Michael

I think the API is pretty stable. Most of the significant research
into iteratee-based APIs has already been performed by users of the
"iteratee" library, and by Oleg. There might be a few
backwards-compatible changes (new modules, new exports, etc). I'm not
planning to make any large changes, such as Mr. Lato's transition to
CPS-based iteratees.

As long as you import the enumerator modules with "qualified" (to
avoid Prelude name clashes), it should be safe to start porting
libraries.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 14:17, Paulo Tanimoto  wrote:
> Cool, but is there a reason it won't work with version 0.2 you just released?
>
>  build-depends:
>    [...]
>    , enumerator >= 0.1 && < 0.2
>
> I noticed that when installing it.

Hah ... forgot to save the vim buffer. Corrected version uploaded.
Sorry about that.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Paulo Tanimoto

On Sat, Aug 21, 2010 at 3:36 PM, John Millikin  wrote:
>
> This sort of conversion is trivial. For an example, I've uploaded the
> attoparsec-enumerator package at <
> http://hackage.haskell.org/package/attoparsec-enumerator > --
> iterParser is about 20 lines, excluding the module header and imports.

Cool, but is there a reason it won't work with version 0.2 you just released?

  build-depends:
    [...]
    , enumerator >= 0.1 && < 0.2

I noticed that when installing it.

Paulo

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 12:44, Magnus Therning  wrote:
> As an aside, has anyone written the code necessary to convert a parser, such
> as e.g.  attoparsec, into an enumerator-iteratee[1]?

This sort of conversion is trivial. For an example, I've uploaded the
attoparsec-enumerator package at <
http://hackage.haskell.org/package/attoparsec-enumerator > --
iterParser is about 20 lines, excluding the module header and imports.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Magnus Therning

On 21/08/10 18:58, John Millikin wrote:
> I think the docs are wrong, or perhaps we're misunderstanding them.
> Magnus is correct.
>
> Attached is a test program which listens on two ports, 42000 (blocking
> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
> it data. The behavior is as Magnus describes: bytes from
> hGetNonBlocking are available immediately, while hGet waits for a full
> buffer (or EOF) before returning.
>
> This behavior obviously makes hGet unsuitable for enumHandle; my
> apologies for not understanding the problem sooner.

Thanks, but I suspect that it was my bad description of the issue that made
understanding the issue more problematic.

Anyway it's good we now understand each other, and even better that we agree
:-)

As an aside, has anyone written the code necessary to convert a parser, such
as e.g.  attoparsec, into an enumerator-iteratee[1]?

/M

[1] Similar to how attoparsec-iteratee does it for iteratee-iteratee.
-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus＠therning．org   Jabber: magnus＠therning．org
http://therning.org/magnus identi.ca|twitter: magthe




Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 11:58, Judah Jacobson  wrote:
> You should note that in ghc>=6.12, hWaitForInput tries to decode the
> next character of input based on the Handle's encoding.  As a
> result, it will block if the next multibyte sequence is incomplete,
> and it will throw an error if a multibyte sequence gets split between
> two chunks.
>
> I worked around this problem in Haskeline by temporarily setting stdin
> to BinaryMode; you may want to do something similar.
>
> Also, this issue caused a bug in bytestring with ghc-6.12:
> http://hackage.haskell.org/trac/ghc/ticket/3808
> which will be resolved by the new function 'hGetBufSome' (in ghc-6.14)
> that blocks only when there's no data to read:
> http://hackage.haskell.org/trac/ghc/ticket/4046
> That function might be useful for your package, though not portable to
> other implementations or older GHC versions.

You should not be reading bytestrings from text-mode handles.

The more I think about it, the more having a single Handle type for
both text and binary data causes problems. There should be some
separation so users don't accidentally use a text handle with binary
functions, and vice-versa:

openFile :: FilePath -> IOMode -> IO TextHandle
openBinaryFile :: FilePath -> IOMode -> IO BinaryHandle
hGetBuf :: BinaryHandle -> Ptr a -> Int -> IO Int
Data.ByteString.hGet :: BinaryHandle -> Int -> IO ByteString
-- etc

then the enumerators would simply require the correct handle type:

Data.Enumerator.IO.enumHandle :: BinaryHandle
    -> Enumerator SomeException ByteString IO b
Data.Enumerator.Text.enumHandle :: TextHandle
    -> Enumerator SomeException Text IO b

I suppose the enumerators could verify the handle mode and throw an
exception if it's incorrect -- at least that way, it will fail
consistently rather than only in rare occasions.
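
A sketch of the runtime check, assuming it is acceptable to treat "no text encoding set" as the definition of binary mode. It relies on System.IO's hGetEncoding, which is documented to return Nothing for a handle in binary mode; the newtypes and `classify` are hypothetical names.

```haskell
import System.IO

-- Wrappers that record which kind of handle we were given.
newtype BinaryHandle = BinaryHandle Handle
newtype TextHandle = TextHandle Handle

-- Inspects the handle's current encoding to decide its kind.
-- Left: a text handle; Right: a binary handle.
classify :: Handle -> IO (Either TextHandle BinaryHandle)
classify h = do
  enc <- hGetEncoding h
  return $ case enc of
    Nothing -> Right (BinaryHandle h)
    Just _ -> Left (TextHandle h)
```

An enumerator could call something like this once when it starts and fail immediately on the wrong kind, instead of failing only when a multibyte sequence happens to be split.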

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

On Sat, Aug 21, 2010 at 11:35, Gregory Collins  wrote:
> John Millikin  writes:
>
>> I think the docs are wrong, or perhaps we're misunderstanding them.
>> Magnus is correct.
>>
>> Attached is a test program which listens on two ports, 42000 (blocking
>> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
>> it data. The behavior is as Magnus describes: bytes from
>> hGetNonBlocking are available immediately, while hGet waits for a full
>> buffer (or EOF) before returning.
>
> "hSetBuffering handle NoBuffering"?
>
> The implementation as it is is fine IMO.

Disabling buffering doesn't change the behavior -- hGet h 20 still
doesn't return until the handle has at least 20 bytes of input
available.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Judah Jacobson

On Sat, Aug 21, 2010 at 10:58 AM, John Millikin  wrote:
> I think the docs are wrong, or perhaps we're misunderstanding them.
> Magnus is correct.
>
> Attached is a test program which listens on two ports, 42000 (blocking
> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
> it data. The behavior is as Magnus describes: bytes from
> hGetNonBlocking are available immediately, while hGet waits for a full
> buffer (or EOF) before returning.
>
> This behavior obviously makes hGet unsuitable for enumHandle; my
> apologies for not understanding the problem sooner.

You should note that in ghc>=6.12, hWaitForInput tries to decode the
next character of input based on the Handle's encoding.  As a
result, it will block if the next multibyte sequence is incomplete,
and it will throw an error if a multibyte sequence gets split between
two chunks.

I worked around this problem in Haskeline by temporarily setting stdin
to BinaryMode; you may want to do something similar.

Also, this issue caused a bug in bytestring with ghc-6.12:
http://hackage.haskell.org/trac/ghc/ticket/3808
which will be resolved by the new function 'hGetBufSome' (in ghc-6.14)
that blocks only when there's no data to read:
http://hackage.haskell.org/trac/ghc/ticket/4046
That function might be useful for your package, though not portable to
other implementations or older GHC versions.
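
For illustration, a read loop with the behaviour described (block only when no data is available) can be sketched with Data.ByteString.hGetSome, assuming a bytestring version that provides it. The helper names `readChunks` and `demo` are hypothetical, and the demo reads from a temporary file only to stay self-contained; the interesting case is a socket or pipe.

```haskell
import qualified Data.ByteString as B
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Reads until EOF, asking for at most n bytes at a time.
-- hGetSome may return fewer than n bytes; an empty result means EOF.
readChunks :: Handle -> Int -> IO [B.ByteString]
readChunks h n = do
  chunk <- B.hGetSome h n
  if B.null chunk
    then return []
    else fmap (chunk :) (readChunks h n)

-- Writes ten bytes to a temporary file, rewinds, and reads them back.
demo :: IO [B.ByteString]
demo = do
  dir <- getTemporaryDirectory
  (path, h) <- openBinaryTempFile dir "chunks.bin"
  B.hPut h (B.pack [1 .. 10])
  hSeek h AbsoluteSeek 0
  chunks <- readChunks h 4
  hClose h
  removeFile path
  return chunks
```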

Best,
-Judah

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Gregory Collins

John Millikin  writes:

> I think the docs are wrong, or perhaps we're misunderstanding them.
> Magnus is correct.
>
> Attached is a test program which listens on two ports, 42000 (blocking
> IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
> it data. The behavior is as Magnus describes: bytes from
> hGetNonBlocking are available immediately, while hGet waits for a full
> buffer (or EOF) before returning.

"hSetBuffering handle NoBuffering"?

The implementation as it is is fine IMO.

G
-- 
Gregory Collins 

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread John Millikin

I think the docs are wrong, or perhaps we're misunderstanding them.
Magnus is correct.

Attached is a test program which listens on two ports, 42000 (blocking
IO) and 42001 (non-blocking). You can use netcat, telnet, etc, to send
it data. The behavior is as Magnus describes: bytes from
hGetNonBlocking are available immediately, while hGet waits for a full
buffer (or EOF) before returning.

This behavior obviously makes hGet unsuitable for enumHandle; my
apologies for not understanding the problem sooner.

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, unless)
import Control.Monad.Fix (fix)
import qualified Data.ByteString as B
import Network
import System.IO

main :: IO ()
main = do
	blockingSock <- listenOn (PortNumber 42000)
	nonblockingSock <- listenOn (PortNumber 42001)
	
	forkIO $ acceptLoop B.hGet blockingSock "Blocking"
	forkIO $ acceptLoop nonblockingGet nonblockingSock "Non-blocking"
	forever $ threadDelay 100

nonblockingGet :: Handle -> Int -> IO B.ByteString
nonblockingGet h n = do
	hasInput <- catch (hWaitForInput h (-1)) (\_ -> return False)
	if hasInput
		then B.hGetNonBlocking h n
		else return B.empty

acceptLoop :: (Handle -> Int -> IO B.ByteString) -> Socket -> String -> IO ()
acceptLoop get sock label = fix $ \loop -> do
	(h, _, _) <- accept sock
	putStrLn $ label ++ " client connected"
	bytesLoop (get h)
	putStrLn $ label ++ " EOF"
	loop

bytesLoop :: (Int -> IO B.ByteString) -> IO ()
bytesLoop get = fix $ \loop -> do
	bytes <- get 20
	unless (B.null bytes) $ do
		putStrLn $ "bytes = " ++ show bytes
		loop

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Felipe Lessa

On Sat, Aug 21, 2010 at 5:40 AM, Magnus Therning wrote:
> It changes the timing. The iteratee will receive the data sooner (when it's
> available rather than when the buffer is full). This means it can fail
> *sooner*, in wall-clock time.

I still fail to see how this works. So I went to see the sources.

In [1] we can see how hGet and hGetNonBlocking are defined. The only
difference is that the former uses hGetBuf, and the latter uses
hGetBufNonBlocking.

[1]
http://hackage.haskell.org/packages/archive/bytestring/0.9.1.7/doc/html/src/Data-ByteString.html#line-1908

hGetBuf's main loop is bufRead [2], while hGetBufNonBlocking's main
loop is bufReadNonBlocking [3]. Both are very similar. The main
differences are RawIO.read vs RawIO.readNonBlocking [4], and
Buffered.fillReadBuffer vs Buffered.fillReadBuffer0 [5]. Reading
RawIO's documentation [4], we see that RawIO.read blocks only if there
is no data available. So it doesn't wait for the buffer to be fully
filled, it just "returns the available data". Unfortunately,
BufferedIO's documentation [5] doesn't specify if
Buffered.fillReadBuffer should return the available data without
blocking. However, it does specify that it should be "blocking
if the are no bytes available".

[2]
http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#line-820
[3]
http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Handle-Text.html#bufReadNonBlocking
[4]
http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-Device.html#RawIO
[5]
http://hackage.haskell.org/packages/archive/base/4.2.0.1/doc/html/src/GHC-IO-BufferedIO.html#BufferedIO

So, assuming that the semantics of BufferedIO are the same as RawIO's,
*both* are non-blocking whenever data is already available. None of
them wait until the buffer is full. The difference lies in whether
they block if there is no data available. However, when there isn't
data the enumerator *always* wants to block. So using non-blocking IO
doesn't give anything, only complicates the code.

Am I misreading the docs/source somewhere? =)

Cheers!

--
Felipe.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-21 Thread Magnus Therning

On 20/08/10 23:12, John Millikin wrote:
> On Fri, Aug 20, 2010 at 14:58, Magnus Therning  wrote:
>> Indeed.
>>
>> In many protocols it would force the attacker to send well-formed requests
>> though.  I think this is true for many text-based protocols like
>> HTTP.
>>
>> The looping can be handled effectively through hWaitForInput.
>>
>> There are also other reasons for doing non-blocking IO, not least that it
>> makes developing and manual testing a lot nicer.
>
> I think I'm failing to understand something.
>
> Using a non-blocking read doesn't change how the iteratees react to
> well- or mal-formed requests. All it does is change the failure
> condition from "blocked indefinitely" to "looping indefinitely".

It changes the timing.  The iteratee will receive the data sooner (when it's
available rather than when the buffer is full).  This means it can fail
*sooner*, in wall-clock time.

> Replacing the hGet with a combination of hWaitForInput /
> hGetNonBlocking would cause a third failure condition, "looping
> indefinitely with periodic blocks". This doesn't seem to be an
> improvement over simply blocking.

It is an improvement when data is trickling in.  In other cases it's no
improvement (besides that it'd be possible to have time-outs on a "lower
level").

> Do you have any example code which works well using a non-blocking
> enumerator, but fails with a blocking one?

It's not about failing vs non-failing, it's about time of failure.  An example
would be failing after reading a few bytes (the verb of a HTTP request) vs
failing after either reading 4k (which is the buffer size in iteratee, IIRC)
or when the client hangs up.

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus＠therning．org   Jabber: magnus＠therning．org
http://therning.org/magnus identi.ca|twitter: magthe




Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin

On Fri, Aug 20, 2010 at 14:58, Magnus Therning  wrote:
> Indeed.
>
> In many protocols it would force the attacker to send well-formed requests
> though.  I think this is true for many text-based protocols like
> HTTP.
>
> The looping can be handled effectively through hWaitForInput.
>
> There are also other reasons for doing non-blocking IO, not least that it
> makes developing and manual testing a lot nicer.

I think I'm failing to understand something.

Using a non-blocking read doesn't change how the iteratees react to
well- or mal-formed requests. All it does is change the failure
condition from "blocked indefinitely" to "looping indefinitely".

Replacing the hGet with a combination of hWaitForInput /
hGetNonBlocking would cause a third failure condition, "looping
indefinitely with periodic blocks". This doesn't seem to be an
improvement over simply blocking.

Do you have any example code which works well using a non-blocking
enumerator, but fails with a blocking one?

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Magnus Therning

On 20/08/10 22:32, John Millikin wrote:
> On Fri, Aug 20, 2010 at 12:52, Magnus Therning  wrote:
>> You don't need to send that much data, the current implementation of
>> Enumerator uses hGet, which blocks, so just send the server a few bytes and
>> it'll be sitting there waiting for input until it times out (if ever).
>> Open a few hundred of those connections and you're likely to cause the
>> server to run out of FDs.  Of course this is already coded up in tools like
>> slowloris[1] :-)
>
> Correct me if I'm wrong, but I'm pretty sure changing the implementation to
> something non-blocking like hGetNonBlocking will not fix this. Hooking up an
> iteratee to an enumerator which doesn't block will cause it to loop forever,
> which is arguably worse than simply blocking.
>
> The best way I can think of to defeat a handle-exhaustion attack is to
> enforce a timeout on HTTP header parsing, using something like
> System.Timeout. This protects against slowloris, since requiring the
> entire header to be parsed within some fixed small period of time
> prevents the socket from being held open via slowly-trickled headers.

Indeed.

In many protocols it would force the attacker to send well-formed requests
though.  I think this is true for many text-based protocols like
HTTP.

The looping can be handled effectively through hWaitForInput.

There are also other reasons for doing non-blocking IO, not least that it
makes developing and manual testing a lot nicer.

/M

-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus＠therning．org   Jabber: magnus＠therning．org
http://therning.org/magnus identi.ca|twitter: magthe




Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin

On Fri, Aug 20, 2010 at 12:52, Magnus Therning  wrote:
> You don't need to send that much data, the current implementation of
> Enumerator uses hGet, which blocks, so just send the server a few bytes and
> it'll be sitting there waiting for input until it times out (if ever).
> Open a few hundred of those connections and you're likely to cause the
> server to run out of FDs.  Of course this is already coded up in tools like
> slowloris[1] :-)

Correct me if I'm wrong, but I'm pretty sure changing the
implementation to something non-blocking like hGetNonBlocking will not
fix this. Hooking up an iteratee to an enumerator which doesn't block
will cause it to loop forever, which is arguably worse than simply
blocking.

The best way I can think of to defeat a handle-exhaustion attack is to
enforce a timeout on HTTP header parsing, using something like
System.Timeout. This protects against slowloris, since requiring the
entire header to be parsed within some fixed small period of time
prevents the socket from being held open via slowly-trickled headers.
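
A minimal sketch of that idea using `timeout` from System.Timeout in base. The wrapper `withDeadline` and the stand-in action `slowClient` are hypothetical; a real server would pass the action that runs the header-parsing iteratee.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Runs an action with a deadline given in microseconds.
-- Left means the deadline passed before the action finished.
withDeadline :: Int -> IO a -> IO (Either String a)
withDeadline micros action = do
  result <- timeout micros action
  return (maybe (Left "deadline exceeded") Right result)

-- Stand-in for a client that trickles its request very slowly.
slowClient :: IO String
slowClient = do
  threadDelay 2000000 -- two seconds before anything arrives
  return "GET"
```

On a timeout the caller can close the socket, which frees the file descriptor regardless of how the reads themselves are implemented.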

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Magnus Therning

On 20/08/10 17:30, Felipe Lessa wrote:
> On Fri, Aug 20, 2010 at 1:12 PM, John Millikin  wrote:
>> This thought occurred to me, but really, how often are you going to
>> have a 10 GiB **text** file with no newlines? Remember, this is for
>> text (log files, INI-style configs, plain .txt), not binary (HTML,
>> XML, JSON). Off the top of my head, I can't think of any case where
>> you'd expect to see 10 GiB in a single line.
>>
>> In the worst case, you can just use "decode" to process bytes coming
>> from the ByteString-based enumHandle, which should give nicely chunked
>> text.
>
> I was thinking about an attacker, not a use case.  Think of a web
> server accepting queries using iteratees internally.  This may open
> door to at least DoS attacks.

You don't need to send that much data, the current implementation of
Enumerator uses hGet, which blocks, so just send the server a few bytes and
it'll be sitting there waiting for input until it times out (if ever).
Open a few hundred of those connections and you're likely to cause the
server to run out of FDs.  Of course this is already coded up in tools like
slowloris[1] :-)

/M

[1] http://ha.ckers.org/slowloris/
-- 
Magnus Therning(OpenPGP: 0xAB4DFBA4)
magnus＠therning．org   Jabber: magnus＠therning．org
http://therning.org/magnus identi.ca|twitter: magthe




Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin

On Fri, Aug 20, 2010 at 09:30, Felipe Lessa  wrote:
> I was thinking about an attacker, not a use case.  Think of a web
> server accepting queries using iteratees internally.  This may open
> the door to at least DoS attacks.

Web servers parse/generate HTTP, which is byte-based. They should be
using the bytes-based handle enumerator.

> And then, we use iteratees because we don't like the unpredictability
> of lazy IO.  Why should iteratees be unpredictable when dealing with
> Text?  Besides the memory consumption problem, there may be
> performance problems if the lines are too short.

If you don't want unpredictable performance, use bytes-based IO and
decode it with "decode utf8" or something similar.

Text-based IO merely exists to solve the most common case, which is a
small file in local encoding with relatively short (< 200 char) lines.
If you need to handle more complicated cases, such as:

* Files in fixed or self-described encodings (JSON, XML)
* Files with unknown encodings (HTML, RSS)
* Files with content in multiple encodings (EMail)
* Files containing potentially malicious input (such as public server log files)

Then you need to read them as bytes and decide yourself which decoding
is necessary.
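
As a rough illustration of "decide yourself", here is a sketch that takes
raw bytes, tries UTF-8 first and falls back to Latin-1. The names Guess and
decodeGuess are hypothetical, and the try-then-fall-back policy is only one
possible choice, not something the enumerator package does. It uses
decodeUtf8' and decodeLatin1 from the text package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical tag recording which decoding was actually applied.
data Guess = Utf8 | Latin1Fallback
  deriving (Eq, Show)

-- Try strict UTF-8 first; if the bytes are not valid UTF-8, decode them
-- as Latin-1, which accepts every byte sequence.
decodeGuess :: B.ByteString -> (Guess, T.Text)
decodeGuess bytes =
  case TE.decodeUtf8' bytes of
    Right txt -> (Utf8, txt)
    Left _    -> (Latin1Fallback, TE.decodeLatin1 bytes)
```

Because the input is a ByteString, the caller controls how much is read
before any decoding happens.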

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Felipe Lessa

On Fri, Aug 20, 2010 at 1:12 PM, John Millikin  wrote:
> This thought occurred to me, but really, how often are you going to
> have a 10 GiB **text** file with no newlines? Remember, this is for
> text (log files, INI-style configs, plain .txt), not binary (HTML,
> XML, JSON). Off the top of my head, I can't think of any case where
> you'd expect to see 10 GiB in a single line.
>
> In the worst case, you can just use "decode" to process bytes coming
> from the ByteString-based enumHandle, which should give nicely chunked
> text.

I was thinking about an attacker, not a use case.  Think of a web
server accepting queries using iteratees internally.  This may open
the door to at least DoS attacks.

And then, we use iteratees because we don't like the unpredictability
of lazy IO.  Why should iteratees be unpredictable when dealing with
Text?  Besides the memory consumption problem, there may be
performance problems if the lines are too short.

Cheers! =)

-- 
Felipe.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin

On Fri, Aug 20, 2010 at 08:59, Felipe Lessa  wrote:
> On Fri, Aug 20, 2010 at 12:51 PM, John Millikin  wrote:
>> Currently, I'm planning on the following type signatures for D.E.Text.
>> 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
>> any text-based equivalent to ByteString's 'hGet'.
>
> CC'ing text's maintainer.  Using 'hGetLine' will cause baaad surprises
> when you process a 10 GiB file with no '\n' in sight.

This thought occurred to me, but really, how often are you going to
have a 10 GiB **text** file with no newlines? Remember, this is for
text (log files, INI-style configs, plain .txt), not binary (HTML,
XML, JSON). Off the top of my head, I can't think of any case where
you'd expect to see 10 GiB in a single line.

In the worst case, you can just use "decode" to process bytes coming
from the ByteString-based enumHandle, which should give nicely chunked
text.
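
To show why the bytes-based route stays bounded, here is a sketch of a
fixed-size chunked read using hGetSome from the bytestring package. The
names foldChunks and chunkStats are hypothetical and this is not the
enumerator package's enumHandle, just the same reading strategy written as
a plain fold.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Fold over a handle in chunks of at most chunkSize bytes. Memory use
-- per step is bounded by chunkSize no matter where the newlines are.
foldChunks :: Int -> (acc -> B.ByteString -> acc) -> acc -> Handle -> IO acc
foldChunks chunkSize step acc0 h = loop acc0
  where
    loop acc = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return acc                       -- empty chunk means EOF
        else let acc' = step acc chunk in acc' `seq` loop acc'

-- Report (total bytes read, size of the largest chunk seen) for a file.
chunkStats :: Int -> FilePath -> IO (Int, Int)
chunkStats chunkSize path =
  withBinaryFile path ReadMode (foldChunks chunkSize step (0, 0))
  where
    step (total, largest) chunk =
      let n = B.length chunk
          total' = total + n
          largest' = max largest n
      in total' `seq` largest' `seq` (total', largest')
```

A file with no '\n' at all still arrives in pieces no larger than
chunkSize, which is the property hGetLine cannot give.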

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Felipe Lessa

On Fri, Aug 20, 2010 at 12:51 PM, John Millikin  wrote:
> Currently, I'm planning on the following type signatures for D.E.Text.
> 'enumHandle' will use Text's hGetLine, since there doesn't seem to be
> any text-based equivalent to ByteString's 'hGet'.

CC'ing text's maintainer.  Using 'hGetLine' will cause baaad surprises
when you process a 10 GiB file with no '\n' in sight.

Cheers! =)

-- 
Felipe.

Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread John Millikin

On Fri, Aug 20, 2010 at 04:01, Simon Marlow  wrote:
> Handle IO is also doing Unicode encoding/decoding, which iteratees bypass.
>  Have you thought about how to incorporate encoding/decoding?

Yes; there will be a module Data.Enumerator.Text which contains
locale-based IO, enumeratee-based encoding/decoding, and so forth.
Since "iteratee" doesn't have any text-based IO, I figured it wasn't
necessary for a first release; getting feedback on the basic soundness
of the package was more important.

Currently, I'm planning on the following type signatures for D.E.Text.
'enumHandle' will use Text's hGetLine, since there doesn't seem to be
any text-based equivalent to ByteString's 'hGet'.

enumHandle :: Handle -> Enumerator SomeException Text IO b

enumFile :: FilePath -> Enumerator SomeException Text IO b

data Codec = Codec
    { codecName :: Text
    , codecEncode :: Text -> Either SomeException ByteString
    , codecDecode :: ByteString -> Either SomeException (Text, ByteString)
    }

encode :: Codec -> Enumeratee SomeException Text ByteString m b

decode :: Codec -> Enumeratee SomeException ByteString Text m b

utf8 :: Codec

utf16le :: Codec

utf16be :: Codec

utf32le :: Codec

utf32be :: Codec

ascii :: Codec

iso8859_1 :: Codec
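
For a sense of how codecDecode's (Text, ByteString) result could be used,
here is a sketch of one possible utf8 value. The record is copied from the
signatures above; the implementation is only an assumption about how it
might work, and decodeChunk is a made-up helper. The leftover ByteString
carries an incomplete trailing sequence so it can be prepended to the next
chunk.

```haskell
import Control.Exception (ErrorCall (..), SomeException, toException)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

data Codec = Codec
  { codecName   :: Text
  , codecEncode :: Text -> Either SomeException ByteString
  , codecDecode :: ByteString -> Either SomeException (Text, ByteString)
  }

utf8 :: Codec
utf8 = Codec
  { codecName   = T.pack "UTF-8"
  , codecEncode = Right . TE.encodeUtf8   -- encoding Text to UTF-8 cannot fail
  , codecDecode = decodeChunk
  }

-- Decode the longest valid prefix obtained by holding back 0 to 3
-- trailing bytes (a UTF-8 sequence is at most 4 bytes, so an incomplete
-- one is at most 3). A simplification: a bad byte within the last 3 is
-- returned as leftover instead of being reported immediately.
decodeChunk :: ByteString -> Either SomeException (Text, ByteString)
decodeChunk bytes = go 0
  where
    go n
      | n > 3 || n > B.length bytes =
          Left (toException (ErrorCall "invalid UTF-8 input"))
      | otherwise =
          let (prefix, rest) = B.splitAt (B.length bytes - n) bytes
          in case TE.decodeUtf8' prefix of
               Right txt -> Right (txt, rest)
               Left _    -> go (n + 1)
```

A decode enumeratee would then feed each incoming chunk, prefixed with the
previous leftover, through codecDecode.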


Re: [Haskell] Re: [Haskell-cafe] ANNOUNCE: enumerator, an alternative iteratee package

2010-08-20 Thread Simon Marlow


On 19/08/2010 18:21, John Millikin wrote:
> On Wed, Aug 18, 2010 at 23:33, Jason Dagit  wrote:
>> The main reason I would use iteratees is for performance reasons.  To help
>> me, as a potential consumer of your library, could you please provide
>> benchmarks for comparing the performance of enumerator with say, a)
>> iteratee, b) lazy/strict bytestring, and c) Prelude functions?
>> I'm interested in both max memory consumption and run-times.  Using
>> criterion and/or progression to get the run-times would be icing on an
>> already delicious cake!
>
> Oleg has some benchmarks of his implementation at
> <http://okmij.org/ftp/Haskell/Iteratee/Lazy-vs-correct.txt>, which
> clock iteratees at about twice as fast as lazy IO. He also compares
> them to a native "wc", but his comparison is flawed, because he's
> comparing a String iteratee vs byte-based wc.

Handle IO is also doing Unicode encoding/decoding, which iteratees
bypass.  Have you thought about how to incorporate encoding/decoding?


Cheers,
Simon
