subject:"\[Haskell\-cafe\] Re\: \[darcs\-users\] Iteratees, streams, and mmap"

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2010-01-05 Thread Heinrich Apfelmus

Jason Dagit wrote:
 Heinrich Apfelmus wrote:

 How about tracking the requirement of bounded in the type system? In
 particular, I'm thinking of a type class

class NFData a = Small a

 where the idea is that all types that can be stored in constant space
 are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

 but recursive types like  [a]  or  String  are not allowed to be members.

On second (and late) thought, this idea is not as good as it sounds and
needs more thought.

The thing is if A is an instance of  Small , we are not guaranteed that
a value  x :: A  will take little memory, we are only guaranteed that
its *normal form* takes little memory.

Now, due to pervasive lazy evaluation, it's quite impossible to track
whether a value is in normal form in the type system.

 It seems like we also need:
instance Small (IO ())

 Which is not necessarily bounded due to things like IORefs. [...]

In the light of the above, I think this is related to the issue that we
can't evaluate IO () to normal form at all, i.e. there is no  deepSeq
for values of type  IO () .

 I could also see us needing this:
 
 bracketSmall :: (Small c) = IO a - (a - IO b) - (a - IO c) - IO c
 
 I'm not sure if b needs to be Small, since it's just supposed to be the
 return value of the deallocation step.

Thanks to parametricity, we know that  b  is thrown away, it can never
appear in the result type  c .

 It seems that having functions like bracketSmall are
 necessary if we want to hide the stream itself from being exposed outside of
 the traversal function.  For example, your foldlSmall doesn't leak, but
 something at the same level of scope as it could leak the list.

Yep, though this does have the unfortunate (?) consequence that we
cannot use  bracketSmall  to return any large values anymore, even if
they do not leak the stream. (A notion which literally gets muddier in
the course of the program because it might be that  c  does not leak the
vanilla stream, but for example every character of it.)


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Heinrich Apfelmus

Jason Dagit wrote:
 Johann Höchtl wrote:

 As a beginner to Haskell, I am only 1/3 through RWH, those lines scare
 me in a sense to question my effort. I simply can not distinguish if
 this discussion is somewhat pathological in a sense that every access
 to the outside world imposes dangers and an additional exception
 handler here and there and an additional if-statement to handle error
 return codes will suffice.

 
 We're not talking about exception handling :)  And yes, Heinrich is talking
 about pathological cases.
 
 
 Or lazy evaluation, IO monads and the whole story behind
 unsafePerformIO was an additional layer of self-deception and
 unpredictable effects from the outside world and lazy evaluation can
 NEVER be satisfactory handled.

 
 We're not talking about unsafePerformIO either.
 
 The discussion at hand is much simpler.  Given a large stream of data, how
 should you style your code so that it's easy to reason about the space
 usage?  This is an important question in every programming language I've
 ever used.  Also, IO need not enter the discussion except that in darcs the
 streams of data come from the disk.  I think moving the discussion to
 haskell-cafe was a mistake.  I said what I said in a very specific context
 (that of the darcs developers mailing list), which is likely no longer
 clear.
 
 Sorry for any distress this has caused you.

Ah, I didn't mean to cause distress to anyone by cross-posting to the
café. I just thought that a discussion Is there a style of writing
Haskell that makes it easy to ensure bounded space usage? could be of
general interest.



Johann, don't worry, lazy evaluation is awesome. :) See for example

   http://apfelmus.nfshost.com/quicksearch.html


It's just that laziness not a free lunch; predicting memory (space)
usage does become more difficult. Which is not really surprising since
deferring the evaluation of thunks until they are demanded also means
that they will store different expressions of possibly different sizes
before and after they are demanded.

Classic examples are

   ones = 1:ones

taking O(1) memory instead of an infinite amount and

   foldl (+) 0 [1..n]

taking O(n) memory as opposed to

   foldl' (+) 0 [1..n]

which only takes O(1) memory.



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Heinrich Apfelmus

Jason Dagit wrote:
 withPending :: (a - Patch - a) -  IO a

 And withPending would start the streaming and make sure that the stream
 cannot be visible as a data dependency outside of withPending.

 [...]

 Heinrich Apfelmus wrote:

 In other words, exporting only a  foldl' -like interface does not really
 prevent us from writing functions that have O(n) instead of O(1) space
 usage. But trying to rectify that with the  forall s  trick is a doomed
 idea, too.
 
 I realize it's not perfect, but the problem we have now is that it's too
 easy to write things that have dismal space usage.  If we can't force proper
 space usage, how can we make it more natural to have bounded space?  Or at
 least a good approximation.
 
 It seems that:
  * foldl'-style helps
  * rank-n can help
  * no approach I've seen *forces* the behavior we want
  * existing code and bug reports demonstrate we need to improve the
 situation

Yep, I agree completely; except for the rank-n (rank-2) trick where I'm
more pessimistic.


I think rank-2 works fine for things like ensuring that file handles are
closed properly. But for the purpose of ensuring O(1) space usage, I
think that

withPending . flip $ const (() :)

demonstrates that rank-2 is not worth the complexity they bring. Still,
the idea of tracking resource usage in the type system is attractive.


 I'm open to suggestions on how to ensure the code has the space behavior I
 want.

I've got another, unfinished idea:

 Namely,  withPending  does not guarantee that the stream does not leak,
 it only makes it more natural/convenient to formulate one's code so that
 it doesn't leak. In particular, using  (:)  as argument pretty much
 defeats the whole purpose:
 
 Right.  And the iteratee library points out that your iteratees have to be
 well-behaved (I think there they say bounded).  I'm well aware of this
 issue and thanks for pointing it out for others who are reading along.

How about tracking the requirement of bounded in the type system? In
particular, I'm thinking of a type class

class NFData a = Small a

where the idea is that all types that can be stored in constant space
are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

but recursive types like  [a]  or  String  are not allowed to be members.

Then,

withPending :: Small a = (a - Patch - a) - IO a

which is based on the function

foldlSmall :: Small b = (b - a - b) - b - [a] - b
foldlSmall f b [] =
foldlSmall f b (x:xs) = foldlSmall f b' xs
where b' = rnf (f b x)

is pretty much guaranteed to run in O(1) space. Well, that depends on  f
, but the point is it's not  foldlSmall  who introduces the space leak,
it's the argument that takes all the blame.


 In other words, the human effort to make the code behave how we want is
 currently too high and that's the issue I want to address.  I don't know how
 we could make it impossible to have space leaks, although that would be
 interesting.

I opine that aiming for the latter (making space leaks impossible) while
experimenting brings you closer to the actual goal (making them easy to
avoid) when it comes to putting these experiments into practice. But
that's just me.



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Jason Dagit

On Sun, Dec 13, 2009 at 3:47 AM, Heinrich Apfelmus 
apfel...@quantentunnel.de wrote:

 How about tracking the requirement of bounded in the type system? In
 particular, I'm thinking of a type class

class NFData a = Small a

 where the idea is that all types that can be stored in constant space
 are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

 but recursive types like  [a]  or  String  are not allowed to be members.


I like this.  It's easy to audit for weird instances of Small.

It seems like we also need:
instance Small (IO ())

Which is not necessarily bounded due to things like IORefs.  See below for
why I think it would be a useful instance.


 Then,

withPending :: Small a = (a - Patch - a) - IO a

 which is based on the function

foldlSmall :: Small b = (b - a - b) - b - [a] - b
foldlSmall f b [] =
foldlSmall f b (x:xs) = foldlSmall f b' xs
where b' = rnf (f b x)

 is pretty much guaranteed to run in O(1) space. Well, that depends on  f
 , but the point is it's not  foldlSmall  who introduces the space leak,
 it's the argument that takes all the blame.


I could also see, us needing this:

bracketSmall :: (Small c) = IO a - (a - IO b) - (a - IO c) - IO c

I'm not sure if b needs to be Small, since it's just supposed to be the
return value of the deallocation step.

Actually, since we didn't (yet) make functions an instance of Small, we may
need to say this type:

bracketFoldSmall :: (Small c) = IO a - (a - IO b) - ((c - a - c) - c
- a - c) - IO c

Here I'm imagining 'a', being something like a list of patches.  It's a big
stream we want to process.  Furthermore, it seems like we probably want c =
IO ().  This would be useful if we wanted to do some output for each patch
in the sequence.  But, as I think about it, it seems like if we allow ST or
IO in these small functions or as instances of Small then we lose our
guarantees.  Yet, it seems that having functions like bracketSmall are
necessary if we want to hide the stream itself from being exposed outside of
the traversal function.  For example, your foldlSmall doesn't leak, but
something at the same level of scope as it could leak the list.

Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Jason Dagit

Hi Heinrich,

On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus
apfel...@quantentunnel.de wrote:

Jason Dagit wrote:
My next experiment will be to find ways to express take this operation
and
apply it to a stream without letting the stream leak. One implication is
that gzReadFilePS should not be used outside of a core set of modules
which
have been auideted to be resource concious. Another implication is that
we
need to be really careful about wether or not we allow returning of
sequences of patches. Possibly, we need several foldl-like functions
that
open the stream internally. For example, to process the pending maybe we
should have:
withPending :: (a - Patch - a) - IO a

And withPending would start the streaming and make sure that the stream
cannot be visible as a data dependency outside of withPending.

Just a small comment on a potential flaw in this scheme and the
observation that even the rank-2 type trick from the ST s monad
wouldn't help.

I would say it does help, but it doesn't make it perfect.

Namely, withPending does not guarantee that the stream does not leak,
it only makes it more natural/convenient to formulate one's code so that
it doesn't leak. In particular, using (:) as argument pretty much
defeats the whole purpose:

Right. And the iteratee library points out that your iteratees have to be
well-behaved (I think there they say bounded). I'm well aware of this
issue and thanks for pointing it out for others who are reading along.

withPending (flip (:))

Fortunately, the type system can ensure that the patches don't leak.

withPending :: (forall s. a - Patch s - a) - IO a

Now, a may not mention s and the type checker will reject flip (:)
as argument. See also

Oleg Kiselyov, Chung-chieh Shan.
Lightweight Monadic Regions.

http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf

for an elaboration of this technique.

I'm still on the fence as to whether this style of writing it will add value
greater than the complexity it brings. I am certainly considering it :)
The darcs source does other things that are also fairly complex.

However, the line between leaking and not leaking is very thin here. As
soon as we are given for example a function

name :: Patch s - String

that discards the s , its results can leak, in the sense that we
could now build a list of names

foo :: IO [String]
foo = withPending . flip $ (:) . name

Even worse, any type a that doesn't have O(1) space usage will leak

bar :: IO [()]
bar = withPending . flip $ const (() :)

In other words, exporting only a foldl' -like interface does not really
prevent us from writing functions that have O(n) instead of O(1) space
usage. But trying to rectify that with the forall s trick is a doomed
idea, too.

I realize it's not perfect, but the problem we have now is that it's too
easy to write things that have dismal space usage. If we can't force proper
space usage, how can we make it more natural to have bounded space? Or at
least a good approximation.

It seems that:
* foldl'-style helps
* rank-n can help
* no approach I've seen *forces* the behavior we want
* existing code and bug reports demonstrate we need to improve the
situation

I'm open to suggestions on how to ensure the code has the space behavior I
want. Lazy IO* and streams of patches is more compositional and natural to
Haskell programmers, but it seems that it's too hard to ensure the code has
reasonable space usage. At least where the darcs source is concerned.
Therefore, I think the status quo demonstrates that in the darcs source it's
worth experimenting with alternatives to lazy io and streams. In other
words, the human effort to make the code behave how we want is currently too
high and that's the issue I want to address. I don't know how we could make
it impossible to have space leaks, although that would be interesting.

(*) Note: Lazy IO itself is used in very few places in darcs these days
because it has lead to serious bugs. These days me point is more about big
streams getting retained. Finding where and why has proven difficult.

Thanks,
Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Johann Höchtl

On Dec 12, 7:13 pm, Jason Dagit da...@codersbase.com wrote:
Hi Heinrich,

On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus

apfel...@quantentunnel.de wrote:
Jason Dagit wrote:
My next experiment will be to find ways to express take this operation
and
apply it to a stream without letting the stream leak. One implication is
that gzReadFilePS should not be used outside of a core set of modules
which
have been auideted to be resource concious. Another implication is that
we
need to be really careful about wether or not we allow returning of
sequences of patches. Possibly, we need several foldl-like functions
that
open the stream internally. For example, to process the pending maybe we
should have:
withPending :: (a - Patch - a) - IO a

And withPending would start the streaming and make sure that the stream
cannot be visible as a data dependency outside of withPending.

Just a small comment on a potential flaw in this scheme and the
observation that even the rank-2 type trick from the ST s monad
wouldn't help.

I would say it does help, but it doesn't make it perfect.

withPending (flip (:))

Fortunately, the type system can ensure that the patches don't leak.

withPending :: (forall s. a - Patch s - a) - IO a

Now, a may not mention s and the type checker will reject flip (:)
as argument. See also

Oleg Kiselyov, Chung-chieh Shan.
Lightweight Monadic Regions.
http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf

for an elaboration of this technique.

However, the line between leaking and not leaking is very thin here. As
soon as we are given for example a function

name :: Patch s - String

that discards the s , its results can leak, in the sense that we
could now build a list of names

foo :: IO [String]
foo = withPending . flip $ (:) . name

Even worse, any type a that doesn't have O(1) space usage will leak

bar :: IO [()]
bar = withPending . flip $ const (() :)

It seems that:
* foldl'-style helps
* rank-n can help
* no approach I've seen *forces* the behavior we want
* existing code and bug reports demonstrate we need to improve the
situation

As a beginner to Haskell, I am only 1/3 through RWH, those lines scare
me in a sense to question my effort. I simply can not distinguish if
this discussion is somewhat pathological in a sense that every access
to the outside world imposes dangers and an additional exception
handler here and there and an additional if-statement to handle error
return codes will suffice.
Or lazy evaluation, IO monads and the whole story behind
unsafePerformIO was an additional layer of self-deception and
unpredictable effects from the outside world and lazy evaluation can
NEVER be satisfactory handled.

Thanks,
Jason

Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Jason Dagit

On Sat, Dec 12, 2009 at 1:37 PM, Johann Höchtl johann.hoec...@gmail.comwrote:

On Dec 12, 7:13 pm, Jason Dagit da...@codersbase.com wrote:
Hi Heinrich,

On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus

apfel...@quantentunnel.de wrote:
Jason Dagit wrote:
My next experiment will be to find ways to express take this
operation
and
apply it to a stream without letting the stream leak. One
implication is
that gzReadFilePS should not be used outside of a core set of modules
which
have been auideted to be resource concious. Another implication is
that
we
need to be really careful about wether or not we allow returning of
sequences of patches. Possibly, we need several foldl-like functions
that
open the stream internally. For example, to process the pending
maybe we
should have:
withPending :: (a - Patch - a) - IO a

And withPending would start the streaming and make sure that the
stream
cannot be visible as a data dependency outside of withPending.

Just a small comment on a potential flaw in this scheme and the
observation that even the rank-2 type trick from the ST s monad
wouldn't help.

I would say it does help, but it doesn't make it perfect.

Namely, withPending does not guarantee that the stream does not leak,
it only makes it more natural/convenient to formulate one's code so
that
it doesn't leak. In particular, using (:) as argument pretty much
defeats the whole purpose:

Right. And the iteratee library points out that your iteratees have to
be
well-behaved (I think there they say bounded). I'm well aware of this
issue and thanks for pointing it out for others who are reading along.

withPending (flip (:))

Fortunately, the type system can ensure that the patches don't leak.

withPending :: (forall s. a - Patch s - a) - IO a

Now, a may not mention s and the type checker will reject flip (:)
as argument. See also

Oleg Kiselyov, Chung-chieh Shan.
Lightweight Monadic Regions.

http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
http://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf

for an elaboration of this technique.

I'm still on the fence as to whether this style of writing it will add
value
greater than the complexity it brings. I am certainly considering it :)
The darcs source does other things that are also fairly complex.

However, the line between leaking and not leaking is very thin here. As
soon as we are given for example a function

name :: Patch s - String

that discards the s , its results can leak, in the sense that we
could now build a list of names

foo :: IO [String]
foo = withPending . flip $ (:) . name

Even worse, any type a that doesn't have O(1) space usage will leak

bar :: IO [()]
bar = withPending . flip $ const (() :)

In other words, exporting only a foldl' -like interface does not
really
prevent us from writing functions that have O(n) instead of O(1) space
usage. But trying to rectify that with the forall s trick is a doomed
idea, too.

I realize it's not perfect, but the problem we have now is that it's too
easy to write things that have dismal space usage. If we can't force
proper
space usage, how can we make it more natural to have bounded space? Or
at
least a good approximation.

It seems that:
* foldl'-style helps
* rank-n can help
* no approach I've seen *forces* the behavior we want
* existing code and bug reports demonstrate we need to improve the
situation

I'm open to suggestions on how to ensure the code has the space behavior
I
want. Lazy IO* and streams of patches is more compositional and natural
to
Haskell programmers, but it seems that it's too hard to ensure the code
has
reasonable space usage. At least where the darcs source is concerned.
Therefore, I think the status quo demonstrates that in the darcs source
it's
worth experimenting with alternatives to lazy io and streams. In other
words, the human effort to make the code behave how we want is currently
too
high and that's the issue I want to address. I don't know how we could
make
it impossible to have space leaks, although that would be interesting.

We're not talking about exception handling :) And yes, Heinrich is talking
about pathological cases.

Or lazy evaluation, IO monads and the whole story behind
unsafePerformIO

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

7 matches

Site Navigation

Mail list logo

Footer information