[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2010-01-05 Thread Heinrich Apfelmus
Jason Dagit wrote:
 Heinrich Apfelmus wrote:

 How about tracking the requirement of bounded in the type system? In
 particular, I'm thinking of a type class

class NFData a = Small a

 where the idea is that all types that can be stored in constant space
 are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

 but recursive types like  [a]  or  String  are not allowed to be members.

On second (and late) thought, this idea is not as good as it sounds and
needs more thought.

The thing is if A is an instance of  Small , we are not guaranteed that
a value  x :: A  will take little memory, we are only guaranteed that
its *normal form* takes little memory.

Now, due to pervasive lazy evaluation, it's quite impossible to track
whether a value is in normal form in the type system.

 It seems like we also need:
instance Small (IO ())

 Which is not necessarily bounded due to things like IORefs. [...]

In the light of the above, I think this is related to the issue that we
can't evaluate IO () to normal form at all, i.e. there is no  deepSeq
for values of type  IO () .

 I could also see us needing this:
 
 bracketSmall :: (Small c) = IO a - (a - IO b) - (a - IO c) - IO c
 
 I'm not sure if b needs to be Small, since it's just supposed to be the
 return value of the deallocation step.

Thanks to parametricity, we know that  b  is thrown away, it can never
appear in the result type  c .

 It seems that having functions like bracketSmall are
 necessary if we want to hide the stream itself from being exposed outside of
 the traversal function.  For example, your foldlSmall doesn't leak, but
 something at the same level of scope as it could leak the list.

Yep, though this does have the unfortunate (?) consequence that we
cannot use  bracketSmall  to return any large values anymore, even if
they do not leak the stream. (A notion which literally gets muddier in
the course of the program because it might be that  c  does not leak the
vanilla stream, but for example every character of it.)


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Heinrich Apfelmus
Jason Dagit wrote:
 Johann Höchtl wrote:

 As a beginner to Haskell, I am only 1/3 through RWH, those lines scare
 me in a sense to question my effort. I simply can not distinguish if
 this discussion is somewhat pathological in a sense that every access
 to the outside world imposes dangers and an additional exception
 handler here and there and an additional if-statement to handle error
 return codes will suffice.

 
 We're not talking about exception handling :)  And yes, Heinrich is talking
 about pathological cases.
 
 
 Or lazy evaluation, IO monads and the whole story behind
 unsafePerformIO was an additional layer of self-deception and
 unpredictable effects from the outside world and lazy evaluation can
 NEVER be satisfactory handled.

 
 We're not talking about unsafePerformIO either.
 
 The discussion at hand is much simpler.  Given a large stream of data, how
 should you style your code so that it's easy to reason about the space
 usage?  This is an important question in every programming language I've
 ever used.  Also, IO need not enter the discussion except that in darcs the
 streams of data come from the disk.  I think moving the discussion to
 haskell-cafe was a mistake.  I said what I said in a very specific context
 (that of the darcs developers mailing list), which is likely no longer
 clear.
 
 Sorry for any distress this has caused you.

Ah, I didn't mean to cause distress to anyone by cross-posting to the
café. I just thought that a discussion Is there a style of writing
Haskell that makes it easy to ensure bounded space usage? could be of
general interest.



Johann, don't worry, lazy evaluation is awesome. :) See for example

   http://apfelmus.nfshost.com/quicksearch.html


It's just that laziness not a free lunch; predicting memory (space)
usage does become more difficult. Which is not really surprising since
deferring the evaluation of thunks until they are demanded also means
that they will store different expressions of possibly different sizes
before and after they are demanded.

Classic examples are

   ones = 1:ones

taking O(1) memory instead of an infinite amount and

   foldl (+) 0 [1..n]

taking O(n) memory as opposed to

   foldl' (+) 0 [1..n]

which only takes O(1) memory.



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Heinrich Apfelmus
Jason Dagit wrote:
 withPending :: (a - Patch - a) -  IO a

 And withPending would start the streaming and make sure that the stream
 cannot be visible as a data dependency outside of withPending.

 [...]

 Heinrich Apfelmus wrote:

 In other words, exporting only a  foldl' -like interface does not really
 prevent us from writing functions that have O(n) instead of O(1) space
 usage. But trying to rectify that with the  forall s  trick is a doomed
 idea, too.
 
 I realize it's not perfect, but the problem we have now is that it's too
 easy to write things that have dismal space usage.  If we can't force proper
 space usage, how can we make it more natural to have bounded space?  Or at
 least a good approximation.
 
 It seems that:
  * foldl'-style helps
  * rank-n can help
  * no approach I've seen *forces* the behavior we want
  * existing code and bug reports demonstrate we need to improve the
 situation

Yep, I agree completely; except for the rank-n (rank-2) trick where I'm
more pessimistic.


I think rank-2 works fine for things like ensuring that file handles are
closed properly. But for the purpose of ensuring O(1) space usage, I
think that

withPending . flip $ const (() :)

demonstrates that rank-2 is not worth the complexity they bring. Still,
the idea of tracking resource usage in the type system is attractive.


 I'm open to suggestions on how to ensure the code has the space behavior I
 want.

I've got another, unfinished idea:

 Namely,  withPending  does not guarantee that the stream does not leak,
 it only makes it more natural/convenient to formulate one's code so that
 it doesn't leak. In particular, using  (:)  as argument pretty much
 defeats the whole purpose:
 
 Right.  And the iteratee library points out that your iteratees have to be
 well-behaved (I think there they say bounded).  I'm well aware of this
 issue and thanks for pointing it out for others who are reading along.

How about tracking the requirement of bounded in the type system? In
particular, I'm thinking of a type class

class NFData a = Small a

where the idea is that all types that can be stored in constant space
are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

but recursive types like  [a]  or  String  are not allowed to be members.

Then,

withPending :: Small a = (a - Patch - a) - IO a

which is based on the function

foldlSmall :: Small b = (b - a - b) - b - [a] - b
foldlSmall f b [] =
foldlSmall f b (x:xs) = foldlSmall f b' xs
where b' = rnf (f b x)

is pretty much guaranteed to run in O(1) space. Well, that depends on  f
, but the point is it's not  foldlSmall  who introduces the space leak,
it's the argument that takes all the blame.


 In other words, the human effort to make the code behave how we want is
 currently too high and that's the issue I want to address.  I don't know how
 we could make it impossible to have space leaks, although that would be
 interesting.

I opine that aiming for the latter (making space leaks impossible) while
experimenting brings you closer to the actual goal (making them easy to
avoid) when it comes to putting these experiments into practice. But
that's just me.



Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-13 Thread Jason Dagit
On Sun, Dec 13, 2009 at 3:47 AM, Heinrich Apfelmus 
apfel...@quantentunnel.de wrote:

 How about tracking the requirement of bounded in the type system? In
 particular, I'm thinking of a type class

class NFData a = Small a

 where the idea is that all types that can be stored in constant space
 are members of this class. For example, we have

instance Small ()
instance Small Int
instance Small Char
instance (Small a, Small b) = Small (a,b)
instance (Small a, Small b) = Small (Either a b)

 but recursive types like  [a]  or  String  are not allowed to be members.


I like this.  It's easy to audit for weird instances of Small.

It seems like we also need:
instance Small (IO ())

Which is not necessarily bounded due to things like IORefs.  See below for
why I think it would be a useful instance.


 Then,

withPending :: Small a = (a - Patch - a) - IO a

 which is based on the function

foldlSmall :: Small b = (b - a - b) - b - [a] - b
foldlSmall f b [] =
foldlSmall f b (x:xs) = foldlSmall f b' xs
where b' = rnf (f b x)

 is pretty much guaranteed to run in O(1) space. Well, that depends on  f
 , but the point is it's not  foldlSmall  who introduces the space leak,
 it's the argument that takes all the blame.


I could also see, us needing this:

bracketSmall :: (Small c) = IO a - (a - IO b) - (a - IO c) - IO c

I'm not sure if b needs to be Small, since it's just supposed to be the
return value of the deallocation step.

Actually, since we didn't (yet) make functions an instance of Small, we may
need to say this type:

bracketFoldSmall :: (Small c) = IO a - (a - IO b) - ((c - a - c) - c
- a - c) - IO c

Here I'm imagining 'a', being something like a list of patches.  It's a big
stream we want to process.  Furthermore, it seems like we probably want c =
IO ().  This would be useful if we wanted to do some output for each patch
in the sequence.  But, as I think about it, it seems like if we allow ST or
IO in these small functions or as instances of Small then we lose our
guarantees.  Yet, it seems that having functions like bracketSmall are
necessary if we want to hide the stream itself from being exposed outside of
the traversal function.  For example, your foldlSmall doesn't leak, but
something at the same level of scope as it could leak the list.

Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Jason Dagit
Hi Heinrich,

On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus 
apfel...@quantentunnel.de wrote:

 Jason Dagit wrote:
  My next experiment will be to find ways to express take this operation
 and
  apply it to a stream without letting the stream leak. One implication is
  that gzReadFilePS should not be used outside of a core set of modules
 which
  have been auideted to be resource concious.  Another implication is that
 we
  need to be really careful about wether or not we allow returning of
  sequences of patches.  Possibly, we need several foldl-like functions
 that
  open the stream internally.  For example, to process the pending maybe we
  should have:
  withPending :: (a - Patch - a) -  IO a
 
  And withPending would start the streaming and make sure that the stream
  cannot be visible as a data dependency outside of withPending.

 Just a small comment on a potential flaw in this scheme and the
 observation that even the rank-2 type trick from the  ST s  monad
 wouldn't help.


I would say it does help, but it doesn't make it perfect.




 Namely,  withPending  does not guarantee that the stream does not leak,
 it only makes it more natural/convenient to formulate one's code so that
 it doesn't leak. In particular, using  (:)  as argument pretty much
 defeats the whole purpose:


Right.  And the iteratee library points out that your iteratees have to be
well-behaved (I think there they say bounded).  I'm well aware of this
issue and thanks for pointing it out for others who are reading along.



   withPending (flip (:))

 Fortunately, the type system can ensure that the patches don't leak.

   withPending :: (forall s. a - Patch s - a) - IO a

 Now,  a  may not mention  s  and the type checker will reject  flip (:)
  as argument. See also

  Oleg Kiselyov, Chung-chieh Shan.
  Lightweight Monadic Regions.
  
 http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf

 for an elaboration of this technique.


I'm still on the fence as to whether this style of writing it will add value
greater than the complexity it brings.  I am certainly considering it :)
The darcs source does other things that are also fairly complex.




 However, the line between leaking and not leaking is very thin here. As
 soon as we are given for example a function

   name :: Patch s - String

 that discards the  s , its results can leak, in the sense that we
 could now build a list of names

   foo :: IO [String]
   foo = withPending . flip $ (:) . name

 Even worse, any type  a  that doesn't have O(1) space usage will leak

   bar :: IO [()]
   bar = withPending . flip $ const (() :)

 In other words, exporting only a  foldl' -like interface does not really
 prevent us from writing functions that have O(n) instead of O(1) space
 usage. But trying to rectify that with the  forall s  trick is a doomed
 idea, too.


I realize it's not perfect, but the problem we have now is that it's too
easy to write things that have dismal space usage.  If we can't force proper
space usage, how can we make it more natural to have bounded space?  Or at
least a good approximation.

It seems that:
 * foldl'-style helps
 * rank-n can help
 * no approach I've seen *forces* the behavior we want
 * existing code and bug reports demonstrate we need to improve the
situation

I'm open to suggestions on how to ensure the code has the space behavior I
want.  Lazy IO* and streams of patches is more compositional and natural to
Haskell programmers, but it seems that it's too hard to ensure the code has
reasonable space usage.  At least where the darcs source is concerned.
Therefore, I think the status quo demonstrates that in the darcs source it's
worth experimenting with alternatives to lazy io and streams.  In other
words, the human effort to make the code behave how we want is currently too
high and that's the issue I want to address.  I don't know how we could make
it impossible to have space leaks, although that would be interesting.

(*) Note: Lazy IO itself is used in very few places in darcs these days
because it has lead to serious bugs.  These days me point is more about big
streams getting retained.  Finding where and why has proven difficult.

Thanks,
Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Johann Höchtl


On Dec 12, 7:13 pm, Jason Dagit da...@codersbase.com wrote:
 Hi Heinrich,

 On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus 



 apfel...@quantentunnel.de wrote:
  Jason Dagit wrote:
   My next experiment will be to find ways to express take this operation
  and
   apply it to a stream without letting the stream leak. One implication is
   that gzReadFilePS should not be used outside of a core set of modules
  which
   have been auideted to be resource concious.  Another implication is that
  we
   need to be really careful about wether or not we allow returning of
   sequences of patches.  Possibly, we need several foldl-like functions
  that
   open the stream internally.  For example, to process the pending maybe we
   should have:
   withPending :: (a - Patch - a) -  IO a

   And withPending would start the streaming and make sure that the stream
   cannot be visible as a data dependency outside of withPending.

  Just a small comment on a potential flaw in this scheme and the
  observation that even the rank-2 type trick from the  ST s  monad
  wouldn't help.

 I would say it does help, but it doesn't make it perfect.



  Namely,  withPending  does not guarantee that the stream does not leak,
  it only makes it more natural/convenient to formulate one's code so that
  it doesn't leak. In particular, using  (:)  as argument pretty much
  defeats the whole purpose:

 Right.  And the iteratee library points out that your iteratees have to be
 well-behaved (I think there they say bounded).  I'm well aware of this
 issue and thanks for pointing it out for others who are reading along.





    withPending (flip (:))

  Fortunately, the type system can ensure that the patches don't leak.

    withPending :: (forall s. a - Patch s - a) - IO a

  Now,  a  may not mention  s  and the type checker will reject  flip (:)
   as argument. See also

   Oleg Kiselyov, Chung-chieh Shan.
   Lightweight Monadic Regions.
   http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf

  for an elaboration of this technique.

 I'm still on the fence as to whether this style of writing it will add value
 greater than the complexity it brings.  I am certainly considering it :)
 The darcs source does other things that are also fairly complex.





  However, the line between leaking and not leaking is very thin here. As
  soon as we are given for example a function

    name :: Patch s - String

  that discards the  s , its results can leak, in the sense that we
  could now build a list of names

    foo :: IO [String]
    foo = withPending . flip $ (:) . name

  Even worse, any type  a  that doesn't have O(1) space usage will leak

    bar :: IO [()]
    bar = withPending . flip $ const (() :)

  In other words, exporting only a  foldl' -like interface does not really
  prevent us from writing functions that have O(n) instead of O(1) space
  usage. But trying to rectify that with the  forall s  trick is a doomed
  idea, too.

 I realize it's not perfect, but the problem we have now is that it's too
 easy to write things that have dismal space usage.  If we can't force proper
 space usage, how can we make it more natural to have bounded space?  Or at
 least a good approximation.

 It seems that:
  * foldl'-style helps
  * rank-n can help
  * no approach I've seen *forces* the behavior we want
  * existing code and bug reports demonstrate we need to improve the
 situation

 I'm open to suggestions on how to ensure the code has the space behavior I
 want.  Lazy IO* and streams of patches is more compositional and natural to
 Haskell programmers, but it seems that it's too hard to ensure the code has
 reasonable space usage.  At least where the darcs source is concerned.
 Therefore, I think the status quo demonstrates that in the darcs source it's
 worth experimenting with alternatives to lazy io and streams.  In other
 words, the human effort to make the code behave how we want is currently too
 high and that's the issue I want to address.  I don't know how we could make
 it impossible to have space leaks, although that would be interesting.


As a beginner to Haskell, I am only 1/3 through RWH, those lines scare
me in a sense to question my effort. I simply can not distinguish if
this discussion is somewhat pathological in a sense that every access
to the outside world imposes dangers and an additional exception
handler here and there and an additional if-statement to handle error
return codes will suffice.
Or lazy evaluation, IO monads and the whole story behind
unsafePerformIO was an additional layer of self-deception and
unpredictable effects from the outside world and lazy evaluation can
NEVER be satisfactory handled.

 (*) Note: Lazy IO itself is used in very few places in darcs these days
 because it has lead to serious bugs.  These days me point is more about big
 streams getting retained.  Finding where and why has proven difficult.

 Thanks,
 Jason

 

Re: [Haskell-cafe] Re: [darcs-users] Iteratees, streams, and mmap

2009-12-12 Thread Jason Dagit
On Sat, Dec 12, 2009 at 1:37 PM, Johann Höchtl johann.hoec...@gmail.comwrote:



 On Dec 12, 7:13 pm, Jason Dagit da...@codersbase.com wrote:
  Hi Heinrich,
 
  On Sat, Dec 12, 2009 at 2:42 AM, Heinrich Apfelmus 
 
 
 
  apfel...@quantentunnel.de wrote:
   Jason Dagit wrote:
My next experiment will be to find ways to express take this
 operation
   and
apply it to a stream without letting the stream leak. One
 implication is
that gzReadFilePS should not be used outside of a core set of modules
   which
have been auideted to be resource concious.  Another implication is
 that
   we
need to be really careful about wether or not we allow returning of
sequences of patches.  Possibly, we need several foldl-like functions
   that
open the stream internally.  For example, to process the pending
 maybe we
should have:
withPending :: (a - Patch - a) -  IO a
 
And withPending would start the streaming and make sure that the
 stream
cannot be visible as a data dependency outside of withPending.
 
   Just a small comment on a potential flaw in this scheme and the
   observation that even the rank-2 type trick from the  ST s  monad
   wouldn't help.
 
  I would say it does help, but it doesn't make it perfect.
 
 
 
   Namely,  withPending  does not guarantee that the stream does not leak,
   it only makes it more natural/convenient to formulate one's code so
 that
   it doesn't leak. In particular, using  (:)  as argument pretty much
   defeats the whole purpose:
 
  Right.  And the iteratee library points out that your iteratees have to
 be
  well-behaved (I think there they say bounded).  I'm well aware of this
  issue and thanks for pointing it out for others who are reading along.
 
 
 
 
 
 withPending (flip (:))
 
   Fortunately, the type system can ensure that the patches don't leak.
 
 withPending :: (forall s. a - Patch s - a) - IO a
 
   Now,  a  may not mention  s  and the type checker will reject  flip (:)
as argument. See also
 
Oleg Kiselyov, Chung-chieh Shan.
Lightweight Monadic Regions.

   http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdfhttp://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
 http://www.cs.rutgers.edu/%7Eccshan/capability/region-io.pdf
 
   for an elaboration of this technique.
 
  I'm still on the fence as to whether this style of writing it will add
 value
  greater than the complexity it brings.  I am certainly considering it :)
  The darcs source does other things that are also fairly complex.
 
 
 
 
 
   However, the line between leaking and not leaking is very thin here. As
   soon as we are given for example a function
 
 name :: Patch s - String
 
   that discards the  s , its results can leak, in the sense that we
   could now build a list of names
 
 foo :: IO [String]
 foo = withPending . flip $ (:) . name
 
   Even worse, any type  a  that doesn't have O(1) space usage will leak
 
 bar :: IO [()]
 bar = withPending . flip $ const (() :)
 
   In other words, exporting only a  foldl' -like interface does not
 really
   prevent us from writing functions that have O(n) instead of O(1) space
   usage. But trying to rectify that with the  forall s  trick is a doomed
   idea, too.
 
  I realize it's not perfect, but the problem we have now is that it's too
  easy to write things that have dismal space usage.  If we can't force
 proper
  space usage, how can we make it more natural to have bounded space?  Or
 at
  least a good approximation.
 
  It seems that:
   * foldl'-style helps
   * rank-n can help
   * no approach I've seen *forces* the behavior we want
   * existing code and bug reports demonstrate we need to improve the
  situation
 
  I'm open to suggestions on how to ensure the code has the space behavior
 I
  want.  Lazy IO* and streams of patches is more compositional and natural
 to
  Haskell programmers, but it seems that it's too hard to ensure the code
 has
  reasonable space usage.  At least where the darcs source is concerned.
  Therefore, I think the status quo demonstrates that in the darcs source
 it's
  worth experimenting with alternatives to lazy io and streams.  In other
  words, the human effort to make the code behave how we want is currently
 too
  high and that's the issue I want to address.  I don't know how we could
 make
  it impossible to have space leaks, although that would be interesting.
 

 As a beginner to Haskell, I am only 1/3 through RWH, those lines scare
 me in a sense to question my effort. I simply can not distinguish if
 this discussion is somewhat pathological in a sense that every access
 to the outside world imposes dangers and an additional exception
 handler here and there and an additional if-statement to handle error
 return codes will suffice.


We're not talking about exception handling :)  And yes, Heinrich is talking
about pathological cases.


 Or lazy evaluation, IO monads and the whole story behind
 unsafePerformIO