Re: [Haskell-cafe] puzzling memory leak? (GHC)

2005-10-11 Thread Young Hyun

On Oct 9, 2005, at 11:32 PM, Ben Lippmeier wrote:



This sounds like a text-book space-leak caused by lazy evaluation.

In a lazy language, function evaluation is driven by the need to  
perform IO. Because the first version of your program never prints  
the parsed structure, the parser doesn't completely parse it. This  
implies that the system needs to hang onto all the input data (as  
well as all the partially evaluated calls to the parser) in case it  
needs it later on.
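
As a small, self-contained illustration of that kind of retention
(hypothetical code, not the original program): a single unevaluated
thunk that still refers to the input keeps the whole input alive,
even though a streaming traversal on its own would run in constant
space.

  main :: IO ()
  main = do
    s <- getContents
    let firstLine = head (lines s)   -- not evaluated until the very end;
                                     -- its thunk refers to the start of s
    print (length s)                 -- would normally stream the input,
                                     -- but firstLine pins all of it in memory
    putStrLn firstLine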




I naively believed (or hoped) that, so long as I the programmer took  
some care, the evaluator would be smart enough to discard unaccessed  
computations and values and thus avoid a space leak.  Hence the thing  
I worried about was explicit references to data structures (say, the  
head of very long lists) that prevent objects from being reclaimed.   
Now it seems I need to also worry about the hidden aspects of the  
internal implementation of lazy evaluation.


What intuition should be my guide about the proper way of employing  
lazy evaluation?  It's not yet clear to me what you mean by "the  
system needs to hang onto all the input data".  It seems  
counterintuitive for the evaluator to hang onto partially evaluated  
functions and input data when it can be proven (through static  
analysis) that references to that data can no longer exist in the  
remainder of the program execution; for example, consider


  case head $ drop 500000 $ parseArts ... of
    Right x -> hPutArts stderr x

in which the first 500,000 elements in the result of parseArts can't  
ever be referenced after 'drop' executes, and yet the space leak  
still happens (presumably while trying to skip over those 500,000  
elements).  Have I understood you correctly that this is the  
unfortunate behavior of lazy evaluation?



The default string representation in Haskell is pretty abysmal, and  
having it use hundreds of megs to represent, say, a 10 MB file is  
not too surprising.




Each file I process is 70MB, so inefficient representation could be a  
problem if input data is buffered.



My advice is that if you don't want to fool around with it, just  
swallow hard, then change fixArts to do a hPutArts to /dev/null.


Either that or
  1) go read about the DeepSeq library.
  2) add strictness annotations to your structure definition.
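
For reference, a minimal sketch of what suggestion (1) can look like,
written against the Control.DeepSeq interface from today's deepseq
package (the standardised descendant of the DeepSeq library mentioned
above); the Arts fields below are made up, since the real record isn't
shown in this thread:

  import Control.DeepSeq (NFData (..), deepseq)

  -- Hypothetical stand-in for the parsed record.
  data Arts = Arts { artsSize :: Int, artsName :: String }

  -- Tell deepseq how to evaluate an Arts value all the way down.
  instance NFData Arts where
    rnf (Arts sz nm) = rnf sz `seq` rnf nm

  -- Fully evaluate each record before handing it on, so no suspended
  -- parser work is dragged along inside the list.
  forceArts :: [Arts] -> IO ()
  forceArts = mapM_ (\a -> a `deepseq` return ())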



Thanks for the advice; I'll look into both.

 --Young



RE: [Haskell-cafe] puzzling memory leak? (GHC)

2005-10-11 Thread Simon Marlow
On 11 October 2005 07:34, Young Hyun wrote:

 What intuition should be my guide about the proper way of employing
 lazy evaluation?  It's not yet clear to me what you mean by "the
 system needs to hang onto all the input data".  It seems
 counterintuitive for the evaluator to hang onto partially evaluated
 functions and input data when it can be proven (through static
 analysis) that references to that data can no longer exist in the
 remainder of the program execution; for example, consider
 
    case head $ drop 500000 $ parseArts ... of
      Right x -> hPutArts stderr x

This example should not have a space leak (try it).

Space leaks arise when large unevaluated expressions build up, or there
are unevaluated expressions that refer to large amounts of data.  The
solution is usually to insert some strictness, to force evaluation of
the offending expressions, assuming the evaluated representation is
smaller than the unevaluated one.

I suggest you try out space profiling (with GHC or nhc98), and try
various modifications to your code to get a handle on what's happening.

It's hard to tell exactly where the space leak is in your original code,
because it depends on the actual definition of parseArts.  Ben's
analysis is definitely plausible.  Take a look at the code and ask
yourself: does each element of the list refer to previous elements, or
to an expression accumulated while traversing the list?  If this is the
case, and you never actually evaluate the head of the list, then the
entire list will be retained.

A good way to begin to understand space leaks is to study foldl (see
many previous threads on this list).
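
For instance, the canonical leaky fold (a minimal sketch, not code from
this thread) looks like this:

  import Data.List (foldl')

  -- foldl builds the entire chain of suspended additions
  -- ((((0 + 1) + 2) + 3) + ...) before any of it is evaluated,
  leaky :: Integer
  leaky = foldl (+) 0 [1 .. 1000000]

  -- whereas foldl' forces the accumulator at every step and runs in
  -- constant space: the "insert some strictness" fix in miniature.
  strict :: Integer
  strict = foldl' (+) 0 [1 .. 1000000]

  main :: IO ()
  main = print strict >> print leaky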

Cheers,
Simon


Re: [Haskell-cafe] puzzling memory leak? (GHC)

2005-10-11 Thread Udo Stenzel
Young Hyun wrote:
 Here's what the program does.  It reads a binary file containing a  
 large number of varying-length structured records.  [...]
 
 Unless I'm badly mistaken, nothing about fixArts suggests that it  
 would leak memory.  So, it would appear parseArts is somehow to blame  
 for the memory leak, but this is where it gets weird.  If I just  
 slightly change fixArts (see below), then I no longer get a memory  
 leak (the process size stays at just over 3MB):
 
 fixArts ((Right x):xs) = do hPutArts stderr x
                             fixArts xs

Most likely you think your parser is building small data structures,
when in reality it builds large and complicated suspended computations
that never run and just take up space.  As soon as you demand their
result (hPutArts does this), they run and free up the space.

GHC itself cannot help you much.  Maybe it *could* determine that all
these results are really demanded eventually  and it won't hurt to
compute them earlier, but in practice this is too hard to do.  You'll
need to give some hints.  You should change the fields in your record to
strict types (!Integer instead of Integer) and you should force
evaluation of these structures by placing a `seq' at the appropriate
place.  Unfortunately, finding this place is not easy.  Probably the
best way is to simulate lazy evaluation in your head and see where it
goes wrong.  
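
As a hedged sketch of what the first hint can look like (the real Arts
definition isn't shown in this thread, so the fields below are made up):

  -- Made-up stand-in for the parsed record; the bang (!) annotations
  -- make the fields strict, so evaluating an Arts value to weak head
  -- normal form also evaluates each field instead of storing a
  -- suspension in it.
  data Arts = Arts
    { artsTimestamp :: !Integer
    , artsSize      :: !Int
    }

Where to put the `seq' is what the parseArts rewrite below shows.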


 parseArts (x:xs) ... = (createArts x) : parseArts xs

In this code the result of createArts is not demanded and simply put
into a potentially large data structure.  It might help to change it to
the following:

parseArts (x:xs) ... = ((:) $! createArts x) (parseArts xs)
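
For reference, ($!) is just ordinary application with a seq on the
argument, so the rewritten clause forces each createArts result at the
moment its list cell is built:

  -- The Prelude's definition of ($!), for reference:
  ($!) :: (a -> b) -> a -> b
  f $! x = x `seq` f x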


Udo.
-- 
in the middle of a discussion about the Evil Mangler in GHC 5.04:
shapr the evil mangler uses *perl* ??
ChilliX yes...
ChilliX it is Evil after all




Re: [Haskell-cafe] puzzling memory leak? (GHC)

2005-10-11 Thread David Roundy
Hi Young,

Just to give a somewhat different perspective, sometimes increasing
laziness is helpful for avoiding space problems--not really so much space
leaks as space wastage.

In this case, you probably can't afford to hold all of contents in memory
at any given time.  So you need to be certain both that the input is no
longer retained once it has been consumed (which is reflected in the
strictness suggestions given by others) and that you don't hold the entire
list of Arts in memory either, since that would also take a huge amount of
memory.  That requires that parseArts generate its output lazily and that
fixArts consume it properly.  Both of those appear to be the case from the
code you outlined, but I figured I'd point out that strictness in the wrong
circumstances can be as bad as laziness for your memory usage.

Assuming the following is correct...

 parseArts (x:xs) ... = (createArts x) : parseArts xs

then it would be worth checking that 

 main = do
   contents <- getContents
   fixArts [head $ parseArts contents [] (Map.singleton offset 0)]

takes very little memory.  If this takes a large amount of memory, then I
think you may have a problem with parseArts being insufficiently lazy.
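
To make "insufficiently lazy" concrete, here is a hedged sketch with a
made-up parser (not the poster's code): the first version cannot return
even one element until the entire input has been read, while the second
yields elements as it goes and lets already-consumed input be collected.

  -- Insufficiently lazy: no element of the result is available until
  -- the whole input has been consumed, so even `head` forces (and
  -- retains) everything.
  parseEager :: [String] -> [Int]
  parseEager = go []
    where
      go acc []       = reverse acc
      go acc (l : ls) = go (length l : acc) ls

  -- Productive: the first element exists as soon as the first line has
  -- been parsed, and earlier input can be garbage collected.
  parseLazy :: [String] -> [Int]
  parseLazy []       = []
  parseLazy (l : ls) = length l : parseLazy ls

  main :: IO ()
  main = do
    s <- getContents
    print (head (parseLazy (lines s)))    -- constant space
    -- print (head (parseEager (lines s)))  -- residency grows with input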
-- 
David Roundy
http://www.darcs.net


Re: [Haskell-cafe] Papers from the 2005 Haskell Workshop (Tallinn)?

2005-10-11 Thread John Meacham
On Wed, Oct 05, 2005 at 10:53:01PM +0200, Nils Anders Danielsson wrote:
 Most authors do put their papers on their web pages nowadays.
 
 On a side note, it is a little strange that the research community
 does the research, writes and typesets the papers, and does most (?)
 of the arrangements for the conferences, and still someone else gets
 the copyright. University libraries have to pay lots of money for
 access to publications. I may have missed some term in the equation,
 though.

Knuth wrote an open letter on just this subject that was very
interesting:
 http://www-cs-faculty.stanford.edu/~knuth/joalet.pdf


I wonder if it would be okay to make a meta web page that, for each
person who presented at a conference, points to the author's homepage
where they keep their papers.

I certainly think we should somehow centralize an index to papers on
Haskell. I have found it extremely difficult to track down papers for
authors who have since moved out of academia or have passed on and
don't have their personal homepages with their papers anymore.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈ 


Re: [Haskell-cafe] Papers from the 2005 Haskell Workshop (Tallinn)?

2005-10-11 Thread Josef Svenningsson
On 10/12/05, John Meacham [EMAIL PROTECTED] wrote:
 I certainly think we should somehow centralize an index to papers on
 Haskell. I have found it extremely difficult to track down papers for
 authors who have since moved out of academia or have passed on and
 don't have their personal homepages with their papers anymore.

There have been efforts to do this before. Here is an example:
http://haskell.readscheme.org/
It seems that this site hasn't been kept up to date, but it might make a good starting point for a new attempt.

Cheers,

/Josef 
