Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-21 Thread Albert Y. C. Lai

Magnus Therning wrote:

I'll certainly try to look into all of that.  However, I suspect your
suggestion doesn't scale very well.  On my original code it's easy, it
was less than 10 lines, but how do I know where to start looking if it's
a program of 100 lines, or 1000 lines?  The problem could occur in an
updated library that I just use... Well you get the idea :-)


A library function is supposed to tell you its time usage, memory usage, 
file usage, ... generally resource usage, as part of its specification.


A 100-line program is not supposed to be a monolith. It is supposed to 
be a combination of 10 functions (or 10 parts; I'll call them functions 
anyway), 10 lines each. Each function is supposed to come with its 
specification too, which again tells you its resource usage.


To reason about the 100-line program, you only need to reason about 10 
lines of specifications. To reason about a program that calls a library 
function, you only need to plug its specification --- emphatically not 
its code! --- at the call site, and proceed.


(It remains to reason that each 10-line function conforms to its 1-line 
specification, but we agreed that 10 lines are ok. It also remains to 
reason that the library function conforms to its documented 
specification, but that is the author's job, and again the author can 
apply the same divide-and-conquer to stay tractable.)


To reason about a 1000-line program, again it is not supposed to be a 
monolith. It is supposed to be a combination of 10 functions, each 100 
lines. The 1000-line program is 10 lines of specifications combined. We 
already know how to deal with each 100-line function... You get the idea.


Divide-and-conquer. Abstraction. Modularization. Separation of Concerns. 
 That is how reasoning about programs scales. That is how writing 
programs in the first place scales. That is how anything at all scales.


Some suggested musings:

Rome is not built in one day or whatever the proverb's wording is.

Intelligent Design vs Random Mutation.

Is your program invented or discovered?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-21 Thread Duncan Coutts
On Sun, 2007-10-21 at 17:15 -0400, Albert Y. C. Lai wrote:
 Magnus Therning wrote:
  I'll certainly try to look into all of that.  However, I suspect your
  suggestion doesn't scale very well.  On my original code it's easy, it
  was less than 10 lines, but how do I know where to start looking if it's
  a program of 100 lines, or 1000 lines?  The problem could occur in an
  updated library that I just use... Well you get the idea :-)
 
 A library function is supposed to tell you its time usage, memory usage, 
 file usage, ... generally resource usage, as part of its specification.
 
 A 100-line program is not supposed to be a monolith. It is supposed to 
 be a combination of 10 functions (or 10 parts; I'll call them functions 
 anyway), 10 lines each. Each function is supposed to come with its 
 specification too, which again tells you its resource usage.
 
 To reason about the 100-line program, you only need to reason about 10 
 lines of specifications.

I'm not sure what semantics we would use to reason about resource use in
specifications like this. Our standard semantics abstract over space,
time and sharing properties of our programs.

For a lazy language, resource specifications of functions do not compose
in a simple way. For example we might naively say that [1..m] uses m
time and space and that take n takes at most n time and space but then
take n [1..m] does not take the sum of these two time/space
specifications. In more complex examples the connection is even less
obvious.

One more accurate way to look at resource use is to say that we only
consider time and space to reduce to WHNF and then ask that question
when we apply various evaluation functions to the expression. Different
evaluation functions would force various parts of the value. Then when
we plug an expression into different contexts we see what kind of
evaluation function that context is and use that in our question about
the resource use to evaluate the expression. Still, that only gives you
total resource use, not maximum resource use at any point during
evaluation which is important for space.

Summary: it's not so simple.

Duncan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-19 Thread Magnus Therning
On Fri, Oct 19, 2007 at 02:09:01 +1000, Matthew Brecknell wrote:
Magnus Therning:
 Just out of curiosity, how would I go about finding this myself?
 (Ideally it'd be an answer other than read the source for the libraries
 you are using. :-)

Well, I can at least try to expand a little on read the source. :-)

You'll first need a solid understanding of lazy evaluation in the
context of pure computations. Read about normal evaluation order, WHNF
(weak head normal form), and which contexts force WHNF. Use pen and
paper to manually derive the evaluation order for some pure
computations of your choice (folds over infinite lists would be a good
start).  Experiment with seq. Experiment with the various causes of
stack overflows. Next, understand that while the IO monad is quite
strict about sequencing actions, its return is not strict in its
argument.  Observe that the combinators in Control.Monad generally do
not force returned computations. Browse the source of GHC's IO library
to understand how it sequences IO actions. Read the source for
unsafePerformIO and unsafeInterleaveIO to understand how and for what
purposes they allow you to break that sequencing. Next, read the source
for hGetContents. After all that, you should be a little better
equipped to diagnose problems with lazy IO!

I'll certainly try to look into all of that.  However, I suspect your
suggestion doesn't scale very well.  On my original code it's easy, it
was less than 10 lines, but how do I know where to start looking if it's
a program of 100 lines, or 1000 lines?  The problem could occur in an
updated library that I just use... Well you get the idea :-)

I can use profiling to find where time and space is spent.  But what
about other finite resources that my program uses, such as file handles?

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-19 Thread Jules Bean
I agree with Matthew's comments in the post immediately before this. It 
takes him two decent paragraphs to explain what is going on, including a 
description of WHNF, a suggestion to use pen  paper, a suggestion to 
read up on the semantics of unsafeInterleaveIO and more.


What I find inconceivable is that people can really believe that these 
unsafe-lazy functions are a sensible default IO API, given that to use 
them safely[*] you need to understand all those details. Are we claiming 
that anyone who doesn't understand the finer points of WHNF, forcing, 
and GHC's evaluation strategy should not use the haskell IO libraries?


[*] Of course, you can use them safely in certain circumstances. But 
then that doesn't scale. Because your 'first program' works, and then 
you scale up, and then it breaks, and you find yourself quite unequipped 
to work out why. Exactly the experience of the original poster in this 
thread.


readFile is actually a worse culprit than hGetContents. If you use 
hGetContents, at least you have a handle around which you can close 
explicitly with hClose, which solves the serious OS resource leak (and 
you can use a bracketing style to do it 'automatically'). Then just 
remains the asynchronous exception 'hidden bottoms' problem.


I suggest the following two versions of readFile as preferable:

readFile :: FilePath - IO String
readFile f = B.unpack . B.readFile f
-- use strict bytestrings to read the file strictly, but into a
-- compact memory structure. Then unpack lazily into a conventional
-- string but with some luck this intermediate list might fuse away.

readFile :: FilePath - (String - IO a) - IO a
readFile f act = bracket (openFile f ReadMode)
 (hClose)
 (hGetContents h = act)

-- this version still uses lazy IO inside, which I don't really like,
-- but at least it (a) doesn't open the file unless it actually gets run
-- and (b) guarantees to close it again afterwards

Jules

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Matthew Brecknell
Magnus Therning:
 hasEmpty s = let
 _first_empty = s !! 0 == '\n'
 _last_empty = (reverse s) !! 1 == '\n'
 in _first_empty || _last_empty
 
 loadAndCheck fp = liftM hasEmpty $ readFile fp
 
 main = getArgs = filterM loadAndCheck = mapM_ putStrLn
 The one problem I had was that running this on /all/ files in my WP
 resulted in an exception:
 
 *** Exception: ./wp-includes/images/smilies/icon_mrgreen.gif:
 openFile: resource exhausted (Too many open files)
 
 Is there some (easy) way to avoid this while still using readFile?

The perils of lazy IO: readFile closes its file descriptor only if it is
is forced to read to end of file. Otherwise, you rely on the garbage
collector to run the finaliser on the file handle sometime after it
becomes unreachable, but since file handle exhaustion does not trigger
garbage collection, there are no guarantees. See also the System.IO
documentation.

For an extremely hackish (and untested) solution to your problem,
replace the expression (_first_empty || _last_empty) with (_last_empty
|| _first_empty). If you can explain why that works, then you advance to
the next level in your training!

For a less hackish solution, you need to do a bit more work. Again, this
is untested.

 loadAndCheck fn = bracket (openFile fn ReadMode) hClose checkContents
 checkContents fh = do { s - hGetContents fh; return $! hasEmpty s }

Note the explicit close, and return $! to ensure that hasEmpty has
done its work before the file is closed.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Jules Bean

Is there some (easy) way to avoid this while still using readFile?


readFile' f = do s - readFile f
 return (length s `seq` s)

(and curse the fact that the default readFile is unsafelazy).

Jules
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Magnus Therning
On Thu, Oct 18, 2007 at 12:05:40 +0100, Jules Bean wrote:
Is there some (easy) way to avoid this while still using readFile?

readFile' f = do s - readFile f
return (length s `seq` s)

(and curse the fact that the default readFile is unsafelazy).

:( Doesn't work.  I'm starting to suspect there is no automatic close of
opened files even after they've been completely read.

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


pgpB34xnOe4q7.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Magnus Therning
On Thu, Oct 18, 2007 at 20:58:45 +1000, Matthew Brecknell wrote:
Magnus Therning:
 hasEmpty s = let
 _first_empty = s !! 0 == '\n'
 _last_empty = (reverse s) !! 1 == '\n'
 in _first_empty || _last_empty
 
 loadAndCheck fp = liftM hasEmpty $ readFile fp
 
 main = getArgs = filterM loadAndCheck = mapM_ putStrLn
 The one problem I had was that running this on /all/ files in my WP
 resulted in an exception:
 
 *** Exception: ./wp-includes/images/smilies/icon_mrgreen.gif:
 openFile: resource exhausted (Too many open files)
 
 Is there some (easy) way to avoid this while still using readFile?

The perils of lazy IO: readFile closes its file descriptor only if it
is is forced to read to end of file. Otherwise, you rely on the garbage
collector to run the finaliser on the file handle sometime after it
becomes unreachable, but since file handle exhaustion does not trigger
garbage collection, there are no guarantees. See also the System.IO
documentation.

For an extremely hackish (and untested) solution to your problem,
replace the expression (_first_empty || _last_empty) with (_last_empty
|| _first_empty). If you can explain why that works, then you advance
to the next level in your training!

I can see how that could have worked.  Lazy thinking indeed :-)  I
wonder if the fact that it doesn't work should be considered a bug?

Just to make sure I changed (broke) my code so that the very last
character would be read (in case there is an almost crazy level of
optimisation in ghc):

 _last_empty = (reverse s) !! 0 == '\n'

Still no cigar :(

For a less hackish solution, you need to do a bit more work. Again, this
is untested.

 loadAndCheck fn = bracket (openFile fn ReadMode) hClose checkContents
 checkContents fh = do { s - hGetContents fh; return $! hasEmpty s }

Note the explicit close, and return $! to ensure that hasEmpty has
done its work before the file is closed.

Ah, `bracket` is a handy function to know!

And it does work!

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


pgpAG1ZpHktr7.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Matthew Brecknell
Magnus Therning:
 Still no cigar :(

Yes, this is a little more subtle than I first thought. Look at liftM
and filterM:

liftM f m1 = do { x1 - m1; return (f x1) }

filterM :: (Monad m) = (a - m Bool) - [a] - m [a]
filterM _ [] =  return []
filterM p (x:xs) = do
   flg - p x
   ys - filterM p xs
   return (if flg then x:ys else ys)

In liftM, the result of (f x1) is not forced, and in filterM, flg is not
tested until after xs is traversed. The result is that when filterM runs
the (p x) action, a file is opened, but hasEmpty (and thus readFile) is
not forced until all other files have likewise been opened.

It should suffice to use a more strict version of liftM:

liftM' f m1 = do { x1 - m1; return $! f x1 }

That should also fix the problem with Jules' solution, or alternatively:

readFile' f = do s - readFile f
 return $! (length s `seq` s)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Jules Bean

Magnus Therning wrote:

On Thu, Oct 18, 2007 at 12:05:40 +0100, Jules Bean wrote:

Is there some (easy) way to avoid this while still using readFile?

readFile' f = do s - readFile f
 return (length s `seq` s)

(and curse the fact that the default readFile is unsafelazy).


:( Doesn't work.  I'm starting to suspect there is no automatic close of
opened files even after they've been completely read.


That's odd. Sounds like a bug? (in readFile, I mean)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread David Roundy
On Thu, Oct 18, 2007 at 01:16:37PM +0100, Magnus Therning wrote:
 On Thu, Oct 18, 2007 at 20:58:45 +1000, Matthew Brecknell wrote:
 For a less hackish solution, you need to do a bit more work. Again, this
 is untested.
 
  loadAndCheck fn = bracket (openFile fn ReadMode) hClose checkContents
  checkContents fh = do { s - hGetContents fh; return $! hasEmpty s }
 
 Note the explicit close, and return $! to ensure that hasEmpty has
 done its work before the file is closed.
 
 Ah, `bracket` is a handy function to know!

Note that you almost never want to use the bracket defined in Haskell98,
nor the one in IO.Error.  You want the bracket defined in
Control.Exception.
-- 
David Roundy
Department of Physics
Oregon State University


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Magnus Therning
On Thu, Oct 18, 2007 at 09:25:32 -0400, David Roundy wrote:
On Thu, Oct 18, 2007 at 01:16:37PM +0100, Magnus Therning wrote:
 On Thu, Oct 18, 2007 at 20:58:45 +1000, Matthew Brecknell wrote:
 For a less hackish solution, you need to do a bit more work. Again, this
 is untested.
 
  loadAndCheck fn = bracket (openFile fn ReadMode) hClose checkContents
  checkContents fh = do { s - hGetContents fh; return $! hasEmpty s }
 
 Note the explicit close, and return $! to ensure that hasEmpty has
 done its work before the file is closed.
 
 Ah, `bracket` is a handy function to know!

Note that you almost never want to use the bracket defined in Haskell98,
nor the one in IO.Error.  You want the bracket defined in
Control.Exception.

Yes, thankfully that's documented in the Haddock for the only one that
turns up on Hoogle.

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


pgpS5kB9LvJaR.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Magnus Therning
On Thu, Oct 18, 2007 at 22:58:48 +1000, Matthew Brecknell wrote:
Magnus Therning:
 Still no cigar :(

Yes, this is a little more subtle than I first thought. Look at liftM
and filterM:

liftM f m1 = do { x1 - m1; return (f x1) }

filterM :: (Monad m) = (a - m Bool) - [a] - m [a]
filterM _ [] =  return []
filterM p (x:xs) = do
   flg - p x
   ys - filterM p xs
   return (if flg then x:ys else ys)

In liftM, the result of (f x1) is not forced, and in filterM, flg is
not tested until after xs is traversed. The result is that when filterM
runs the (p x) action, a file is opened, but hasEmpty (and thus
readFile) is not forced until all other files have likewise been
opened.

It should suffice to use a more strict version of liftM:

liftM' f m1 = do { x1 - m1; return $! f x1 }

That did indeed fix it.

Just out of curiosity, how would I go about finding this myself?
(Ideally it'd be an answer other than read the source for the libraries
you are using. :-)

/M

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


pgpJO22GUwQ7M.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Magnus Therning
On Thu, Oct 18, 2007 at 22:58:48 +1000, Matthew Brecknell wrote:
Magnus Therning:
 Still no cigar :(

Yes, this is a little more subtle than I first thought. Look at liftM
and filterM:

liftM f m1 = do { x1 - m1; return (f x1) }

filterM :: (Monad m) = (a - m Bool) - [a] - m [a]
filterM _ [] =  return []
filterM p (x:xs) = do
   flg - p x
   ys - filterM p xs
   return (if flg then x:ys else ys)

In liftM, the result of (f x1) is not forced, and in filterM, flg is
not tested until after xs is traversed. The result is that when filterM
runs the (p x) action, a file is opened, but hasEmpty (and thus
readFile) is not forced until all other files have likewise been
opened.

It should suffice to use a more strict version of liftM:

liftM' f m1 = do { x1 - m1; return $! f x1 }

That should also fix the problem with Jules' solution, or alternatively:

readFile' f = do s - readFile f
 return $! (length s `seq` s)

Another question that came up when talking to a (much more clever)
colleague was whether the introduction of either of the solutions in
fact means that only a single file is open at any time?

/M

(I really miss IRC connectivity at work at times like this. :-)

-- 
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus.therning@gmail.com
http://therning.org/magnus


pgpAodV26Gxel.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Matthew Brecknell
Magnus Therning:
 Just out of curiosity, how would I go about finding this myself?
 (Ideally it'd be an answer other than read the source for the libraries
 you are using. :-)

Well, I can at least try to expand a little on read the source. :-)

You'll first need a solid understanding of lazy evaluation in the
context of pure computations. Read about normal evaluation order, WHNF
(weak head normal form), and which contexts force WHNF. Use pen and
paper to manually derive the evaluation order for some pure computations
of your choice (folds over infinite lists would be a good start).
Experiment with seq. Experiment with the various causes of stack
overflows. Next, understand that while the IO monad is quite strict
about sequencing actions, its return is not strict in its argument.
Observe that the combinators in Control.Monad generally do not force
returned computations. Browse the source of GHC's IO library to
understand how it sequences IO actions. Read the source for
unsafePerformIO and unsafeInterleaveIO to understand how and for what
purposes they allow you to break that sequencing. Next, read the source
for hGetContents. After all that, you should be a little better equipped
to diagnose problems with lazy IO!

Magnus:
 Another question that came up when talking to a (much more clever)
 colleague was whether the introduction of either of the solutions in
 fact means that only a single file is open at any time?

You program is single-threaded, so it's probably safe to conclude that
either:

a) it has only one file open at a time, or
b) it has all the files open at once, or
c) leaked handles are being garbage collected or are insufficient to
cause exhaustion.

Perhaps you can use your newly discovered understanding of lazy IO to
select the correct answer. :-)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Automatic file closing after readFile

2007-10-18 Thread Don Stewart
magnus:
 On Thu, Oct 18, 2007 at 12:05:40 +0100, Jules Bean wrote:
 Is there some (easy) way to avoid this while still using readFile?
 
 readFile' f = do s - readFile f
   return (length s `seq` s)
 
 (and curse the fact that the default readFile is unsafelazy).
 
 :( Doesn't work.  I'm starting to suspect there is no automatic close of
 opened files even after they've been completely read.
 

I'd also suggest using strict io here, either vi:

readFile' f = do s - readFile f
 length s `seq` return s -- n.b Jules :)

or

Data.ByteString.readFile 

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe