Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-21 Thread Claus Reinke

> > [trigger garbage collection when open runs out of free file descriptors,
> > then try again]
> >
> > so, instead of documenting limitations and workarounds, this issue should be
> > fixed in GHC as well.
>
> This may help in some cases but it cannot be relied upon. Finalizers are
> always run in a separate thread (must be, see
> http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
> you force a GC when handles are exhausted, as hugs seems to do, there is no
> guarantee that by the time the GC is done the finalizers have freed any
> handles (assuming that the GC run really detects any handles to be
> garbage).


useful reference to collect!-) but even that mentions giving back os resources
such as file descriptors as one of the simpler cases. running the GC/finalizers
sequence repeatedly until nothing more changes might be worth thinking about,
as are possible race conditions. here is the thread the paper is referring to
as one of its origins:

   http://gcc.gnu.org/ml/java/2001-12/msg00113.html
   http://gcc.gnu.org/ml/java/2001-12/msg00390.html

i also like the idea mentioned as one of the alternatives in 3.1, where the
finalizer does not notify the object that is to become garbage, but a different
manager object. in this case, one might notify the i/o handler, and that could
take care of avoiding trouble.

in my opinion, if my code or my finalizers hold on to resources i'd like to see
freed, then i'm responsible, even if i might need language help to remedy the
situation. but if i take care to avoid such references, and the system still
runs out of resources just because it can't be bothered to check right now
whether it has some left to free, there is nothing i can do about it (apart
from complaining, that is!-).

of course, this isn't new. see, for instance, this thread view:
http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=st&q=hugs+openFile+file+descriptor+garbage+collection&rnum=1#74d32070dbcc92fc

where Remi Turk points out System.Mem.performGC, and Simon Marlow
agrees that GHC should do more to free file descriptors, but also mentions that
performGC doesn't run finalizers.

actually, if i have readFile-based code that immediately processes the file
contents before the next readFile, as in Matthew's test code, my ghci (on
windows) doesn't seem to run out of file descriptors easily, but if i force a
descriptor leak by leaving unreferenced contents unprocessed, then performGC
does seem to help (not that this is ideal in general, as discussed in the
thread above):

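   import Control.Exception (IOException, catch)
   import System.Environment
   import System.Mem
   import System.IO

   -- swallow the IOException raised once openFile runs out of descriptors
   ignoreIOError :: IOException -> IO ()
   ignoreIOError _ = return ()

   main = do
     n:f:_ <- getArgs
     -- force a descriptor leak: keep opening f until openFile fails
     (sequence (repeat (openFile f ReadMode)) >> return ()) `catch` ignoreIOError
     test1 (take (read n) $ repeat f)

   test1 files = mapM_ doStuff files where
     -- uncomment the performGC to collect the leaked handles before each read
     doStuff f = {- performGC >> -} readFile f >>= print . map length . take 10 . lines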

interestingly, if i do that, even Hugs seems to need the performGC?
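
for illustration only, the "collect, then try again" idea quoted at the top
of this message could be sketched roughly as below. openFileRetry and its
retry count are hypothetical names, not anything GHC or Hugs provides, and
the sketch assumes that descriptor exhaustion shows up as an IOError for
which isFullError holds. as the quoted reply points out, nothing guarantees
that the finalizers have run by the time of the retry, so this can only
improve the odds:

```haskell
import Control.Concurrent (yield)
import Control.Exception (throwIO, try)
import System.IO
import System.IO.Error (isFullError)
import System.Mem (performGC)

-- Hypothetical helper (not a library function): try to open a file and,
-- if the failure looks like resource exhaustion, force a GC, give other
-- threads (including finalizer threads) a chance to run, and try again
-- a bounded number of times.
openFileRetry :: Int -> FilePath -> IOMode -> IO Handle
openFileRetry retries path mode = do
  result <- try (openFile path mode)
  case result of
    Right h -> return h
    Left e
      | retries > 0 && isFullError e -> do
          performGC   -- unreferenced handles may now be detected as garbage
          yield       -- no guarantee that their finalizers run here
          openFileRetry (retries - 1) path mode
      | otherwise -> throwIO e  -- any other error, or out of retries
```

a caller would write something like openFileRetry 3 f ReadMode in place of
openFile f ReadMode.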

claus

ps. one could even try to go further, and have virtual file descriptors, like
   virtual memory. but that is something for the os, i guess.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Bryan O'Sullivan

Pete Kazmier wrote:

> I understand the intent of this code, but I am having a hard time
> understanding the implementation, specifically the combination of
> 'fix', 'flip', and 'iterate'.  I looked up 'fix' and I'm unsure how
> one can call 'flip' on a function that takes one argument.


If you look at the code, that's not really what's happening.  See the 
embedded anonymous function below?


  flip fix accum $
     \iterate accum -> do
   ...

It's a function of two arguments.  All flip is doing is switching the 
order of the arguments to fix, in this case for readability.  If you 
were to get rid of the flip, you'd need to remove the accum after 
fix and move it after the lambda expression, which would make the 
expression much uglier to write and read.  So all the flip is doing 
here is tidying up the code.


(If you're still confused, look at the difference between forM and mapM. 
 The only reason forM exists is readability when you have - in terms of 
the amount of screen space they consume - a big function and a small 
piece of data, just as here.)


As to why it's okay to call flip on fix at all, look at the types 
involved.


fix :: (a -> a) -> a
flip :: (a -> b -> c) -> b -> a -> c

By substitution:

flip fix :: a -> ((a -> b) -> a -> b) -> b

In the case above, accum has type a, and the lambda has type
(a -> IO a) -> a -> IO a, and these fit nicely into the type expected by
flip fix.
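
As a minimal sketch of the same shape, here is a small loop written both
ways. The name halveUntilOdd and the halving logic are made up for
illustration and are not taken from the code under discussion; only the
placement of the starting value relative to the lambda matters.

```haskell
import Data.Function (fix)

-- With flip: the starting value comes first and the (large) loop body last.
-- The lambda has type (Int -> IO Int) -> Int -> IO Int.
halveUntilOdd :: Int -> IO Int
halveUntilOdd start =
  flip fix start $ \loop n ->
    if even n && n /= 0
      then loop (n `div` 2)   -- recurse with the new accumulator
      else return n           -- finished

-- Without flip: the starting value trails after the whole lambda.
halveUntilOdd' :: Int -> IO Int
halveUntilOdd' start =
  fix (\loop n ->
         if even n && n /= 0
           then loop (n `div` 2)
           else return n)
      start
```

Both definitions behave identically; the flipped one just reads better once
the lambda grows to several lines.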


b


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Lennart Augustsson

Here's what happens:
fix has type (x->x)->x
and that has to match the first argument to flip, namely 'a->b->c'.
The only chance of that is if x is actually a function type.
Pick x=b->c, now we have
fix has type ((b->c)->b->c)->b->c
and it matches a->b->c if a=(b->c)->b->c

Flip returns b->a->c, and if we substitute we get
b->((b->c)->b->c)->c

If you rename the variables you get the suggested type.
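
The substitution can also be checked by the compiler. The sketch below
simply writes the two instantiated types down as signatures; the names
fixAtFunction, flippedFix and factorialStep are hypothetical and exist only
for this illustration.

```haskell
import Data.Function (fix)

-- fix, with x instantiated to the function type b -> c.
fixAtFunction :: ((b -> c) -> b -> c) -> b -> c
fixAtFunction = fix

-- flip applied to that instance: the two arguments trade places.
flippedFix :: b -> ((b -> c) -> b -> c) -> c
flippedFix = flip fixAtFunction

-- Illustrative step function: factorial with the recursive call abstracted.
factorialStep :: (Integer -> Integer) -> Integer -> Integer
factorialStep self n = if n <= 1 then 1 else n * self (n - 1)
```

Both fixAtFunction factorialStep 5 and flippedFix 5 factorialStep compute
the same value; only the argument order differs.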

-- Lennart


On Mar 19, 2007, at 20:35 , Pete Kazmier wrote:

> Bryan O'Sullivan [EMAIL PROTECTED] writes:
>
> > Pete Kazmier wrote:
> >
> > > I understand the intent of this code, but I am having a hard time
> > > understanding the implementation, specifically the combination of
> > > 'fix', 'flip', and 'iterate'.  I looked up 'fix' and I'm unsure how
> > > one can call 'flip' on a function that takes one argument.
> >
> > As to why it's okay to call flip on fix at all, look at the types
> > involved.
> >
> > fix :: (a -> a) -> a
> > flip :: (a -> b -> c) -> b -> a -> c
> >
> > By substitution:
> >
> > flip fix :: a -> ((a -> b) -> a -> b) -> b
>
> Sadly, I'm still confused.  I understand how 'flip' works in the case
> where its argument is a function that takes two arguments.  I've
> started to use this in my own code lately.  But my brain refuses to
> understand how 'flip' is applied to 'fix', a function that takes one
> argument only, which happens to be a function itself.  What is 'flip'
> flipping when the function passed to it only takes one argument?
>
> Thanks,
> Pete



Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Isaac Dupree

Pete Kazmier wrote:
> Bryan O'Sullivan [EMAIL PROTECTED] writes:
>
> > Pete Kazmier wrote:
> >
> > > I understand the intent of this code, but I am having a hard time
> > > understanding the implementation, specifically the combination of
> > > 'fix', 'flip', and 'iterate'.  I looked up 'fix' and I'm unsure how
> > > one can call 'flip' on a function that takes one argument.
> >
> > As to why it's okay to call flip on fix at all, look at the types
> > involved.
> >
> > fix :: (a -> a) -> a
> > flip :: (a -> b -> c) -> b -> a -> c
> >
> > By substitution:
> >
> > flip fix :: a -> ((a -> b) -> a -> b) -> b
>
> Sadly, I'm still confused.  I understand how 'flip' works in the case
> where its argument is a function that takes two arguments.  I've
> started to use this in my own code lately.  But my brain refuses to
> understand how 'flip' is applied to 'fix', a function that takes one
> argument only, which happens to be a function itself.  What is 'flip'
> flipping when the function passed to it only takes one argument?

fix :: (a -> a) -> a
In this case, we know something about 'a': it is a function (b -> c).
Substitute:
fix :: ((b -> c) -> (b -> c)) -> (b -> c)
Take advantage of the right-associativity of (->)
fix :: ((b -> c) -> b -> c) -> b -> c
Now it looks like a function of two arguments, because the return value
(normally ordinary data) can in fact, in this case, take arguments.

Here's another example of that:

data Box a = Box a
get (Box a) = a
-- get (Box 1) :: Int
-- get (Box (\a -> a)) :: Int -> Int
-- (get (Box (\a -> a))) 1 :: Int
-- function application is left-associative:
-- get (Box (\a -> a)) 1 :: Int
-- flip get 1 (Box (\a -> a)) :: Int

Yes, it sometimes confuses me too.

Isaac


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Donald Bruce Stewart
pete-expires-20070513:
> [EMAIL PROTECTED] (Donald Bruce Stewart) writes:
>
> > pete-expires-20070513:
> > > When using readFile to process a large number of files, I am exceeding
> > > the resource limits for the maximum number of open file descriptors on
> > > my system.  How can I enhance my program to deal with this situation
> > > without making significant changes?
> >
> > Read in data strictly, and there are two obvious ways to do that:
> >
> >     -- Via strings:
> >
> >     readFileStrict f = do
> >         s <- readFile f
> >         length s `seq` return s
> >
> >     -- Via ByteStrings
> >     readFileStrict  = Data.ByteString.readFile
> >     readFileStrictString  =
> >         liftM Data.ByteString.Char8.unpack . Data.ByteString.readFile
> >
> > If you're reading more than say, 100k of data, I'd use strict
> > ByteStrings without hesitation. More than 10M, and I'd use lazy
> > bytestrings.
>
> Correct me if I'm wrong, but isn't this exactly what I wanted to
> avoid?  Reading the entire file into memory?  In my previous email, I
> was trying to state that I wanted to lazily read the file because some
> of the files are quite large and there is no reason to read beyond the
> small set of headers.  If I read the entire file into memory, this
> design goal is no longer met.
>
> Nevertheless, I was benchmarking with ByteStrings (both lazy and
> strict), and in both cases, the ByteString versions of readFile yield
> the same error regarding max open files.  Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files.
>
> So I'm back to square one.  Any other ideas?

Hmm. Ok. So we need to have more hClose's happen somehow. Can you
process files one at a time?
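
One way to get a deterministic hClose per file without reading whole files
might look like the sketch below. It assumes the headers of interest end at
the first blank line (as in e-mail messages); readHeaders is a hypothetical
helper name, not part of any library.

```haskell
import System.IO

-- Hypothetical helper: read lines up to (not including) the first blank
-- line, then let withFile close the handle. hGetLine is strict, so the
-- returned list never refers to the handle after it has been closed.
readHeaders :: FilePath -> IO [String]
readHeaders path = withFile path ReadMode go
  where
    go h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          l <- hGetLine h
          if null l
            then return []           -- blank line: end of the header block
            else (l :) <$> go h      -- keep this header line and continue
```

Used as mapM readHeaders files, at most one descriptor is open at any time,
and nothing beyond the header block is read apart from whatever the handle
buffers.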

-- Don


Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Dougal Stanton
Quoth Pete Kazmier, nevermore,
> the same error regarding max open files.  Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files.

I've tried the approach you appear to be using and it can be tricky
to predict how the laziness will interact with the list of actions.

For example, I tried to download a temporary file, read a bit of data
out of it and then download another one. I thought I would save thinking
and use the same file name for each download: /tmp/feed.xml. What
happened was that it downloaded them all in rapid succession,
over-writing each one with the next and not actually reading the data
until the end. So I ended up parsing N identical copies of the final
file, instead of one of each.

You need to refactor how you map the functions so that fewer whole lists
are passed around. I'd guess that (1) is being executed in its entirety
before being passed to (2), but it's not until (2) that the file data is
actually used.

    main =
        getArgs >>=
        mapM fileContentsOfDirectory >>=                      -- (1)
        mapM_ print . threadEmails . map parseEmail . concat  -- (2)

This means there are a lot of files sitting open doing nothing. I've had
a lot of success by recreating this as:

    main =
        getArgs >>=
        mapM_ readAndPrint
      where readAndPrint dir = fileContentsOfDirectory dir >>= print -- etc.

It may seem semantically identical but it sometimes makes a difference
when things actually happen.
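
The difference in ordering can be made visible with a small sketch that
records when each step runs. The functions fetch and use are hypothetical
stand-ins for opening a file and consuming its contents; they only append
to a log, so no real files are involved.

```haskell
import Control.Monad ((>=>))
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Stand-in for opening a file: logs the event and returns the name.
fetch :: IORef [String] -> String -> IO String
fetch logRef name = modifyIORef logRef (("open " ++ name) :) >> return name

-- Stand-in for consuming the contents: logs the event.
use :: IORef [String] -> String -> IO ()
use logRef name = modifyIORef logRef (("use " ++ name) :)

-- Shape of (1) followed by (2): every fetch runs before any use.
allThenUse :: [String] -> IO [String]
allThenUse names = do
  logRef <- newIORef []
  mapM (fetch logRef) names >>= mapM_ (use logRef)
  reverse <$> readIORef logRef

-- Restructured shape: each item is fetched and used before the next one.
oneAtATime :: [String] -> IO [String]
oneAtATime names = do
  logRef <- newIORef []
  mapM_ (fetch logRef >=> use logRef) names
  reverse <$> readIORef logRef
```

For the input ["a", "b"], allThenUse logs both opens before either use,
while oneAtATime interleaves them, which is why the second shape keeps far
fewer files open at once.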

-- 
Dougal Stanton