subject:"\[Haskell\-cafe\] Re\: Lazy IO and closing of file handles"

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-21 Thread Benjamin Franksen

Benjamin Franksen wrote:
> Bertram Felgenhauer wrote:
>> Having to rely on GC to close the fds quickly enough is another problem;
>> can this be solved on the library side, maybe by performing GCs when
>> running out of FDs?
> 
> Claus Reinke wrote:
>> in good old Hugs, for instance, we find in function newHandle in
> src/iomonad.c
>> [...snip...]
>>     /* Search for unused handle*/
>>     /* If at first we don't    */
>>     /* succeed, garbage collect*/
>>     /* and try again ...       */
>>     /* ... before we give up   */
>> 
>> so, instead of documenting limitations and workarounds, this issue should
> be
>> fixed in GHC as well.
> 
> This may help in some cases but it cannot be relied upon. Finalizers are
> always run in a separate thread (must be, see
> http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
> you force a GC when handles are exhausted, as hugs seems to do, there is
no
> guarantee that by the time the GC is done the finalizers have freed any
> handles (assuming that the GC run really detects any handles to be
> garbage).

Sorry for replying to myself, but I just realized that the argument brought
forth by Boehm applies only to general purpose finalizing facilites, and
not necessarily to each and every special case. I think one could make up
an argument that file handles in Haskell are indeed a special kind of
object and that the language runtime /can/ run finalizers for file handles
in a more 'synchronous' way (i.e. GC could call them directly as soon as it
determines they are garbage). The main point here is that a file descriptor
does not contain references to other language objects.

The same would apply to all sorts of OS resource handles. However, the whole
argument is a priori valid only for raw system handles, such as file
descriptors. No idea what issues come up if one considers e.g. buffering,
or more generally, any additional data structure that gets associated with
the handle.

Cheers
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-21 Thread Claus Reinke


[trigger garbage collection when open runs out of free file descriptors, then 
try again]

so, instead of documenting limitations and workarounds, this issue should be
fixed in GHC as well.


This may help in some cases but it cannot be relied upon. Finalizers are
always run in a separate thread (must be, see
http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
you force a GC when handles are exhausted, as hugs seems to do, there is no
guarantee that by the time the GC is done the finalizers have freed any
handles (assuming that the GC run really detects any handles to be
garbage).


useful reference to collect!-) but even that mentions giving back os resources 
such
as file descriptors as one of the simpler cases. running the GC/finalizers 
sequence
repeatedly until nothing more changes might be worth thinking about, as are 
possible
race conditions. here is the thread the paper is refering to as one of its 
origins:

   http://gcc.gnu.org/ml/java/2001-12/msg00113.html
   http://gcc.gnu.org/ml/java/2001-12/msg00390.html

i also like the idea mentioned as one of the alternatives in 3.1, where the 
finalizer does
not notify the object that is to become garbage, but a different manager 
object. in this
case, one might notify the i/o handler, and that could take care of avoiding 
trouble.

in my opinion, if my code or my finalizers hold on to resources i'd like to see 
freed,
then i'm responsible, even if i might need language help to remedy the 
situation.
but if i take care to avoid such references, and the system still runs out of 
resources
just because it can't be bothered to check right now whether it has some left 
to free,
there is nothing i can do about it (apart from complaining, that is!-).

of course, this isn't new. see, for instance, this thread view:
http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=st&q=hugs+openFile+file+descriptor+garbage+collection&rnum=1#74d32070dbcc92fc

where Remi Turk points out System.Mem.performGC, and Simon Marlow
agrees that GHC should do more to free file descriptors, but also mentions that
performGC doesn't run finalizers.

actually, if i have readFile-based code that immediately processes the file 
contents
before the next readFile, as in Matthew's test code, my ghci (on windows) 
doesn't
seem to run out of file descriptors easily, but if i force a descriptor leak by 
leaving
unreferenced contents unprocessed, then performGC does seem to help (not that
this is ideal in general, as discussed in the thread above):

   import System.Environment
   import System.Mem
   import System.IO

   main = do
 n:f:_ <- getArgs
 (sequence (repeat (openFile f ReadMode)) >> return ()) `catch` (\_->return 
())
 test1 (take (read n) $ repeat f)

   test1 files = mapM_ doStuff files where
 doStuff f = {- performGC >> -} readFile f >>= print.map length.take 
10.lines

interestingly, if i do that, even Hugs seems to need the performGC?

claus

ps. one could even try to go further, and have virtual file descriptors, like 
virtual
   memory. but that is something for the os, i guess.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-20 Thread Benjamin Franksen

Bertram Felgenhauer wrote:
> Having to rely on GC to close the fds quickly enough is another problem;
> can this be solved on the library side, maybe by performing GCs when
> running out of FDs?

Claus Reinke wrote:
> in good old Hugs, for instance, we find in function newHandle in
src/iomonad.c
> [...snip...]
>     /* Search for unused handle*/
>     /* If at first we don't    */
>     /* succeed, garbage collect*/
>     /* and try again ...       */
>     /* ... before we give up   */
> 
> so, instead of documenting limitations and workarounds, this issue should
be
> fixed in GHC as well.

This may help in some cases but it cannot be relied upon. Finalizers are
always run in a separate thread (must be, see
http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if
you force a GC when handles are exhausted, as hugs seems to do, there is no
guarantee that by the time the GC is done the finalizers have freed any
handles (assuming that the GC run really detects any handles to be
garbage).

Cheers
Ben

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-20 Thread Ferenc Wagner

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>>   h <- openFile filename ReadMode
>>   flip fix accum $
>> \iterate accum -> do
>>   try_line <- try (hGetLine h)
>>   case try_line of
>> Left e -> hClose h >> return accum
>> Right line -> do
>>   case iter accum line of
>> Left accum -> hClose h >> return accum
>> Right accum -> iterate accum

Another variation, enabling multiple iteratees (like a state machine),
exception propagation, and no flip fix :)

> newtype Iterator a = Iterator (a -> String -> (a,Maybe (Iterator a)))
> 
> enumLines :: Iterator a -> a -> FilePath -> IO (a,Maybe Exception)
> enumLines iterator start filename = do
>   h <- openFile filename ReadMode
>   let f (Iterator iter) accum = do
> try_line <- try (hGetLine h)
> case try_line of
>   Left e -> hClose h >> return (accum,Just e)
>   Right line -> do
> case iter accum line of
>   (acc',Nothing)   -> hClose h >> return (acc',Nothing)
>   (acc',Just cont) -> f cont acc'
>   f iterator start
-- 
Feri.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Isaac Dupree

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Pete Kazmier wrote:
> Bryan O'Sullivan <[EMAIL PROTECTED]> writes:
> 
>> Pete Kazmier wrote:
>>
>>> I understand the intent of this code, but I am having a hard time
>>> understanding the implementation, specifically the combination of
>>> 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
>>> one can call 'flip' on a function that takes one argument.
> 
>> As to why it's okay to call "flip" on "fix" at all, look at the types
>> involved.
>>
>> fix :: (a -> a) -> a
>> flip :: (a -> b -> c) -> b -> a -> c
>>
>> By substitution:
>>
>> flip fix :: a -> ((a -> b) -> a -> b) -> b
> 
> Sadly, I'm still confused.  I understand how 'flip' works in the case
> where its argument is a function that takes two arguments.  I've
> started to use this in my own code lately.  But my brain refuses to
> understand how 'flip' is applied to 'fix', a function that takes one
> argument only, which happens to be a function itself.  What is 'flip'
> flipping when the function passed to it only takes one argument?

fix :: (a -> a) -> a
In this case, we know something about 'a': it is a function (b -> c).
Substitute:
fix :: ((b -> c) -> (b -> c)) -> (b -> c)
Take advantage of the right-associativity of (->)
fix :: ((b -> c) -> b -> c) -> b -> c
Now it looks like a function of two arguments, because the return value
(normally ordinary data) can in fact, in this case, take arguments.

Here's another example of that:

data Box a = Box a
get (Box a) = a
- -- get (Box 1) :: Int
- -- get (Box (\a -> a)) :: Int -> Int
- -- (get (Box (\a -> a))) 1 :: Int
 --function application is left-associative:
- -- get (Box (\a -> a)) 1 :: Int
- -- flip get 1 (Box (\a -> a)) :: Int

Yes, it sometimes confuses me too.

Isaac
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF/vcXHgcxvIWYTTURAj5RAKCUMeAF0vosJ6ROAVlBIDHsEq/vzgCfflnR
50BmW6tuAF6mKXBtrlHdQ5Y=
=uv3G
-END PGP SIGNATURE-
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Lennart Augustsson


Here's what happens:
fix has type (x->x)->x
and that has to match the first argument to flip, namely 'a->b->c'.
The only chance of that is if x is actually a function type.
Pick x=b->c, now we have
fix has type ((b->c)->b->c)->b->c
and it matches a->b->c if a=(b->c)->b->c

Flip returns b->a->c, and if we substitute we get
b->((b->c)->b->c)->c

If you rename the variables you get the suggested type.

-- Lennart


On Mar 19, 2007, at 20:35 , Pete Kazmier wrote:


Bryan O'Sullivan <[EMAIL PROTECTED]> writes:


Pete Kazmier wrote:


I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.



As to why it's okay to call "flip" on "fix" at all, look at the types
involved.

fix :: (a -> a) -> a
flip :: (a -> b -> c) -> b -> a -> c

By substitution:

flip fix :: a -> ((a -> b) -> a -> b) -> b


Sadly, I'm still confused.  I understand how 'flip' works in the case
where its argument is a function that takes two arguments.  I've
started to use this in my own code lately.  But my brain refuses to
understand how 'flip' is applied to 'fix', a function that takes one
argument only, which happens to be a function itself.  What is 'flip'
flipping when the function passed to it only takes one argument?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

Bryan O'Sullivan <[EMAIL PROTECTED]> writes:

> Pete Kazmier wrote:
>
>> I understand the intent of this code, but I am having a hard time
>> understanding the implementation, specifically the combination of
>> 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
>> one can call 'flip' on a function that takes one argument.

> As to why it's okay to call "flip" on "fix" at all, look at the types
> involved.
>
> fix :: (a -> a) -> a
> flip :: (a -> b -> c) -> b -> a -> c
>
> By substitution:
>
> flip fix :: a -> ((a -> b) -> a -> b) -> b

Sadly, I'm still confused.  I understand how 'flip' works in the case
where its argument is a function that takes two arguments.  I've
started to use this in my own code lately.  But my brain refuses to
understand how 'flip' is applied to 'fix', a function that takes one
argument only, which happens to be a function itself.  What is 'flip'
flipping when the function passed to it only takes one argument?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Bryan O'Sullivan

Pete Kazmier wrote:

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

If you look at the code, that's not really what's happening.  See the 
embedded anonymous function below?

>  flip fix accum $
> \iterate accum -> do
>   ...

It's a function of two arguments.  All "flip" is doing is switching the 
order of the arguments to "fix", in this case for readability.  If you 
were to get rid of the "flip", you'd need to remove the "accum" after 
"fix" and move it after the lambda expression, which would make the 
expression much uglier to write and read.  So all the "flip" is doing 
here is tidying up the code.

(If you're still confused, look at the difference between forM and mapM. 
 The only reason forM exists is readability when you have - in terms of 
the amount of screen space they consume - a big function and a small 
piece of data, just as here.)

As to why it's okay to call "flip" on "fix" at all, look at the types 
involved.

fix :: (a -> a) -> a
flip :: (a -> b -> c) -> b -> a -> c

By substitution:

flip fix :: a -> ((a -> b) -> a -> b) -> b

In the case above, accum has type a, and the lambda has type
(a -> IO a) -> a -> IO a, and these fit nicely into the type expected by 
"flip fix".

http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

Pete Kazmier <[EMAIL PROTECTED]> writes:

> I attempted to read Oleg's fold-stream implementation [1] as this
> sounds quite appealing to me, but I was completely overwhelmed,
> especially with all of the various type signatures used.  It would be
> great if one of the regular Haskell bloggers (Tom Moertel are you
> reading this?) might write a blog entry or two interpreting his
> implementation for those of us starting out in Haskell perhaps by
> starting out with a non-polymorphic version so as to emphasize the
> approach.
>
> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

In the event any other Haskell newbie comes along someday and is just
as overwhelmed as I was, I've found this post by Oleg to be a much
easier to understand than the above paper because it is not as generic
and thus the type signatures are a bit easier on the eyes:

http://www.haskell.org/pipermail/haskell/2003-September/012741.html

With that said, I have a question regarding Hal's response to the
above email in which he states:

> Just thought I'd mention that this is, in fact, my preferred method of
> iterating over a file.  It alleviates the pain associated with lazy file
> IO, and simultaneously provides a useful abstraction.  I actually have
> 3*2 functions that I use which look like:
> 
> > type Iteratee  iter seed = seed -> iter -> Either seed seed
> > hFoldChars  :: FilePath -> Iteratee  Char seed -> seed -> IO seed
> > hFoldLines  :: FilePath -> Iteratee  String   seed -> seed -> IO seed
> > hFoldWords  :: FilePath -> Iteratee  [String] seed -> seed -> IO seed
> 
> > type IterateeM iter seed = seed -> iter -> IO (Either seed seed)
> > hFoldCharsM :: FilePath -> IterateeM Char seed -> seed -> IO seed
> > hFoldLinesM :: FilePath -> IterateeM String   seed -> seed -> IO seed
> > hFoldWordsM :: FilePath -> IterateeM [String] seed -> seed -> IO seed
> 
> Which perform as expected (hFoldWords(M) can be written in terms of
> hFoldLinesM, but I find I use it sufficiently frequently to warrent
> having it stand out).  Also, of course, the only ones actually
> implemented are the (M) variants; the non-M variants just throw a return
> into the Iteratee.

What does he mean by the very last sentence?  Oleg's version seems
more like the non-M versions.  What is his implication?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> Pete Kazmier:
>> I attempted to read Oleg's fold-stream implementation [1] as this
>> sounds quite appealing to me, but I was completely overwhelmed,
>> especially with all of the various type signatures used.  It would be
>> great if one of the regular Haskell bloggers (Tom Moertel are you
>> reading this?) might write a blog entry or two interpreting his
>> implementation for those of us starting out in Haskell perhaps by
>> starting out with a non-polymorphic version so as to emphasize the
>> approach.
>> 
>> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
>
> The basic idea of the paper is the use of a left-fold operator as the
> primary interface for enumarating collections. The recursive version
> (less general than the non-recursive version) of a left-fold operator
> for enumerating the lines of a text file might look something like this:
>
>> import Control.Monad.Fix
>> import Control.Exception
>> import Data.List
>> import qualified Data.Set as S
>> import qualified Data.Map as M
>> import System.IO
>> 
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>>   h <- openFile filename ReadMode
>>   flip fix accum $
>> \iterate accum -> do
>>   try_line <- try (hGetLine h)
>>   case try_line of
>> Left e -> hClose h >> return accum
>> Right line -> do
>>   case iter accum line of
>> Left accum -> hClose h >> return accum
>> Right accum -> iterate accum

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:
>
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String 
>> String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
>>   findHdrs accum@(wanted,found) line =
>> if null line
>>   then Left accum
>>   else
>> case headerLine line of
>>   Nothing -> Right accum
>>   Just hdr ->
>> case findDelete hdr wanted of
>>   Nothing -> Right accum
>>   Just wanted ->
>> let accum = (wanted, M.insert hdr line found) in
>>   if S.null wanted
>> then Left accum
>> else Right accum
>> 
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>> 
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>>   then Just (S.delete e s)
>>   else Nothing
>
> It's a bit of a case-analysis nightmare, but when comparing this to
> previous approaches, note that file traversal and processing are cleanly
> separated, file handle closure is guaranteed to be timely, file
> traversal stops as soon as all the required headers have been found,
> memory usage is minimised.

Very nice.  I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.

> I hope that helps.

Very much so.  Thank you for you help.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-17 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> So here's a test. I don't have any big maildirs handy, so this is based
> on the simple exercise of printing the first line of each of a large
> number of files. First, the preamble.
>
>> import Control.Exception (bracket)
>> import System.Environment
>> import System.IO
>
>> main = do
>>   t:n:fs <- getArgs
>>   ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs)
> 
> [snip]

Thank you for summarizing the approaches presented by others.  As a
Haskell newbie, there seems to be quite a few esoteric concepts to
conquer.  Your concrete examples were helpful in my understanding of
the ramifications associated with the various approaches.

After reading the various threads you cited, I decided to avoid lazy
IO altogether.  By using 'readFile' without forcing the strict
evaluation of my parser, I inadvertently relinquished control of the
resource management--closing of the file handles was left to the GC.
And although I could have used 'seq' to address the issue, why bother
fixing a problem that could have been avoided altogther by using
strict IO.

With that said, I added the following function to my program and then
replaced the invocation of 'readFile' with it:

  readEmailHeaders :: FilePath -> IO String
  readEmailHeaders file = 
  bracket (openFile file ReadMode) (hClose) (headers [])
  where
headers acc h = do
line <- hGetLine h
case line of
  -- Stop reading file once we hit the empty separator
  -- line, no need to read the rest of the file (body).
  "" -> return . concat . reverse $ acc
  _  -> headers ("\n":line:acc) h

I'm not sure if this is the best implementation, but the speed is
comparable to the lazy IO version without the annoying defect of
running out of file handles.  I also tried an implementation using
'hGetChar' but that was much slower.

I attempted to read Oleg's fold-stream implementation [1] as this
sounds quite appealing to me, but I was completely overwhelmed,
especially with all of the various type signatures used.  It would be
great if one of the regular Haskell bloggers (Tom Moertel are you
reading this?) might write a blog entry or two interpreting his
implementation for those of us starting out in Haskell perhaps by
starting out with a non-polymorphic version so as to emphasize the
approach.

Thanks,
Pete

[1] http://okmij.org/ftp/Haskell/fold-stream.lhs

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Dougal Stanton

Quoth Pete Kazmier, nevermore,
> the same error regarding max open files.  Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files. 

I've tried the approach you appear to be using and it can be tricky
to predict how the laziness will interact with the list of actions.

For example, I tried to download a temporary file, read a bit of data
out of it and then download another one. I thought I would save thinking
and use the same file name for each download: /tmp/feed.xml. What
happened was that it downloaded them all in rapid succession,
over-writing each one with the next and not actually reading the data
until the end. So I ended up parsing N identical copies of the final
file, instead of one of each.

You need to refactor how you map the functions so that fewer whole lists
are passed around. I'd guess that (1) is being executed in its entirety
before being passed to (2), but it's not until (2) that the file data is
actually used.

> main =
> getArgs  >>=
> mapM fileContentsOfDirectory >>= -- (1)
> mapM_ print . threadEmails . map parseEmail . concat -- (2)

This means there are a lot of files sitting open doing nothing. I've had
a lot of success by recreating this as:

> main = 
>  getArgs >>=
>  mapM_ readAndPrint
>   where readAndPrint = fileContentsOfDirectory >>= print -- etc.

It may seem semantically identical but it sometimes makes a difference
when things actually happen.

-- 
Dougal Stanton
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Donald Bruce Stewart

pete-expires-20070513:
> [EMAIL PROTECTED] (Donald Bruce Stewart) writes:
> 
> > pete-expires-20070513:
> >> When using readFile to process a large number of files, I am exceeding
> >> the resource limits for the maximum number of open file descriptors on
> >> my system.  How can I enhance my program to deal with this situation
> >> without making significant changes?
> >
> > Read in data strictly, and there are two obvious ways to do that:
> >
> > -- Via strings:
> >
> > readFileStrict f = do
> > s <- readFile f
> > length s `seq` return s
> >
> > -- Via ByteStrings
> > readFileStrict  = Data.ByteString.readFile
> > readFileStrictString  = liftM Data.ByteString.unpack 
> > Data.ByteString.readFile
> >
> > If you're reading more than say, 100k of data, I'd use strict
> > ByteStrings without hesitation. More than 10M, and I'd use lazy
> > bytestrings.
> 
> Correct me if I'm wrong, but isn't this exactly what I wanted to
> avoid?  Reading the entire file into memory?  In my previous email, I
> was trying to state that I wanted to lazily read the file because some
> of the files are quite large and there is no reason to read beyond the
> small set of headers.  If I read the entire file into memory, this
> design goal is no longer met.
> 
> Nevertheless, I was benchmarking with ByteStrings (both lazy and
> strict), and in both cases, the ByteString versions of readFile yield
> the same error regarding max open files.  Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files. 
> 
> So I'm back to square one.  Any other ideas?

Hmm. Ok. So we need to have more hClose's happen somehow. Can you
process files one at a time?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier

[EMAIL PROTECTED] (Donald Bruce Stewart) writes:

> pete-expires-20070513:
>> When using readFile to process a large number of files, I am exceeding
>> the resource limits for the maximum number of open file descriptors on
>> my system.  How can I enhance my program to deal with this situation
>> without making significant changes?
>
> Read in data strictly, and there are two obvious ways to do that:
>
> -- Via strings:
>
> readFileStrict f = do
> s <- readFile f
> length s `seq` return s
>
> -- Via ByteStrings
> readFileStrict  = Data.ByteString.readFile
> readFileStrictString  = liftM Data.ByteString.unpack 
> Data.ByteString.readFile
>
> If you're reading more than say, 100k of data, I'd use strict
> ByteStrings without hesitation. More than 10M, and I'd use lazy
> bytestrings.

Correct me if I'm wrong, but isn't this exactly what I wanted to
avoid?  Reading the entire file into memory?  In my previous email, I
was trying to state that I wanted to lazily read the file because some
of the files are quite large and there is no reason to read beyond the
small set of headers.  If I read the entire file into memory, this
design goal is no longer met.

Nevertheless, I was benchmarking with ByteStrings (both lazy and
strict), and in both cases, the ByteString versions of readFile yield
the same error regarding max open files.  Incidentally, the lazy
bytestring version of my program was by far the fastest and used the
least amount of memory, but it still crapped out regarding max open
files. 

So I'm back to square one.  Any other ideas?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

Re: [Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

14 matches

Site Navigation

Mail list logo

Footer information