[Haskell-cafe] Re: Lazy IO and closing of file handles
Benjamin Franksen wrote: > Bertram Felgenhauer wrote: >> Having to rely on GC to close the fds quickly enough is another problem; >> can this be solved on the library side, maybe by performing GCs when >> running out of FDs? > > Claus Reinke wrote: >> in good old Hugs, for instance, we find in function newHandle in > src/iomonad.c >> [...snip...] >> /* Search for unused handle*/ >> /* If at first we don't */ >> /* succeed, garbage collect*/ >> /* and try again ... */ >> /* ... before we give up */ >> >> so, instead of documenting limitations and workarounds, this issue should > be >> fixed in GHC as well. > > This may help in some cases but it cannot be relied upon. Finalizers are > always run in a separate thread (must be, see > http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if > you force a GC when handles are exhausted, as hugs seems to do, there is no > guarantee that by the time the GC is done the finalizers have freed any > handles (assuming that the GC run really detects any handles to be > garbage). Sorry for replying to myself, but I just realized that the argument brought forth by Boehm applies only to general purpose finalizing facilites, and not necessarily to each and every special case. I think one could make up an argument that file handles in Haskell are indeed a special kind of object and that the language runtime /can/ run finalizers for file handles in a more 'synchronous' way (i.e. GC could call them directly as soon as it determines they are garbage). The main point here is that a file descriptor does not contain references to other language objects. The same would apply to all sorts of OS resource handles. However, the whole argument is a priori valid only for raw system handles, such as file descriptors. No idea what issues come up if one considers e.g. buffering, or more generally, any additional data structure that gets associated with the handle. Cheers Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
[trigger garbage collection when open runs out of free file descriptors, then try again] so, instead of documenting limitations and workarounds, this issue should be fixed in GHC as well. This may help in some cases but it cannot be relied upon. Finalizers are always run in a separate thread (must be, see http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if you force a GC when handles are exhausted, as hugs seems to do, there is no guarantee that by the time the GC is done the finalizers have freed any handles (assuming that the GC run really detects any handles to be garbage). useful reference to collect!-) but even that mentions giving back os resources such as file descriptors as one of the simpler cases. running the GC/finalizers sequence repeatedly until nothing more changes might be worth thinking about, as are possible race conditions. here is the thread the paper is refering to as one of its origins: http://gcc.gnu.org/ml/java/2001-12/msg00113.html http://gcc.gnu.org/ml/java/2001-12/msg00390.html i also like the idea mentioned as one of the alternatives in 3.1, where the finalizer does not notify the object that is to become garbage, but a different manager object. in this case, one might notify the i/o handler, and that could take care of avoiding trouble. in my opinion, if my code or my finalizers hold on to resources i'd like to see freed, then i'm responsible, even if i might need language help to remedy the situation. but if i take care to avoid such references, and the system still runs out of resources just because it can't be bothered to check right now whether it has some left to free, there is nothing i can do about it (apart from complaining, that is!-). of course, this isn't new. see, for instance, this thread view: http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=st&q=hugs+openFile+file+descriptor+garbage+collection&rnum=1#74d32070dbcc92fc where Remi Turk points out System.Mem.performGC, and Simon Marlow agrees that GHC should do more to free file descriptors, but also mentions that performGC doesn't run finalizers. actually, if i have readFile-based code that immediately processes the file contents before the next readFile, as in Matthew's test code, my ghci (on windows) doesn't seem to run out of file descriptors easily, but if i force a descriptor leak by leaving unreferenced contents unprocessed, then performGC does seem to help (not that this is ideal in general, as discussed in the thread above): import System.Environment import System.Mem import System.IO main = do n:f:_ <- getArgs (sequence (repeat (openFile f ReadMode)) >> return ()) `catch` (\_->return ()) test1 (take (read n) $ repeat f) test1 files = mapM_ doStuff files where doStuff f = {- performGC >> -} readFile f >>= print.map length.take 10.lines interestingly, if i do that, even Hugs seems to need the performGC? claus ps. one could even try to go further, and have virtual file descriptors, like virtual memory. but that is something for the os, i guess. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
Bertram Felgenhauer wrote: > Having to rely on GC to close the fds quickly enough is another problem; > can this be solved on the library side, maybe by performing GCs when > running out of FDs? Claus Reinke wrote: > in good old Hugs, for instance, we find in function newHandle in src/iomonad.c > [...snip...] > /* Search for unused handle*/ > /* If at first we don't */ > /* succeed, garbage collect*/ > /* and try again ... */ > /* ... before we give up */ > > so, instead of documenting limitations and workarounds, this issue should be > fixed in GHC as well. This may help in some cases but it cannot be relied upon. Finalizers are always run in a separate thread (must be, see http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even if you force a GC when handles are exhausted, as hugs seems to do, there is no guarantee that by the time the GC is done the finalizers have freed any handles (assuming that the GC run really detects any handles to be garbage). Cheers Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: >> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a >> enumLines iter accum filename = do >> h <- openFile filename ReadMode >> flip fix accum $ >> \iterate accum -> do >> try_line <- try (hGetLine h) >> case try_line of >> Left e -> hClose h >> return accum >> Right line -> do >> case iter accum line of >> Left accum -> hClose h >> return accum >> Right accum -> iterate accum Another variation, enabling multiple iteratees (like a state machine), exception propagation, and no flip fix :) > newtype Iterator a = Iterator (a -> String -> (a,Maybe (Iterator a))) > > enumLines :: Iterator a -> a -> FilePath -> IO (a,Maybe Exception) > enumLines iterator start filename = do > h <- openFile filename ReadMode > let f (Iterator iter) accum = do > try_line <- try (hGetLine h) > case try_line of > Left e -> hClose h >> return (accum,Just e) > Right line -> do > case iter accum line of > (acc',Nothing) -> hClose h >> return (acc',Nothing) > (acc',Just cont) -> f cont acc' > f iterator start -- Feri. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Pete Kazmier wrote: > Bryan O'Sullivan <[EMAIL PROTECTED]> writes: > >> Pete Kazmier wrote: >> >>> I understand the intent of this code, but I am having a hard time >>> understanding the implementation, specifically the combination of >>> 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how >>> one can call 'flip' on a function that takes one argument. > >> As to why it's okay to call "flip" on "fix" at all, look at the types >> involved. >> >> fix :: (a -> a) -> a >> flip :: (a -> b -> c) -> b -> a -> c >> >> By substitution: >> >> flip fix :: a -> ((a -> b) -> a -> b) -> b > > Sadly, I'm still confused. I understand how 'flip' works in the case > where its argument is a function that takes two arguments. I've > started to use this in my own code lately. But my brain refuses to > understand how 'flip' is applied to 'fix', a function that takes one > argument only, which happens to be a function itself. What is 'flip' > flipping when the function passed to it only takes one argument? fix :: (a -> a) -> a In this case, we know something about 'a': it is a function (b -> c). Substitute: fix :: ((b -> c) -> (b -> c)) -> (b -> c) Take advantage of the right-associativity of (->) fix :: ((b -> c) -> b -> c) -> b -> c Now it looks like a function of two arguments, because the return value (normally ordinary data) can in fact, in this case, take arguments. Here's another example of that: data Box a = Box a get (Box a) = a - -- get (Box 1) :: Int - -- get (Box (\a -> a)) :: Int -> Int - -- (get (Box (\a -> a))) 1 :: Int --function application is left-associative: - -- get (Box (\a -> a)) 1 :: Int - -- flip get 1 (Box (\a -> a)) :: Int Yes, it sometimes confuses me too. Isaac -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.3 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFF/vcXHgcxvIWYTTURAj5RAKCUMeAF0vosJ6ROAVlBIDHsEq/vzgCfflnR 50BmW6tuAF6mKXBtrlHdQ5Y= =uv3G -END PGP SIGNATURE- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Here's what happens: fix has type (x->x)->x and that has to match the first argument to flip, namely 'a->b->c'. The only chance of that is if x is actually a function type. Pick x=b->c, now we have fix has type ((b->c)->b->c)->b->c and it matches a->b->c if a=(b->c)->b->c Flip returns b->a->c, and if we substitute we get b->((b->c)->b->c)->c If you rename the variables you get the suggested type. -- Lennart On Mar 19, 2007, at 20:35 , Pete Kazmier wrote: Bryan O'Sullivan <[EMAIL PROTECTED]> writes: Pete Kazmier wrote: I understand the intent of this code, but I am having a hard time understanding the implementation, specifically the combination of 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how one can call 'flip' on a function that takes one argument. As to why it's okay to call "flip" on "fix" at all, look at the types involved. fix :: (a -> a) -> a flip :: (a -> b -> c) -> b -> a -> c By substitution: flip fix :: a -> ((a -> b) -> a -> b) -> b Sadly, I'm still confused. I understand how 'flip' works in the case where its argument is a function that takes two arguments. I've started to use this in my own code lately. But my brain refuses to understand how 'flip' is applied to 'fix', a function that takes one argument only, which happens to be a function itself. What is 'flip' flipping when the function passed to it only takes one argument? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
Bryan O'Sullivan <[EMAIL PROTECTED]> writes: > Pete Kazmier wrote: > >> I understand the intent of this code, but I am having a hard time >> understanding the implementation, specifically the combination of >> 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how >> one can call 'flip' on a function that takes one argument. > As to why it's okay to call "flip" on "fix" at all, look at the types > involved. > > fix :: (a -> a) -> a > flip :: (a -> b -> c) -> b -> a -> c > > By substitution: > > flip fix :: a -> ((a -> b) -> a -> b) -> b Sadly, I'm still confused. I understand how 'flip' works in the case where its argument is a function that takes two arguments. I've started to use this in my own code lately. But my brain refuses to understand how 'flip' is applied to 'fix', a function that takes one argument only, which happens to be a function itself. What is 'flip' flipping when the function passed to it only takes one argument? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier wrote: I understand the intent of this code, but I am having a hard time understanding the implementation, specifically the combination of 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how one can call 'flip' on a function that takes one argument. If you look at the code, that's not really what's happening. See the embedded anonymous function below? > flip fix accum $ > \iterate accum -> do > ... It's a function of two arguments. All "flip" is doing is switching the order of the arguments to "fix", in this case for readability. If you were to get rid of the "flip", you'd need to remove the "accum" after "fix" and move it after the lambda expression, which would make the expression much uglier to write and read. So all the "flip" is doing here is tidying up the code. (If you're still confused, look at the difference between forM and mapM. The only reason forM exists is readability when you have - in terms of the amount of screen space they consume - a big function and a small piece of data, just as here.) As to why it's okay to call "flip" on "fix" at all, look at the types involved. fix :: (a -> a) -> a flip :: (a -> b -> c) -> b -> a -> c By substitution: flip fix :: a -> ((a -> b) -> a -> b) -> b In the case above, accum has type a, and the lambda has type (a -> IO a) -> a -> IO a, and these fit nicely into the type expected by "flip fix". http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier <[EMAIL PROTECTED]> writes: > I attempted to read Oleg's fold-stream implementation [1] as this > sounds quite appealing to me, but I was completely overwhelmed, > especially with all of the various type signatures used. It would be > great if one of the regular Haskell bloggers (Tom Moertel are you > reading this?) might write a blog entry or two interpreting his > implementation for those of us starting out in Haskell perhaps by > starting out with a non-polymorphic version so as to emphasize the > approach. > > [1] http://okmij.org/ftp/Haskell/fold-stream.lhs In the event any other Haskell newbie comes along someday and is just as overwhelmed as I was, I've found this post by Oleg to be a much easier to understand than the above paper because it is not as generic and thus the type signatures are a bit easier on the eyes: http://www.haskell.org/pipermail/haskell/2003-September/012741.html With that said, I have a question regarding Hal's response to the above email in which he states: > Just thought I'd mention that this is, in fact, my preferred method of > iterating over a file. It alleviates the pain associated with lazy file > IO, and simultaneously provides a useful abstraction. I actually have > 3*2 functions that I use which look like: > > > type Iteratee iter seed = seed -> iter -> Either seed seed > > hFoldChars :: FilePath -> Iteratee Char seed -> seed -> IO seed > > hFoldLines :: FilePath -> Iteratee String seed -> seed -> IO seed > > hFoldWords :: FilePath -> Iteratee [String] seed -> seed -> IO seed > > > type IterateeM iter seed = seed -> iter -> IO (Either seed seed) > > hFoldCharsM :: FilePath -> IterateeM Char seed -> seed -> IO seed > > hFoldLinesM :: FilePath -> IterateeM String seed -> seed -> IO seed > > hFoldWordsM :: FilePath -> IterateeM [String] seed -> seed -> IO seed > > Which perform as expected (hFoldWords(M) can be written in terms of > hFoldLinesM, but I find I use it sufficiently frequently to warrent > having it stand out). Also, of course, the only ones actually > implemented are the (M) variants; the non-M variants just throw a return > into the Iteratee. What does he mean by the very last sentence? Oleg's version seems more like the non-M versions. What is his implication? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > Pete Kazmier: >> I attempted to read Oleg's fold-stream implementation [1] as this >> sounds quite appealing to me, but I was completely overwhelmed, >> especially with all of the various type signatures used. It would be >> great if one of the regular Haskell bloggers (Tom Moertel are you >> reading this?) might write a blog entry or two interpreting his >> implementation for those of us starting out in Haskell perhaps by >> starting out with a non-polymorphic version so as to emphasize the >> approach. >> >> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs > > The basic idea of the paper is the use of a left-fold operator as the > primary interface for enumarating collections. The recursive version > (less general than the non-recursive version) of a left-fold operator > for enumerating the lines of a text file might look something like this: > >> import Control.Monad.Fix >> import Control.Exception >> import Data.List >> import qualified Data.Set as S >> import qualified Data.Map as M >> import System.IO >> >> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a >> enumLines iter accum filename = do >> h <- openFile filename ReadMode >> flip fix accum $ >> \iterate accum -> do >> try_line <- try (hGetLine h) >> case try_line of >> Left e -> hClose h >> return accum >> Right line -> do >> case iter accum line of >> Left accum -> hClose h >> return accum >> Right accum -> iterate accum I understand the intent of this code, but I am having a hard time understanding the implementation, specifically the combination of 'fix', 'flip', and 'interate'. I looked up 'fix' and I'm unsure how one can call 'flip' on a function that takes one argument. > To use this, you provide an "iteratee", a function which takes an > accumulator and a line from the file, and returns a new accumulator > embedded in an Either. Using the Left branch causes immediate > termination of the enumeration. For example, to search for the first > occurrence of each of a set of email headers: > >> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String >> String) >> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where >> findHdrs accum@(wanted,found) line = >> if null line >> then Left accum >> else >> case headerLine line of >> Nothing -> Right accum >> Just hdr -> >> case findDelete hdr wanted of >> Nothing -> Right accum >> Just wanted -> >> let accum = (wanted, M.insert hdr line found) in >> if S.null wanted >> then Left accum >> else Right accum >> >> headerLine :: String -> Maybe String >> headerLine (':':xs) = Just [] >> headerLine (x:xs) = fmap (x:) (headerLine xs) >> headerLine [] = Nothing >> >> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) >> findDelete e s = if S.member e s >> then Just (S.delete e s) >> else Nothing > > It's a bit of a case-analysis nightmare, but when comparing this to > previous approaches, note that file traversal and processing are cleanly > separated, file handle closure is guaranteed to be timely, file > traversal stops as soon as all the required headers have been found, > memory usage is minimised. Very nice. I like the clean separation, but as you say, its one ugly bit of code compared to my original code, although much more elegant no doubt. > I hope that helps. Very much so. Thank you for you help. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
"Matthew Brecknell" <[EMAIL PROTECTED]> writes: > So here's a test. I don't have any big maildirs handy, so this is based > on the simple exercise of printing the first line of each of a large > number of files. First, the preamble. > >> import Control.Exception (bracket) >> import System.Environment >> import System.IO > >> main = do >> t:n:fs <- getArgs >> ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs) > > [snip] Thank you for summarizing the approaches presented by others. As a Haskell newbie, there seems to be quite a few esoteric concepts to conquer. Your concrete examples were helpful in my understanding of the ramifications associated with the various approaches. After reading the various threads you cited, I decided to avoid lazy IO altogether. By using 'readFile' without forcing the strict evaluation of my parser, I inadvertently relinquished control of the resource management--closing of the file handles was left to the GC. And although I could have used 'seq' to address the issue, why bother fixing a problem that could have been avoided altogther by using strict IO. With that said, I added the following function to my program and then replaced the invocation of 'readFile' with it: readEmailHeaders :: FilePath -> IO String readEmailHeaders file = bracket (openFile file ReadMode) (hClose) (headers []) where headers acc h = do line <- hGetLine h case line of -- Stop reading file once we hit the empty separator -- line, no need to read the rest of the file (body). "" -> return . concat . reverse $ acc _ -> headers ("\n":line:acc) h I'm not sure if this is the best implementation, but the speed is comparable to the lazy IO version without the annoying defect of running out of file handles. I also tried an implementation using 'hGetChar' but that was much slower. I attempted to read Oleg's fold-stream implementation [1] as this sounds quite appealing to me, but I was completely overwhelmed, especially with all of the various type signatures used. It would be great if one of the regular Haskell bloggers (Tom Moertel are you reading this?) might write a blog entry or two interpreting his implementation for those of us starting out in Haskell perhaps by starting out with a non-polymorphic version so as to emphasize the approach. Thanks, Pete [1] http://okmij.org/ftp/Haskell/fold-stream.lhs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Quoth Pete Kazmier, nevermore, > the same error regarding max open files. Incidentally, the lazy > bytestring version of my program was by far the fastest and used the > least amount of memory, but it still crapped out regarding max open > files. I've tried the approach you appear to be using and it can be tricky to predict how the laziness will interact with the list of actions. For example, I tried to download a temporary file, read a bit of data out of it and then download another one. I thought I would save thinking and use the same file name for each download: /tmp/feed.xml. What happened was that it downloaded them all in rapid succession, over-writing each one with the next and not actually reading the data until the end. So I ended up parsing N identical copies of the final file, instead of one of each. You need to refactor how you map the functions so that fewer whole lists are passed around. I'd guess that (1) is being executed in its entirety before being passed to (2), but it's not until (2) that the file data is actually used. > main = > getArgs >>= > mapM fileContentsOfDirectory >>= -- (1) > mapM_ print . threadEmails . map parseEmail . concat -- (2) This means there are a lot of files sitting open doing nothing. I've had a lot of success by recreating this as: > main = > getArgs >>= > mapM_ readAndPrint > where readAndPrint = fileContentsOfDirectory >>= print -- etc. It may seem semantically identical but it sometimes makes a difference when things actually happen. -- Dougal Stanton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
pete-expires-20070513: > [EMAIL PROTECTED] (Donald Bruce Stewart) writes: > > > pete-expires-20070513: > >> When using readFile to process a large number of files, I am exceeding > >> the resource limits for the maximum number of open file descriptors on > >> my system. How can I enhance my program to deal with this situation > >> without making significant changes? > > > > Read in data strictly, and there are two obvious ways to do that: > > > > -- Via strings: > > > > readFileStrict f = do > > s <- readFile f > > length s `seq` return s > > > > -- Via ByteStrings > > readFileStrict = Data.ByteString.readFile > > readFileStrictString = liftM Data.ByteString.unpack > > Data.ByteString.readFile > > > > If you're reading more than say, 100k of data, I'd use strict > > ByteStrings without hesitation. More than 10M, and I'd use lazy > > bytestrings. > > Correct me if I'm wrong, but isn't this exactly what I wanted to > avoid? Reading the entire file into memory? In my previous email, I > was trying to state that I wanted to lazily read the file because some > of the files are quite large and there is no reason to read beyond the > small set of headers. If I read the entire file into memory, this > design goal is no longer met. > > Nevertheless, I was benchmarking with ByteStrings (both lazy and > strict), and in both cases, the ByteString versions of readFile yield > the same error regarding max open files. Incidentally, the lazy > bytestring version of my program was by far the fastest and used the > least amount of memory, but it still crapped out regarding max open > files. > > So I'm back to square one. Any other ideas? Hmm. Ok. So we need to have more hClose's happen somehow. Can you process files one at a time? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy IO and closing of file handles
[EMAIL PROTECTED] (Donald Bruce Stewart) writes: > pete-expires-20070513: >> When using readFile to process a large number of files, I am exceeding >> the resource limits for the maximum number of open file descriptors on >> my system. How can I enhance my program to deal with this situation >> without making significant changes? > > Read in data strictly, and there are two obvious ways to do that: > > -- Via strings: > > readFileStrict f = do > s <- readFile f > length s `seq` return s > > -- Via ByteStrings > readFileStrict = Data.ByteString.readFile > readFileStrictString = liftM Data.ByteString.unpack > Data.ByteString.readFile > > If you're reading more than say, 100k of data, I'd use strict > ByteStrings without hesitation. More than 10M, and I'd use lazy > bytestrings. Correct me if I'm wrong, but isn't this exactly what I wanted to avoid? Reading the entire file into memory? In my previous email, I was trying to state that I wanted to lazily read the file because some of the files are quite large and there is no reason to read beyond the small set of headers. If I read the entire file into memory, this design goal is no longer met. Nevertheless, I was benchmarking with ByteStrings (both lazy and strict), and in both cases, the ByteString versions of readFile yield the same error regarding max open files. Incidentally, the lazy bytestring version of my program was by far the fastest and used the least amount of memory, but it still crapped out regarding max open files. So I'm back to square one. Any other ideas? Thanks, Pete ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe