RE: [Haskell-cafe] Re: Processing of large files

2004-11-08 Thread Simon Marlow
John Goerzen writes: Yes it does. If you don't set block buffering, GHC will call read() separately for *every* single character. (I've straced stuff!) This is a huge performance penalty for large files. It's a lot more efficient if you set block buffering in your input, even if
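Simon's advice above can be sketched as follows. This is a minimal example, not code from the thread: the function name is hypothetical, and `hSetBuffering` with `BlockBuffering Nothing` is the standard `System.IO` way to request block buffering (so the runtime issues large read() calls instead of one per character).

```haskell
import System.IO

-- Open a file with explicit block buffering, as Simon Marlow advises.
-- BlockBuffering Nothing lets the system pick a sensible block size,
-- avoiding a separate read() syscall for every single character.
readWithBlockBuffering :: FilePath -> IO String
readWithBlockBuffering path = do
  h <- openFile path ReadMode
  hSetBuffering h (BlockBuffering Nothing)
  hGetContents h
```

The same `hSetBuffering` call works on `stdin` if the input comes from a pipe.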

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Tomasz Zielonka
On Thu, Nov 04, 2004 at 10:37:25AM +0300, Alexander Kogan wrote: Hi! How about this? let a' = addToFM_C (+) a x 1 in maybe () (`seq` ()) (lookupFM a' x) `seq` a' It worked for me. Thank you. It works for me too, but I don't understand why and how ;-)) Could you explain?
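Tomasz's trick can be transcribed to today's `Data.Map` (the lazy successor of `FiniteMap`; the function name `addStrict` is mine, not from the thread). After the insert, the key is looked up again and the stored value is forced with `seq`, so the `(+)` thunk is evaluated before the map is threaded onward:

```haskell
import qualified Data.Map as M  -- lazy Data.Map, analogous to FiniteMap

-- Insert a count, then force the value just stored: maybe () (`seq` ())
-- reduces the looked-up value to WHNF, collapsing any (1+1+...) thunk.
addStrict :: Ord k => M.Map k Integer -> k -> M.Map k Integer
addStrict m x =
  let m' = M.insertWith (+) x 1 m
  in maybe () (`seq` ()) (M.lookup x m') `seq` m'
```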

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Ketil Malde
Tomasz Zielonka [EMAIL PROTECTED] writes: Thank you. It works for me too, but I don't understand why and how ;-)) Could you explain? I'm a bit puzzled by this discussion, as strictness of FiniteMaps has rarely been (perceived to be?) a problem for me. Scott's solution forces (lookupFM a' x)

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Tomasz Zielonka
On Thu, Nov 04, 2004 at 09:24:30AM +0100, Ketil Malde wrote: Tomasz Zielonka [EMAIL PROTECTED] writes: Thank you. It works for me too, but I don't understand why and how ;-)) Could you explain? I'm a bit puzzled by this discussion, as strictness of FiniteMaps have rarely been (perceived

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Alexander Kogan
Hi! Scott's solution forces (lookupFM a' x) to head normal form (or is it weak head normal form?). This means that the value of type (Maybe v) is evaluated just enough that it is known whether it is Nothing, Just _ or _|_ (bottom). This is probably enough to evaluate the path from FiniteMap's

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Tomasz Zielonka
On Thu, Nov 04, 2004 at 08:03:56AM +0100, Tomasz Zielonka wrote: How about this? let a' = addToFM_C (+) a x 1 in maybe () (`seq` ()) (lookupFM a' x) `seq` a' It worked for me. Of course, it is quite inefficient. If you care about constant factors, better use a FiniteMap with
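The "strict FiniteMap" Tomasz alludes to exists today as `Data.Map.Strict`, whose `insertWith` evaluates the combined value to WHNF before storing it. A minimal sketch (the name `bump` is mine, assuming the modern containers API):

```haskell
import qualified Data.Map.Strict as M

-- A strict counterpart of addToFM_C (+) a x 1: the new count is
-- evaluated before it goes into the map, so no thunk chain builds up.
bump :: Ord k => M.Map k Integer -> k -> M.Map k Integer
bump m x = M.insertWith (+) x 1 m
```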

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Tomasz Zielonka
On Thu, Nov 04, 2004 at 11:44:32AM +0300, Alexander Kogan wrote: Ok. I have 2 questions about this: 1. This means the seq function evaluates only the 'top' of its argument, so when I pass, for example, a list as the argument, it will be evaluated only to [unevaluated, unevaluated, ...]? Am I right? Almost.
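The "only the top" behaviour can be demonstrated directly (a small sketch, not from the thread; `demoSpine` is my name). `seq` reduces a list only to its outermost `(:)` constructor, so a list whose *elements* are errors can still be forced and measured:

```haskell
-- seq forces only WHNF: the outermost (:) cell. The elements stay
-- unevaluated, so length never trips over the errors inside them.
demoSpine :: Int
demoSpine =
  let xs = [error "elem1", error "elem2" :: Int]
  in xs `seq` length xs
```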

Re: [Haskell-cafe] Re: Processing of large files

2004-11-04 Thread Alexander Kogan
Hi! On 4 November 2004 11:54, Tomasz Zielonka wrote: [skip] Note that the things marked 'unevaluated' above could already be evaluated by some other computation. Ok, I understand. 2. If so, is there a method to _completely_ evaluate an expression? Most of the time you don't need it. But if you think you
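For complete evaluation, the modern answer is the `deepseq` package (which postdates this thread; in 2004 one wrote such a class by hand). Its `force` reduces a value all the way to normal form, not just WHNF, so even the sum inside a pair gets evaluated:

```haskell
import Control.DeepSeq (force)

-- force evaluates the whole structure, including the thunk 1 + 1 + 1,
-- unlike seq, which would stop at the pair's outermost constructor.
strictPair :: (String, Integer)
strictPair = force ("qqq", 1 + 1 + 1)
```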

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Alexander Kogan
Hi! merge' a x = let r = addToFM_C (+) a x 1 in r `seq` r but it doesn't help. == an anecdote A quick note, x `seq` x is always exactly equivalent to x, the reason being that your seq would never be called to force x unless x was needed anyway. ;-)) Ok. merge' a x = (addToFM_C (+) $! a)

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Scott Turner
On 2004 November 03 Wednesday 09:51, Alexander Kogan wrote: merge' a x = (addToFM_C (+) $! a) x 1 is not strict. Can I do something to make FiniteMap strict? Or is the only way to make my own StrictFiniteMap? You can replace addToFM_C (+) a x 1 with let a' = addToFM_C (+) a x 1 in

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Hal Daume III
you can steal my version of finitemap: http://www.isi.edu/~hdaume/haskell/FiniteMap.hs which is based on the GHC version, but supports strict operations. the strict version of function f is called f'. i've also added some ops i thought were missing. On Wed, 3 Nov 2004, Scott Turner wrote:

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Alexander Kogan
Hi! You can replace addToFM_C (+) a x 1 with let a' = addToFM_C (+) a x 1 in lookupFM a' x `seq` a' It is not strict either. Or you can generalize that into your own strict version of addToFM_C. It's a little ugly, but probably gets the job done. Ohh.. -- Alexander

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Alexander Kogan
Hi! you can steal my version of finitemap: http://www.isi.edu/~hdaume/haskell/FiniteMap.hs which is based on the GHC version, but supports strict operations. the strict version of function f is called f'. i've also added some ops i thought were missing. Thank you! I'll try it. --

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Tomasz Zielonka
On Thu, Nov 04, 2004 at 09:48:47AM +0300, Alexander Kogan wrote: Hi! You can replace addToFM_C (+) a x 1 with let a' = addToFM_C (+) a x 1 in lookupFM a' x `seq` a' It is not strict either. How about this? let a' = addToFM_C (+) a x 1 in maybe () (`seq` ())

Re: [Haskell-cafe] Re: Processing of large files

2004-11-03 Thread Alexander Kogan
Hi! How about this? let a' = addToFM_C (+) a x 1 in maybe () (`seq` ()) (lookupFM a' x) `seq` a' It worked for me. Thank you. It works for me too, but I don't understand why and how ;-)) Could you explain? -- Alexander Kogan Institute of Applied Physics Russian Academy of

RE: [Haskell-cafe] Re: Processing of large files

2004-11-02 Thread Bayley, Alistair
From: Alexander Kogan [mailto:[EMAIL PROTECTED]] But I wonder why the very useful function foldl' as I define it is not included in the Prelude? I think many people work with large lists or streams... It's in Data.List:
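The `foldl'` Alistair points to is the strict left fold from `Data.List`. A minimal sketch of why one reaches for it (`total` is my name, not from the thread): it forces the accumulator at every step, so summing a long list runs in constant space instead of building a `1+1+1+...` thunk chain the way `foldl` does.

```haskell
import Data.List (foldl')

-- foldl' evaluates the running sum at each step; with plain foldl the
-- additions would accumulate as unevaluated thunks until the very end.
total :: [Integer] -> Integer
total = foldl' (+) 0
```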

Re: [Haskell-cafe] Re: Processing of large files

2004-11-02 Thread Alexander Kogan
Hi! But I wonder why the very useful function foldl' as I define it is not included in the Prelude? I think many people work with large lists or streams... It's in Data.List: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.List.html#v%3Afoldl' Hm... I haven't seen

Re: [Haskell-cafe] Re: Processing of large files

2004-11-02 Thread Alexander Kogan
Hi! Now I try to use FiniteMap to speed up the processing. merge' :: FiniteMap String Integer -> String -> FiniteMap String Integer merge' a x = addToFM_C (+) a x 1 parse' :: [String] -> [(String, Integer)] parse' x = fmToList $ foldl' merge' emptyFM x Where should I add `seq` to make FiniteMap
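Alexander's merge'/parse' pipeline can be sketched with `Data.Map.Strict` standing in for the long-gone `FiniteMap` (an assumption of mine; the thread itself predates `Data.Map`). With a strict map plus `foldl'`, no extra `seq` is needed: the counts and the map are both forced at every step.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- insertWith in Data.Map.Strict evaluates the stored count, and
-- foldl' keeps the accumulated map forced across the whole fold.
merge' :: M.Map String Integer -> String -> M.Map String Integer
merge' a x = M.insertWith (+) x 1 a

parse' :: [String] -> [(String, Integer)]
parse' = M.toList . foldl' merge' M.empty
```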

Re: [Haskell-cafe] Re: Processing of large files

2004-11-02 Thread Ketil Malde
Alexander Kogan [EMAIL PROTECTED] writes: Thanks! I did the following: For extra credit, you can use a FiniteMap to store the words and counts. They have, as you probably know, log n access times, and should give you a substantial performance boost. :-) (I have a feeling FMs are slow when the

Re: [Haskell-cafe] Re: Processing of large files

2004-11-02 Thread John Meacham
On Tue, Nov 02, 2004 at 11:53:52AM +0300, Alexander Kogan wrote: Where should I add `seq` to make FiniteMap strict? I tried merge' a x = let r = addToFM_C (+) a x 1 in r `seq` r but it doesn't help. == an anecdote A quick note: x `seq` x is always exactly equivalent to x, the reason being

Re: [Haskell-cafe] Re: Processing of large files

2004-11-01 Thread Greg Buchholz
Peter Simons wrote: Read and process the file in blocks: http://cryp.to/blockio/docs/tutorial.html On a related note, I've found the collection of papers below to be helpful in understanding different methods of handling files in Haskell.

Re: [Haskell-cafe] Re: Processing of large files

2004-11-01 Thread Alexander N. Kogan
Hi! Peter Simons wrote: Read and process the file in blocks: http://cryp.to/blockio/docs/tutorial.html On a related note, I've found the collection of papers below to be helpful in understanding different methods of handling files in Haskell.

Re: [Haskell-cafe] Re: Processing of large files

2004-11-01 Thread Scott Turner
On 2004 November 01 Monday 16:48, Alexander N. Kogan wrote: Sorry, I don't understand. I thought the problem is in laziness - You're correct. The problem is laziness rather than I/O. my list of tuples becomes (qqq, 1+1+1+...) etc. and my program reads the whole file before it starts processing.

Re: [Haskell-cafe] Re: Processing of large files

2004-11-01 Thread Alexander Kogan
Hi! The list of tuples _does_ need to be strict. Beyond that, as Ketil Malde said, you should not use foldl -- instead, foldl' is the best version to use when you are recalculating the result every time a new list item is processed. Thanks! I did the following: merge [] x = [(x,1)] merge