RE: Working character by character in Haskell
> "Simon Marlow" <[EMAIL PROTECTED]> writes: > > To really match the C program, you need to use IOExts.hGetBuf and > > IOExts.hPutBuf, and do the operations on raw characters in memory. > > Using a UArray of Word8 would be better, but there aren't any > > operations to do IO to/from a UArray yet (actually I've written > > these, but they aren't in the tree yet). > > So why don't getContents / putStr / etc. deforest cleanly to calls to > hGetBuf and hPutBuf? I'm genuinely curious; my own experience in this > direction is "The engineering is challenging". These functions are so > commonly used, though---and they're vastly easier to use, actually > portable, etc. The effort would surely be repaid. I think you're right, in that the engineering would be challenging. You could conceivably factor getContents into two functions getBlocks :: IO [Block] and a pure function to unpack the blocks into characters, which would lend itself to deforestation, but that would conflict with our attempts to re-use buffers when possible. But the gains might outweigh the slight performance loss from not re-using buffers as much - it's an interesting idea, anyway. Cheers, Simon ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: Working character by character in Haskell
"Simon Marlow" <[EMAIL PROTECTED]> writes: > To really match the C program, you need to use IOExts.hGetBuf and > IOExts.hPutBuf, and do the operations on raw characters in memory. > Using a UArray of Word8 would be better, but there aren't any > operations to do IO to/from a UArray yet (actually I've written > these, but they aren't in the tree yet). So why don't getContents / putStr / etc. deforest cleanly to calls to hGetBuf and hPutBuf? I'm genuinely curious; my own experience in this direction is "The engineering is challenging". These functions are so commonly used, though---and they're vastly easier to use, actually portable, etc. The effort would surely be repaid. Plus, I'm curious to hear about any challenges which may be involved in pulling this off. :-) If it's genuinely tricky, the tricks are likely to be generally useful for other stream-ish things. -Jan-Willem Maessen Eager Haskell Project [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Working character by character in Haskell
"Simon Marlow" <[EMAIL PROTECTED]> writes: > Well, in Haskell each character of the string takes 20 bytes: 12 bytes > for the list cell, and 8 bytes for the character itself Why does a list cell consume as much as 12 bytes? Two pointers (data and next) and a 32-bit tag field, perhaps? And a 64-bit minimum allocation quatnity, accounting for the 8-byte character? Isn't it possible to optimize this, e.g. by embedding small data directly in the cons cell? 21 bits for a Unicode character should leave enough bits for tagging, shouldn't it? (Since I'm halfway planning to use Haskell next Spring to process long lists of data with a small set of values (for instance: data Base = A | C | G | T ) I'm curious about the performance.) -kzm -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: Working character by character in Haskell
> Humn... I agree with both of you, Albert and Tom. I started > it from the > beginning, using map and don't using reverse anymore. But the > C program is > still 7x faster than the Haskell one. Here is the code of the Haskell > program: > > main :: IO () > main = do > bmFile <- openFileEx "in.txt" (BinaryMode ReadMode) > bmString <- hGetContents bmFile > writeFile "out.txt" (map inc bmString) > hClose bmFile > > inc :: Char -> Char > inc a = toEnum ((fromEnum a) + 1) Well, in Haskell each character of the string takes 20 bytes: 12 bytes for the list cell, and 8 bytes for the character itself - the memory used by the character will be recovered at GC time though, as long as the character is < chr 256. The map operation also allocates a further 28 bytes per character: list cell + thunk(8) + character, assuming the inc operation is suitably optimised not to do any extra allocation. That's a total of 48 bytes per character. The C code, by comparison, doesn't do any dynamic allocation at all. To really match the C program, you need to use IOExts.hGetBuf and IOExts.hPutBuf, and do the operations on raw characters in memory. Using a UArray of Word8 would be better, but there aren't any operations to do IO to/from a UArray yet (actually I've written these, but they aren't in the tree yet). You should find that the IO library in GHC 5.02 is slightly faster than the one in 5.00.2. Anyway, I hope all this helps to explain why the Haskell version is so slow. Cheers, Simon ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Working character by character in Haskell
Humn... I agree with both of you, Albert and Tom. I started it from the beginning, using map and don't using reverse anymore. But the C program is still 7x faster than the Haskell one. Here is the code of the Haskell program: main :: IO () main = do bmFile <- openFileEx "in.txt" (BinaryMode ReadMode) bmString <- hGetContents bmFile writeFile "out.txt" (map inc bmString) hClose bmFile inc :: Char -> Char inc a = toEnum ((fromEnum a) + 1) And now the C program: #include void main() { FILE *in, *out; char ch; in = fopen("in.txt","rb"); out = fopen("out.txt","wb"); while ( (ch=getc(in)) != EOF) putc(ch + 1,out); fclose(out); fclose(in); } As you noticed, the main idea here is to increment the character by "one". Is the Haskell program now a nice translation of the C program? If it is (I think so...) why is the C program so much faster than it? Thanks again, -- Andre - Original Message - From: Albert Lai <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, October 18, 2001 9:42 PM Subject: Re: Working character by character in Haskell > "Andre W B Furtado" <[EMAIL PROTECTED]> writes: > > [...] > > > For example, suppose function doSomeStuffWith returns its own parameter. > > Using a 1.5MB file in this case, the Haskell program ends in almost 5 > > seconds, while the C program ends in less than 0.5 seconds... Is my Haskell > > program too bad implemented? (I'm using GHC 4.08-1 under a Windows 98 > > platform.) > > I indeed think that your Haskell program is inefficient and is not a > translation of your C program. It reverses a huge string, which takes > not only execution time but also garbage collection time and memory, > which entails even more time for OS overhead and swapping out other > things in the physical memory. > > (Many programmers complain "my linear-time algorithm takes just 1 > second on a 100M list, so why is it taking 5 seconds on a 200M list?" > They forget that because of issues such as cache locality and virtual > memory, their computers do not scale.) > > I am wondering that if doSomeStuffWith is pure-functional, why are you > writing and using copyFile instead of just using map? I mean: > > main :: IO () > main = do > bmFile <- openFileEx "in.txt" (BinaryMode ReadMode) > bmString <- hGetContents bmFile > writeFile "out.txt" (map doSomestuffWith bmString) > hClose bmFile > > Because both hGetContents and map are lazy and use O(1) memory, this > program behaves exactly as your C program plus occasional garbage > collections that don't hurt too much. In partcular, the OS will see > no difference (although the cache does). > > ___ > Glasgow-haskell-users mailing list > [EMAIL PROTECTED] > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users > ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Working character by character in Haskell
"Andre W B Furtado" <[EMAIL PROTECTED]> writes: [...] > For example, suppose function doSomeStuffWith returns its own parameter. > Using a 1.5MB file in this case, the Haskell program ends in almost 5 > seconds, while the C program ends in less than 0.5 seconds... Is my Haskell > program too bad implemented? (I'm using GHC 4.08-1 under a Windows 98 > platform.) I indeed think that your Haskell program is inefficient and is not a translation of your C program. It reverses a huge string, which takes not only execution time but also garbage collection time and memory, which entails even more time for OS overhead and swapping out other things in the physical memory. (Many programmers complain "my linear-time algorithm takes just 1 second on a 100M list, so why is it taking 5 seconds on a 200M list?" They forget that because of issues such as cache locality and virtual memory, their computers do not scale.) I am wondering that if doSomeStuffWith is pure-functional, why are you writing and using copyFile instead of just using map? I mean: main :: IO () main = do bmFile <- openFileEx "in.txt" (BinaryMode ReadMode) bmString <- hGetContents bmFile writeFile "out.txt" (map doSomestuffWith bmString) hClose bmFile Because both hGetContents and map are lazy and use O(1) memory, this program behaves exactly as your C program plus occasional garbage collections that don't hurt too much. In partcular, the OS will see no difference (although the cache does). ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Working character by character in Haskell
I'm trying to create a fast program in Haskell that reads the content of a file, do some stuff with its characters (one by one) an then save the final result (the modified characters) in another file. The fastest program I've developed so far is: main :: IO () main = do bmFile <- openFileEx "in.txt" (BinaryMode ReadMode) bmString <- hGetContents bmFile str <- copyFile bmString [] writeFile "out.txt" str hClose bmFile copyFile :: String -> String -> IO String copyFile [] s = return (reverse s) copyFile (a:as) s = copyFile as ( (doSomeStuffWith a):s) What I can't understand is why a similar program, written in C, is much faster than the Haskell program. Here you have the C program: void main() { FILE *in, *out; char ch; in = fopen("in.txt","rb"); out = fopen("out.txt","wb"); while ( (ch=getc(in)) != EOF) putc(doSomeStuffWith(ch),out); fclose(out); fclose(in); } For example, suppose function doSomeStuffWith returns its own parameter. Using a 1.5MB file in this case, the Haskell program ends in almost 5 seconds, while the C program ends in less than 0.5 seconds... Is my Haskell program too bad implemented? (I'm using GHC 4.08-1 under a Windows 98 platform.) Thanks, -- Andre ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users