Re: [Haskell-cafe] MD5 performance optimizations
Woo! Salvatore kindly sent me a Darcs patch, and applying it does indeed make it run faster. Yay! [Note that -fvia-c works just fine for me. It doesn't appear to produce a huge speed difference, but it compiles just fine.] Thanks for the tips, guys! :-D The changes are in the online Darcs repo. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations
2008/5/21 Andrew Coppin [EMAIL PROTECTED]: Woo! Salvatore kindly sent me a Darcs patch, and applying it does indeed make it run faster. Yay! Hi Andrew, I'm glad that -fvia-c works for you: maybe it's a Mac OS X specific bug? Anyway, did you compile with -fvia-c -optc-O3? I expect register-intensive code like this to be at least 20% faster with gcc. Salvatore ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary
2008/5/17 Andrew Coppin [EMAIL PROTECTED]: Hi folks. OK, try this: darcs get http://darcs.orphi.me.uk/MyMD5 cd MyMD5 ghc -O2 --make md5sum md5sum some large filename I've got some time to take a look at the code. It's very nice, readable and declarative, but obviously not optimized for raw speed. There're a few things to do to have a speed-up of 2x, without going low-level. 1) There's a lazyness leak in Pad.hs. The sum in: else make_block_bs bs0 : work (n + block_size_bytes) bs1 is not strict. With a very large file (e.g. 100 mb) it means stack overflow. [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. To solve it just add a bang (!) before the n parameter: work !n bs = You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file too. There're solutions that don't imply BangPatterns, and use only seq, but I like bang patterns! Now it works: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 11.958u 0.210s 0:12.19 99.7%0+0k 0+2io 0pf+0w 2) make_block_bs is sub-optimal, and very critical to performance. I decided to use Data.Binary for it (it's also more readable, and you get rid of the unsafeWrite): import Data.Binary import Data.Binary.Get import Control.Monad // ... instance Binary Block where put _ = undefined get = do xs - replicateM 16 getWord32le return $ Block $ listArray (0, 15) xs make_block_bs :: B.ByteString - Block make_block_bs = decode Now we are getting faster: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 8.063u 0.174s 0:08.23 100.0%0+0k 0+0io 0pf+0w 3) You are doing a lot of access to fields of a strict data type (State). You can at least ask the compiler to help you a bit with -funbox-strict-fields. Now we have: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 6.780u 0.133s 0:06.91 100.0%0+0k 0+1io 0pf+0w We have got a good, nearly 2x, speed-up with very few optimizations, and we run in a very small constant amount of memory. Probably compiling with -fvia-C could help even more, but strangely: [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Segmentation fault Is this supposed to happen? There're no unsafe functions or imports used in the program. Maybe a bug in GHC? Salvatore ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary
Thanks for taking a look at this. kirby81: 2008/5/17 Andrew Coppin [EMAIL PROTECTED]: Hi folks. OK, try this: darcs get http://darcs.orphi.me.uk/MyMD5 cd MyMD5 ghc -O2 --make md5sum md5sum some large filename I've got some time to take a look at the code. It's very nice, readable and declarative, but obviously not optimized for raw speed. There're a few things to do to have a speed-up of 2x, without going low-level. 1) There's a lazyness leak in Pad.hs. The sum in: else make_block_bs bs0 : work (n + block_size_bytes) bs1 is not strict. With a very large file (e.g. 100 mb) it means stack overflow. [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it. To solve it just add a bang (!) before the n parameter: work !n bs = You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file too. There're solutions that don't imply BangPatterns, and use only seq, but I like bang patterns! Now it works: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w Good idea. I saw some lazy left folds in there too, which look suspicious. 2) make_block_bs is sub-optimal, and very critical to performance. I decided to use Data.Binary for it (it's also more readable, and you get rid of the unsafeWrite): import Data.Binary import Data.Binary.Get import Control.Monad // ... instance Binary Block where put _ = undefined get = do xs - replicateM 16 getWord32le return $ Block $ listArray (0, 15) xs make_block_bs :: B.ByteString - Block make_block_bs = decode Now we are getting faster: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w Definitely a good idea. 3) You are doing a lot of access to fields of a strict data type (State). You can at least ask the compiler to help you a bit with -funbox-strict-fields. Now we have: [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip E6542028D538BAEC180A96A5D1E6EC3A 6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w We have got a good, nearly 2x, speed-up with very few optimizations, and we run in a very small constant amount of memory. Great. Probably compiling with -fvia-C could help even more, but strangely: [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Segmentation fault Is this supposed to happen? There're no unsafe functions or imports used in the program. Maybe a bug in GHC? That could be a gcc bug, in fact. Can you provide the final source? Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 ) Andrew, I hope you look carefully at the suggestions here, and use them to improve the code you were working on. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations
Don Stewart wrote: Andrew, I hope you look carefully at the suggestions here, and use them to improve the code you were working on. Indeed, this is just the sort of stuff I was hoping to get... [Top marks for spotting the laziness leak in pad BTW! I totally missed that.] While I'm here, can I just clarify *exactly* what -funbox-strict-fields actually does? I was under the impression that if you have a single-constructor type with all strict fields, GHC would automatically try to avoid boxing and unboxing it. However, the existence of this flag seems to indicate that it only performs this particular optimisation if you ask for it. But on the third hand, turning this flag on or off makes no measurable performance difference. (At least, in the tests I did. Maybe that's because it was swamped by all the other inefficiencies mentioned?) Some clarification? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations
andrewcoppin: Don Stewart wrote: Andrew, I hope you look carefully at the suggestions here, and use them to improve the code you were working on. Indeed, this is just the sort of stuff I was hoping to get... [Top marks for spotting the laziness leak in pad BTW! I totally missed that.] While I'm here, can I just clarify *exactly* what -funbox-strict-fields actually does? If in doubt, ask the user's guide: http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary
2008/5/20 Don Stewart [EMAIL PROTECTED]: Probably compiling with -fvia-C could help even more, but strangely: [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields -O2 --make md5sum.hs [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip Segmentation fault Is this supposed to happen? There're no unsafe functions or imports used in the program. Maybe a bug in GHC? That could be a gcc bug, in fact. Can you provide the final source? Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 ) I tried with and without gcc optimizations, but it always segfaults. It happened to me even before my optimizations. I've got GHC 6.8.2 on Mac OS X 10.5.2, GCC 4.0.1. -fvia-C works well for other libraries and applications, so I don't think it's a problem of my specific setting. Salvatore MyMD5.tar.gz Description: GNU Zip compressed data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] MD5 performance optimizations
Don Stewart wrote: andrewcoppin: While I'm here, can I just clarify *exactly* what -funbox-strict-fields actually does? If in doubt, ask the user's guide: http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma ...so making a field strict causes it to be automatically forced when the constructor is forced, but doesn't actually unbox it? So there's still a level of indirection, and no register allocation? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe