Re: [Haskell-cafe] MD5 performance optimizations

2008-05-21 Thread Andrew Coppin

Woo!

Salvatore kindly sent me a Darcs patch, and applying it does indeed make it run 
faster. Yay!

[Note that -fvia-c works just fine for me. It doesn't appear to produce a huge 
speed difference, but it compiles just fine.]

Thanks for the tips, guys! :-D The changes are in the online Darcs repo.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations

2008-05-21 Thread Salvatore Insalaco
2008/5/21 Andrew Coppin [EMAIL PROTECTED]:
 Woo!

 Salvatore kindly sent me a Darcs patch, and applying it does indeed make it
 run faster. Yay!

Hi Andrew,
I'm glad that -fvia-c works for you: maybe it's a Mac OS X specific bug?

Anyway, did you compile with -fvia-c -optc-O3? I expect
register-intensive code like this to be at least 20% faster with gcc.

Salvatore
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary

2008-05-20 Thread Salvatore Insalaco
2008/5/17 Andrew Coppin [EMAIL PROTECTED]:
 Hi folks.

 OK, try this:

  darcs get http://darcs.orphi.me.uk/MyMD5
  cd MyMD5
  ghc -O2 --make md5sum
  md5sum some large filename

I've got some time to take a look at the code. It's very nice,
readable and declarative, but obviously not optimized for raw speed.
There're a few things to do to have a speed-up of 2x, without going low-level.

1) There's a lazyness leak in Pad.hs. The sum in:
  else make_block_bs bs0 : work (n + block_size_bytes) bs1
is not strict. With a very large file (e.g. 100 mb) it means stack overflow.

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs

[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.

To solve it just add a bang (!) before the n parameter:

work !n bs =

You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
too. There're solutions that don't imply BangPatterns, and use only
seq, but I like bang patterns!

Now it works:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
11.958u 0.210s 0:12.19 99.7%0+0k 0+2io 0pf+0w

2) make_block_bs is sub-optimal, and very critical to performance. I
decided to use Data.Binary for it (it's also more readable, and you
get rid of the unsafeWrite):
import Data.Binary
import Data.Binary.Get
import Control.Monad

// ...

instance Binary Block where
put _ = undefined
get = do
xs - replicateM 16 getWord32le
return $ Block $ listArray (0, 15) xs

make_block_bs :: B.ByteString - Block
make_block_bs = decode

Now we are getting faster:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
8.063u 0.174s 0:08.23 100.0%0+0k 0+0io 0pf+0w

3) You are doing a lot of access to fields of a strict data type
(State). You can at least ask the compiler to help you a bit with
-funbox-strict-fields.

Now we have:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
6.780u 0.133s 0:06.91 100.0%0+0k 0+1io 0pf+0w

We have got a good, nearly 2x, speed-up with very few optimizations,
and we run in a very small constant amount of memory.

Probably compiling with -fvia-C could help even more, but strangely:

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
-O2 --make md5sum.hs
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Segmentation fault

Is this supposed to happen? There're no unsafe functions or imports
used in the program. Maybe a bug in GHC?

Salvatore
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary

2008-05-20 Thread Don Stewart
Thanks for taking a look at this.

kirby81:
 2008/5/17 Andrew Coppin [EMAIL PROTECTED]:
  Hi folks.
 
  OK, try this:
 
   darcs get http://darcs.orphi.me.uk/MyMD5
   cd MyMD5
   ghc -O2 --make md5sum
   md5sum some large filename
 
 I've got some time to take a look at the code. It's very nice,
 readable and declarative, but obviously not optimized for raw speed.
 There're a few things to do to have a speed-up of 2x, without going 
 low-level.
 
 1) There's a lazyness leak in Pad.hs. The sum in:
   else make_block_bs bs0 : work (n + block_size_bytes) bs1
 is not strict. With a very large file (e.g. 100 mb) it means stack overflow.
 
 [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs
 
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 Stack space overflow: current size 8388608 bytes.
 Use `+RTS -Ksize' to increase it.
 
 To solve it just add a bang (!) before the n parameter:
 
 work !n bs =
 
 You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
 too. There're solutions that don't imply BangPatterns, and use only
 seq, but I like bang patterns!
 
 Now it works:
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 E6542028D538BAEC180A96A5D1E6EC3A
 11.958u 0.210s 0:12.19 99.7%  0+0k 0+2io 0pf+0w

Good idea. I saw some lazy left folds in there too, which look
suspicious.

 
 2) make_block_bs is sub-optimal, and very critical to performance. I
 decided to use Data.Binary for it (it's also more readable, and you
 get rid of the unsafeWrite):
 import Data.Binary
 import Data.Binary.Get
 import Control.Monad
 
 // ...
 
 instance Binary Block where
 put _ = undefined
 get = do
 xs - replicateM 16 getWord32le
 return $ Block $ listArray (0, 15) xs
 
 make_block_bs :: B.ByteString - Block
 make_block_bs = decode
 
 Now we are getting faster:
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 E6542028D538BAEC180A96A5D1E6EC3A
 8.063u 0.174s 0:08.23 100.0%  0+0k 0+0io 0pf+0w

Definitely a good idea.
  
 3) You are doing a lot of access to fields of a strict data type
 (State). You can at least ask the compiler to help you a bit with
 -funbox-strict-fields.
 
 Now we have:
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 E6542028D538BAEC180A96A5D1E6EC3A
 6.780u 0.133s 0:06.91 100.0%  0+0k 0+1io 0pf+0w
 
 We have got a good, nearly 2x, speed-up with very few optimizations,
 and we run in a very small constant amount of memory.

Great.

 
 Probably compiling with -fvia-C could help even more, but strangely:
 
 [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
 -O2 --make md5sum.hs
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 Segmentation fault
 
 Is this supposed to happen? There're no unsafe functions or imports
 used in the program. Maybe a bug in GHC?

That could be a gcc bug, in fact. Can you provide the final source?
Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 )

Andrew, I hope you look carefully at the suggestions here, and use them
to improve the code you were working on.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations

2008-05-20 Thread Andrew Coppin

Don Stewart wrote:

Andrew, I hope you look carefully at the suggestions here, and use them
to improve the code you were working on.
  


Indeed, this is just the sort of stuff I was hoping to get... [Top marks 
for spotting the laziness leak in pad BTW! I totally missed that.]


While I'm here, can I just clarify *exactly* what -funbox-strict-fields 
actually does?


I was under the impression that if you have a single-constructor type 
with all strict fields, GHC would automatically try to avoid boxing and 
unboxing it. However, the existence of this flag seems to indicate that 
it only performs this particular optimisation if you ask for it. But on 
the third hand, turning this flag on or off makes no measurable 
performance difference. (At least, in the tests I did. Maybe that's 
because it was swamped by all the other inefficiencies mentioned?) Some 
clarification?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations

2008-05-20 Thread Don Stewart
andrewcoppin:
 Don Stewart wrote:
 Andrew, I hope you look carefully at the suggestions here, and use them
 to improve the code you were working on.
   
 
 Indeed, this is just the sort of stuff I was hoping to get... [Top marks 
 for spotting the laziness leak in pad BTW! I totally missed that.]
 
 While I'm here, can I just clarify *exactly* what -funbox-strict-fields 
 actually does?

If in doubt, ask the user's guide:


http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary

2008-05-20 Thread Salvatore Insalaco
2008/5/20 Don Stewart [EMAIL PROTECTED]:

 Probably compiling with -fvia-C could help even more, but strangely:

 [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
 -O2 --make md5sum.hs
 [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
 Segmentation fault

 Is this supposed to happen? There're no unsafe functions or imports
 used in the program. Maybe a bug in GHC?

 That could be a gcc bug, in fact. Can you provide the final source?
 Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 )

I tried with and without gcc optimizations, but it always segfaults.
It happened to me even before my optimizations.

I've got GHC 6.8.2 on Mac OS X 10.5.2, GCC 4.0.1. -fvia-C works well
for other libraries and applications, so I don't think it's a problem
of my specific setting.

Salvatore


MyMD5.tar.gz
Description: GNU Zip compressed data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] MD5 performance optimizations

2008-05-20 Thread Andrew Coppin

Don Stewart wrote:

andrewcoppin:
  
While I'm here, can I just clarify *exactly* what -funbox-strict-fields 
actually does?



If in doubt, ask the user's guide:


http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma
  


...so making a field strict causes it to be automatically forced when 
the constructor is forced, but doesn't actually unbox it? So there's 
still a level of indirection, and no register allocation?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe