Re: [Haskell] Haskell fast (?) arrays
Federico Squartini wrote: Thanks for the hints. It's a pity that (as far as I know) no one has written a tutorial on those techniques, because I think it would be appreciated. Some of them are quite involved and learning them just by reading code is very time consuming. There's the Performance section of the Haskell wiki: http://haskell.org/haskellwiki/Performance Some of the techniques are described there, but by no means all. It's a good place to put knowledge that you acquire while doing these experiments, though. FWIW I vaguely recall there was a performance problem with initialising IOUArrays that we haven't got around to fixing yet. If you can narrow down the test case, then please submit a bug report. Cheers, Simon ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
Frederico, On Tue, 2007-05-01 at 13:59 +0200, Federico Squartini wrote: I was reading an old post where Hal Daume III was analyzing Haskell performance for arrays. He proposed a test program which initializes an array, reverse it a number of times, and sums the contents. So I wrote a c++ reference program, a naive haskell version using lists and I also tweaked a little bit with the IOArray version, which should be the fastest. Unfortunately there is a huge performance gap. Haskell is slower by a factor of ten, even when using imperative style. I think the version using lists is a bit unfair, since in C++ you don't re-allocate the array on the heap and the Haskell version gives you a very nice high-level abstraction of lists. With respect to the imperative version, I would suggest a) letting the program run for longer so you get more reliable timing. b) use a similar optimisations that we've done for a demo of modifying images in-situ for our Gtk2Hs library (in http://darcs.haskell.org/gtk2hs/demo/fastdraw/FastDraw.hs ): import Data.Array.Base ( unsafeWrite ) doFromTo 0 255 $ \y - doFromTo 0 255 $ \x - -- Here, writeArray was replaced with unsafeWrite. The latter does -- not check that the index is within bounds which has a tremendous -- effect on performance. -- writeArray pbData (2+x*chan+y*row) blue -- checked indexing unsafeWrite pbData (2+x*chan+y*row) blue -- unchecked indexing Here, doFromTo is much faster and using unsafeWrite instead of writeArray eliminates the array bound check, which is a big win again. Then again, it is questionable if you really want to do that kind of low-level programming in Haskell. Axel. ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
On Tue, May 01, 2007 at 01:59:01PM +0200, Federico Squartini wrote: I was reading an old post where Hal Daume III was analyzing Haskell performance for arrays. He proposed a test program which initializes an array, reverse it a number of times, and sums the contents. So I wrote a c++ reference program, a naive haskell version using lists and I also tweaked a little bit with the IOArray version, which should be the fastest. Unfortunately there is a huge performance gap. Haskell is slower by a factor of ten, even when using imperative style. I'd recommend using -O2 or -O. GHC doesn't even try to generate fast code if you don't use them. Stefan ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
Of course I know that the list version is very unfair, but I wanted to see what was the trade off between elegance and speed. Regarding whether low level programming makes sense or not, I was just curious to see what are the limits of Haskell. Moreover there is not much literature on high performance Haskell programming (tricks like unsafeWrite), at least organized in a systematic and concise way. My original problem was writing a fast library for simple matrix computations (i.e. multiplication and inversion for small dense matrices). I have not been able to make GSLHaskell work with Lapack so far. :( Anyway here are the new versions and timings, I increased the number of times the vector is reversed, I also compiled everything with -O2. time ./arrayC 499 real0m0.244s user0m0.236s sys0m0.005s time ./list 499 real0m11.036s user0m10.770s sys0m0.118s time ./IOMutArrayUnboxed 499 real0m2.573s user0m2.408s sys0m0.042s time ./IOMutUnbUnsafe 499 real0m2.264s user0m2.183s sys0m0.025s -- -- //compile with g++ -O2 -o arrayC arrayC.cc #include stdio.h #include math.h int main() { int array[51]; for (int i=0;i=50;i++) { array[i]=(19*i+23)%911; } int tmp=0; for (int cnt=0;cnt120;cnt++) { for (int x=0;x=25;x++) { tmp=array[50-x]; array[50-x]=array[x]; array[x]=tmp; } } int result=0; for (int i=0;i=50;i++) { result=result+(array[i]%911); } result=result % 911; printf(%d,result); return 0; } -- compile with -- ghc -O2 --make -o list list.hs module Main where import Data.List testArray = [ (19*i+23) `mod` 911 |i - [0..50]] sumArrayMod = foldl (\x y - (y+x) `mod` 911) 0 main = print $ sumArrayMod$ foldl (.) id (replicate 120 reverse) $testArray -- -- compile with -- ghc -O2 --make -o IOMutArrayUnboxed IOMutArrayUnboxed.hs module Main where import Monad import Data.Array.IO http://data.array.io/ import Data.Array.MArray import Data.Array.Unboxed total, semiTotal ::Int total= 50 javascript:void(0) semiTotal=25 testArray :: IO (IOUArray Int Int) testArray = newListArray (0,total) [(19*i+23) `mod` 911 |i - [0..total]] reverseArray :: IOUArray Int Int - IO () reverseArray arr = mapM_ (\i - do oldi - readArray arr i oldj - readArray arr (total-i) writeArray arr i oldj writeArray arr (total-i) oldi) [0..semiTotal] sumArrayMod :: IOUArray Int Int - IO Int sumArrayMod arr = foldM (\s i - do x - readArray arr i return $!(s+x) `mod` 911) 0 [0..total] main::IO() main = testArray = \a - sequence (replicate 120 $reverseArray a) sumArrayMod a = print -- compile with -- ghc -O2 --make -o IOMutUnbUnsafe IOMutUnbUnsafe.hs module Main where import Monad import Data.Array.IO http://data.array.io/ import Data.Array.MArray import Data.Array.Unboxed import Data.Array.Base ( unsafeWrite, unsafeRead ) total, semiTotal ::Int total= 50 javascript:void(0) semiTotal=25 testArray :: IO (IOUArray Int Int) testArray = newListArray (0,total) [(19*i+23) `mod` 911 |i - [0..total]] reverseArray :: IOUArray Int Int - IO () reverseArray arr = mapM_ (\i - do oldi - unsafeRead arr i oldj - unsafeRead arr (total-i) unsafeWrite arr i oldj unsafeWrite arr (total-i) oldi) [0..semiTotal] sumArrayMod :: IOUArray Int Int - IO Int sumArrayMod arr = foldM (\s i - do x - unsafeRead arr i return $!(s+x) `mod` 911) 0 [0..total] main::IO() main = testArray = \a - doFromTo 1 120 (\_ - reverseArray a) sumArrayMod a = print {-# INLINE doFromTo #-} -- do the action for [from..to], ie it's inclusive. doFromTo :: Int - Int - (Int - IO ()) - IO () doFromTo from to action = let loop n | n to = return () | otherwise = do action n loop (n+1) in loop from --- ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
federico.squartini: Of course I know that the list version is very unfair, but I wanted to see what was the trade off between elegance and speed. Regarding whether low level programming makes sense or not, I was just curious to see what are the limits of Haskell. Moreover there is not much literature on high performance Haskell programming (tricks like unsafeWrite), at least organized in a systematic and concise way. My original problem was writing a fast library for simple matrix computations (i.e. multiplication and inversion for small dense matrices). I have not been able to make GSLHaskell work with Lapack so far. :( Anyway here are the new versions and timings, I increased the number of times the vector is reversed, I also compiled everything with -O2. Probably a good idea to use techniques from Data.ByteString (ie. use strict Ptr loops, and Foreign arrays), or techniques from the shootout, if you're chasing C speed. Good examples are: http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsievebitslang=ghcid=4 (mutable bit arrays) http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsievelang=ghcid=0 (mutable byte (foreign) arrays) http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=4 (more mutable arrays) When I really really care about speed, I use Foreign.Marshal.Array Data.ByteString and apply ! patterns liberally, checking the Core output for inner loops. -O2 -optc-O2 -optc-march=pentium4 often helps. 1-4x C is around what you can best hope for. 10x says still room for improvement in my experience. -- Don ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
Sorry, I was very silly! This is the correct version of the program using the doFromto loop. And it runs fast! I hope there are no further mistakes. Thanks Axel. time ./IOMutUnbUnsafe 499 real0m0.708s user0m0.573s sys 0m0.008s - -- compile with -- ghc --make -o IOMutUnbUnsafe IOMutUnbUnsafe.hs module Main where import Monad import Data.Array.IO import Data.Array.MArray import Data.Array.Unboxed import Data.Array.Base ( unsafeWrite, unsafeRead ) total, semiTotal ::Int total= 50 semiTotal=25 testArray :: IO (IOUArray Int Int) testArray = newListArray (0,total) [(19*i+23) `mod` 911 |i - [0..total]] reverseArray :: IOUArray Int Int - IO () reverseArray arr = doFromTo 0 semiTotal (\i - do oldi - unsafeRead arr i oldj - unsafeRead arr (total-i) unsafeWrite arr i oldj unsafeWrite arr (total-i) oldi) sumArrayMod :: IOUArray Int Int - IO Int sumArrayMod arr = foldM (\s i - do x - unsafeRead arr i return $!(s+x) `mod` 911) 0 [0..total] main::IO() main = testArray = \a - doFromTo 1 120 (\_ - reverseArray a) sumArrayMod a = print {-# INLINE doFromTo #-} -- do the action for [from..to], ie it's inclusive. doFromTo :: Int - Int - (Int - IO ()) - IO () doFromTo from to action = let loop n | n to = return () | otherwise = do action n loop (n+1) in loop from --- Federico ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
Thanks for the hints. It's a pity that (as far as I know) no one has written a tutorial on those techniques, because I think it would be appreciated. Some of them are quite involved and learning them just by reading code is very time consuming. Federico ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
I think another interesting data point would be for a C++ version that uses the 'vector' data type from STL (Standard Template Library) and using the vector indexing ops that do bounds-checking. Regards, Nikhil ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote: Thanks for the hints. It's a pity that (as far as I know) no one has written a tutorial on those techniques, because I think it would be appreciated. Some of them are quite involved and learning them just by reading code is very time consuming. I personally recommend to people that if sections of your code are really performance-critical, you should consider writing them in a lower-level lanaguage like C and using FFI for access. P.S. I wonder if jhc could improve the output code? -- Taral [EMAIL PROTECTED] You can't prove anything. -- Gödel's Incompetence Theorem ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote: Moreover there is not much literature on high performance Haskell programming (tricks like unsafeWrite), at least organized in a systematic and concise way. Look at: http://haskell.org/haskellwiki/Performance regards, Bas van Dijk ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Haskell fast (?) arrays
While scanning my Inbox I read 'fast' and 'array' in the context of functional programming. Well, of course SaC instantly came to my mind (what a surprise ;) ). So I did some measurements myself. I used your programs, except that I increased the array size by a factor of 10. For the C++ version I had to move the array to the heap and fix the order of function applications within the fold. Here are the timings: C++ 520 real0m0.204s user0m0.182s sys 0m0.023s Haskell IOArray (the extended version with unsafe accesses that was posted shortly after yours) 520 real0m5.542s user0m5.453s sys 0m0.068s Haskell Lists (just to be complete) 520 real0m27.596s user0m26.650s sys 0m0.870s and finally SaC Dimension: 0 Shape: 520 real0m0.057s user0m0.048s sys 0m0.000s The corresponding SaC program follows. I have compiled it with sac2c -O3. I used the current compiler from the website http://www.sac-home.org. use Structures : all; use StdIO : all; inline int sumMod( int a, int b) { return( (a + b) % 911); } inline int sumArrayMod( int[*] A) { res = with { ( shape(A) * 0 = iv shape(A)) : A[iv]; } : fold( sumMod, 0); return( res); } int main() { testArray = (19*iota(501)+23) % 911; print( sumArrayMod( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( reverse( testArray)); return( 0); } On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote: I was reading an old post where Hal Daume III was analyzing Haskell performance for arrays. He proposed a test program which initializes an array, reverse it a number of times, and sums the contents. So I wrote a c++ reference program, a naive haskell version using lists and I also tweaked a little bit with the IOArray version, which should be the fastest. Unfortunately there is a huge performance gap. Haskell is slower by a factor of ten, even when using imperative style. C++ time ./arrayC 499 real0m0.059s user0m0.044s sys0m0.008s HASKELL - IOUArray time ./IOMutArrayUnboxed 499 real0m0.720s user0m0.571s sys0m0.019s HASKELL - list time ./list 499 real0m1.845s user0m1.770s sys0m0.064s Can anyone suggest a faster version (using whatever data structure)? I like Haskell very much but I still have to figure out if the slowness of some code is due to my lack of knowledge or to some intrinsic limitation of the language (or libraries). By the way, sorry for the poor quality of the code, I am not a computer scientist. --- --- //compile with //g++ -o arrayC arrayC.cc #include stdio.h #include math.h int main() { int array[51]; for (int i=0;i=50;i++) { array[i]=(19*i+23)%911; } int tmp=0; for (int cnt=0;cnt12;cnt++) { for (int x=0;x=25;x++) { tmp=array[50-x]; array[50-x]=array[x]; array[x]=tmp; } } int result=0; for (int i=0;i=50;i++) { result=result+(array[i]%911); } result=result % 911; printf(%d,result); return 0; } - - -- compile with -- ghc --make -o list list.hs module Main where testArray = [ (19*i+23) `mod` 911 |i - [0..50]] sumArrayMod = foldl (\x y - (y+x) `mod` 911) 0 main = print $ sumArrayMod$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ reverse$ testArray - - -- compile with -- ghc --make -o IOMutArrayUnboxed IOMutArrayUnboxed.hs module Main where import Monad import Data.Array.IO import Data.Array.MArray import Data.Array.Unboxed total, semiTotal ::Int total= 50 semiTotal=25 testArray :: IO (IOUArray Int Int) testArray = newListArray (0,total) [(19*i+23) `mod` 911 |i - [0..total]] reverseArray :: IOUArray Int Int - IO () reverseArray arr = mapM_ (\i - do oldi - readArray arr i oldj - readArray arr (total-i) writeArray arr i oldj writeArray arr (total-i) oldi) [0..semiTotal] sumArrayMod :: IOUArray Int Int - IO Int sumArrayMod arr = foldM (\s i - do x - readArray arr i
Re: [Haskell] Haskell fast (?) arrays
federico.squartini: Sorry, I was very silly! This is the correct version of the program using the doFromto loop. And it runs fast! I hope there are no further mistakes. Thanks Axel. time ./IOMutUnbUnsafe 499 real 0m0.708s user 0m0.573s sys 0m0.008s Here's an improved version, using Foreign.Marshal.Array. I spent about 2 minutes inspecting the core, as well. Before, with your IOUArray version: $ time ./T 499 ./T 1.46s user 0.02s system 97% cpu 1.515 total with the new version: $ time ./S 499 ./S 1.15s user 0.01s system 99% cpu 1.168 total Here's the source, its more idiomatic high-perf Haskell, I'd argue. Cheers, Don {-# OPTIONS -O2 -optc-O -optc-march=pentium4 -fbang-patterns #-} import Control.Monad import Foreign.Marshal.Array import Foreign total :: Int total = 51 type Arr = Ptr Int testArray :: IO Arr testArray = do u - mallocArray total :: IO Arr forM_ [0 .. total] $ \i - pokeElemOff u i ((19*i+23) `mod` 911) return u reverseArray :: Arr - Int - Int - IO () reverseArray !p !i !j | i j = do x - peekElemOff p i y - peekElemOff p j pokeElemOff p i y pokeElemOff p j x reverseArray p (i+1) (j-1) | otherwise = return () sumArrayMod :: Arr - Int - Int - IO Int sumArrayMod !p !s !i | i total = do x - peekElemOff p i sumArrayMod p ((s + x) `rem` 911) (i+1) | otherwise = return s main :: IO () main = do a - testArray replicateM_ 120 (reverseArray a 0 (total-1)) print = sumArrayMod a 0 0 ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell