Re: [Haskell] Haskell fast (?) arrays

2007-05-02 Thread Simon Marlow

Federico Squartini wrote:

Thanks for the hints. It's a pity that (as far as I know) no one has
written a tutorial on those techniques, because I think it would be
appreciated. Some of them are quite involved and learning them just by
reading code is very time consuming.


There's the Performance section of the Haskell wiki:

  http://haskell.org/haskellwiki/Performance

Some of the techniques are described there, but by no means all.  It's a good 
place to put knowledge that you acquire while doing these experiments, though.


FWIW I vaguely recall there was a performance problem with initialising 
IOUArrays that we haven't got around to fixing yet.  If you can narrow down the 
test case, then please submit a bug report.


Cheers,
Simon
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Axel Simon
Frederico,

On Tue, 2007-05-01 at 13:59 +0200, Federico Squartini wrote:
 I was reading an old post where Hal Daume III was analyzing Haskell
 performance for arrays. 
 He proposed a test program which initializes an array, reverse it a
 number of times, and sums the contents.
 
 So I wrote a c++ reference program, a naive haskell version using
 lists and I also tweaked a little bit with the IOArray version, which
 should be the fastest. Unfortunately there is a  huge performance gap.
 Haskell is slower by a factor of ten, even when using imperative
 style. 

I think the version using lists is a bit unfair, since in C++ you don't
re-allocate the array on the heap and the Haskell version gives you a
very nice high-level abstraction of lists.

With respect to the imperative version, I would suggest

a) letting the program run for longer so you get more reliable timing.
b) use a similar optimisations that we've done for a demo of modifying
images in-situ for our Gtk2Hs library (in
http://darcs.haskell.org/gtk2hs/demo/fastdraw/FastDraw.hs ):

import Data.Array.Base ( unsafeWrite )

doFromTo 0 255 $ \y -
  doFromTo 0 255 $ \x -
-- Here, writeArray was replaced with unsafeWrite. The latter does
-- not check that the index is within bounds which has a tremendous
-- effect on performance.
--  writeArray  pbData (2+x*chan+y*row) blue  -- checked indexing
unsafeWrite pbData (2+x*chan+y*row) blue  -- unchecked indexing

Here, doFromTo is much faster and using unsafeWrite instead of
writeArray eliminates the array bound check, which is a big win again.
Then again, it is questionable if you really want to do that kind of
low-level programming in Haskell.

Axel.


___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Stefan O'Rear
On Tue, May 01, 2007 at 01:59:01PM +0200, Federico Squartini wrote:
 I was reading an old post where Hal Daume III was analyzing Haskell
 performance for arrays.
 He proposed a test program which initializes an array, reverse it a number
 of times, and sums the contents.
 
 So I wrote a c++ reference program, a naive haskell version using lists and
 I also tweaked a little bit with the IOArray version, which should be the
 fastest. Unfortunately there is a  huge performance gap. Haskell is slower
 by a factor of ten, even when using imperative style.

I'd recommend using -O2 or -O.  GHC doesn't even try to generate fast
code if you don't use them. 

Stefan
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Federico Squartini

Of course I know that the list version is very unfair, but I wanted to see
what was the trade off between elegance and speed.
Regarding whether low level programming makes sense or not, I was just
curious to see what are the limits of Haskell. Moreover there is not much
literature on high performance Haskell programming (tricks like
unsafeWrite), at least organized in a systematic and concise way.

My original problem was writing a  fast library for simple matrix
computations (i.e. multiplication and inversion for small dense matrices).
I have not been able to make GSLHaskell work with Lapack so far. :(

Anyway here are the new versions and timings, I increased the number of
times the vector is reversed, I also compiled everything with -O2.

time ./arrayC
499
real0m0.244s
user0m0.236s
sys0m0.005s

time ./list
499
real0m11.036s
user0m10.770s
sys0m0.118s

time ./IOMutArrayUnboxed
499
real0m2.573s
user0m2.408s
sys0m0.042s

time ./IOMutUnbUnsafe
499
real0m2.264s
user0m2.183s
sys0m0.025s

--
--

//compile with g++ -O2 -o arrayC arrayC.cc
#include  stdio.h
#include math.h



int main()
{
 int array[51];

 for (int i=0;i=50;i++)
   {
   array[i]=(19*i+23)%911;
   }
 int tmp=0;
 for (int cnt=0;cnt120;cnt++)
   {
 for (int x=0;x=25;x++)
   {
 tmp=array[50-x];
 array[50-x]=array[x];
 array[x]=tmp;
   }
   }
 int result=0;
 for (int i=0;i=50;i++)
   {
 result=result+(array[i]%911);
   }
 result=result % 911;
 printf(%d,result);
 return 0;
}




-- compile with
-- ghc -O2 --make -o list list.hs

module Main
   where

import Data.List

testArray = [ (19*i+23) `mod` 911 |i - [0..50]]

sumArrayMod =  foldl (\x y - (y+x) `mod` 911) 0

main = print $ sumArrayMod$
  foldl (.) id  (replicate 120 reverse) $testArray

--

-- compile with
-- ghc -O2 --make -o IOMutArrayUnboxed IOMutArrayUnboxed.hs
module Main
   where

import Monad
import Data.Array.IO http://data.array.io/
import Data.Array.MArray
import Data.Array.Unboxed

total, semiTotal ::Int
total= 50 javascript:void(0)
semiTotal=25


testArray :: IO (IOUArray Int Int)
testArray = newListArray (0,total)  [(19*i+23) `mod` 911 |i - [0..total]]


reverseArray :: IOUArray Int Int - IO ()
reverseArray arr = mapM_  (\i - do oldi - readArray arr i
   oldj - readArray arr (total-i)
   writeArray arr i oldj
   writeArray arr (total-i) oldi)
  [0..semiTotal]

sumArrayMod :: IOUArray Int Int - IO Int
sumArrayMod arr = foldM (\s i - do x - readArray arr i
   return   $!(s+x) `mod` 911) 0 [0..total]


main::IO()
main = testArray = \a -
  sequence  (replicate 120 $reverseArray a)
  sumArrayMod a =  print




-- compile with
-- ghc -O2 --make -o IOMutUnbUnsafe IOMutUnbUnsafe.hs
module Main
   where

import Monad
import Data.Array.IO http://data.array.io/
import Data.Array.MArray
import Data.Array.Unboxed
import Data.Array.Base ( unsafeWrite, unsafeRead )

total, semiTotal ::Int
total= 50 javascript:void(0)
semiTotal=25


testArray :: IO (IOUArray Int Int)
testArray = newListArray (0,total)  [(19*i+23) `mod` 911 |i - [0..total]]


reverseArray :: IOUArray Int Int - IO ()
reverseArray arr = mapM_  (\i - do oldi - unsafeRead arr i
   oldj - unsafeRead arr (total-i)
   unsafeWrite arr i oldj
   unsafeWrite arr (total-i) oldi)
  [0..semiTotal]

sumArrayMod :: IOUArray Int Int - IO Int
sumArrayMod arr = foldM (\s i - do x - unsafeRead arr i
   return   $!(s+x) `mod` 911) 0 [0..total]



main::IO()
main = testArray = \a -
  doFromTo 1 120 (\_ - reverseArray a)  sumArrayMod a =  print



{-# INLINE doFromTo #-}
-- do the action for [from..to], ie it's inclusive.
doFromTo :: Int - Int - (Int - IO ()) - IO ()
doFromTo from to action =
 let loop n | n  to   = return ()
| otherwise = do action n
 loop (n+1)
  in loop from

---
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Donald Bruce Stewart
federico.squartini:
 
Of course I know that the list version is very unfair, but I
wanted to see what was the trade off between elegance and
speed.
Regarding whether low level programming makes sense or not,
I was just curious to see what are the limits of Haskell.
Moreover there is not much literature on high performance
Haskell programming (tricks like unsafeWrite), at least
organized in a systematic and concise way.
My original problem was writing a  fast library for simple
matrix computations (i.e. multiplication and inversion for
small dense matrices).  I have not been able to make
GSLHaskell work with Lapack so far. :(
Anyway here are the new versions and timings, I increased
the number of times the vector is reversed, I also compiled
everything with -O2.

Probably a good idea to use techniques from Data.ByteString (ie. use
strict Ptr loops, and Foreign arrays), or techniques from the shootout,
if you're chasing C speed. Good examples are:


http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsievebitslang=ghcid=4
(mutable bit arrays)


http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsievelang=ghcid=0
(mutable byte (foreign) arrays)


http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnormlang=ghcid=4
(more mutable arrays)

When I really really care about speed, I use 

Foreign.Marshal.Array
Data.ByteString

and apply ! patterns liberally, checking the Core output for inner
loops. -O2 -optc-O2 -optc-march=pentium4 often helps.

1-4x C is around what you can best hope for. 10x says still room for
improvement in my experience.

-- Don
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Federico Squartini

Sorry, I was very silly!

This is the correct version of the program using the doFromto loop.
And it runs fast! I hope there are no further mistakes.
Thanks Axel.

time ./IOMutUnbUnsafe
499
real0m0.708s
user0m0.573s
sys 0m0.008s



-
-- compile with
-- ghc --make -o IOMutUnbUnsafe IOMutUnbUnsafe.hs
module Main
   where

import Monad
import Data.Array.IO
import Data.Array.MArray
import Data.Array.Unboxed
import Data.Array.Base ( unsafeWrite, unsafeRead )

total, semiTotal ::Int
total= 50
semiTotal=25


testArray :: IO (IOUArray Int Int)
testArray = newListArray (0,total)  [(19*i+23) `mod` 911 |i - [0..total]]


reverseArray :: IOUArray Int Int - IO ()
reverseArray arr = doFromTo 0 semiTotal (\i - do oldi - unsafeRead arr i

oldj - unsafeRead arr (total-i)

unsafeWrite arr i oldj

unsafeWrite arr (total-i) oldi)

sumArrayMod :: IOUArray Int Int - IO Int
sumArrayMod arr = foldM (\s i - do x - unsafeRead arr i
return
$!(s+x) `mod` 911) 0 [0..total]


main::IO()
main = testArray = \a -
  doFromTo 1 120 (\_ - reverseArray a)  sumArrayMod a =  print



{-# INLINE doFromTo #-}
-- do the action for [from..to], ie it's inclusive.
doFromTo :: Int - Int - (Int - IO ()) - IO ()
doFromTo from to action =
 let loop n | n  to   = return ()
| otherwise = do action n
 loop (n+1)
  in loop from

---

Federico
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Federico Squartini

Thanks for the hints. It's a pity that (as far as I know) no one has
written a tutorial on those techniques, because I think it would be
appreciated. Some of them are quite involved and learning them just by
reading code is very time consuming.


Federico
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Rishiyur Nikhil

I think another interesting data point would be for a C++ version
that uses the 'vector' data type from STL (Standard Template Library)
and using the vector indexing ops that do bounds-checking.

Regards,

Nikhil
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Taral

On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote:

Thanks for the hints. It's a pity that (as far as I know) no one has
written a tutorial on those techniques, because I think it would be
appreciated. Some of them are quite involved and learning them just by
reading code is very time consuming.


I personally recommend to people that if sections of your code are
really performance-critical, you should consider writing them in a
lower-level lanaguage like C and using FFI for access.

P.S. I wonder if jhc could improve the output code?

--
Taral [EMAIL PROTECTED]
You can't prove anything.
   -- Gödel's Incompetence Theorem
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Bas van Dijk

On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote:

Moreover there is not much literature on high performance Haskell programming
(tricks like unsafeWrite), at least organized in a systematic and concise way.


Look at: http://haskell.org/haskellwiki/Performance

regards,

Bas van Dijk
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Stephan Herhut

While scanning my Inbox I read 'fast' and 'array' in the context of
functional programming. Well, of course SaC instantly came to my mind (what
a surprise ;) ). So I did some measurements myself. I used your programs,
except that I increased the array size by a factor of 10. For the C++
version I had to move the array to the heap and fix the order of function
applications within the fold. Here are the timings:

C++
520
real0m0.204s
user0m0.182s
sys 0m0.023s

Haskell IOArray (the extended version with unsafe accesses that was posted
shortly after yours)
520

real0m5.542s
user0m5.453s
sys 0m0.068s

Haskell Lists (just to be complete)
520

real0m27.596s
user0m26.650s
sys 0m0.870s

and finally SaC
Dimension:  0
Shape:  
520

real0m0.057s
user0m0.048s
sys 0m0.000s

The corresponding SaC program follows. I have compiled it with sac2c -O3. I
used the current compiler from the website http://www.sac-home.org.

use Structures : all;
use StdIO : all;

inline
int sumMod( int a, int b)
{
 return( (a + b) % 911);
}

inline
int sumArrayMod( int[*] A)
{
 res = with {
 ( shape(A) * 0 = iv  shape(A)) : A[iv];
   } : fold( sumMod, 0);

 return( res);
}

int main() {
 testArray = (19*iota(501)+23) % 911;

 print( sumArrayMod(
   reverse( reverse( reverse( reverse(
   reverse( reverse( reverse( reverse(
   reverse( reverse( reverse( reverse(
   reverse( reverse( reverse( reverse(
 testArray));

 return( 0);
}

On 5/1/07, Federico Squartini [EMAIL PROTECTED] wrote:


I was reading an old post where Hal Daume III was analyzing Haskell
performance for arrays.
He proposed a test program which initializes an array, reverse it a number
of times, and sums the contents.

So I wrote a c++ reference program, a naive haskell version using lists
and I also tweaked a little bit with the IOArray version, which should be
the fastest. Unfortunately there is a  huge performance gap. Haskell is
slower by a factor of ten, even when using imperative style.

C++
time ./arrayC
499
real0m0.059s
user0m0.044s
sys0m0.008s

HASKELL - IOUArray
time ./IOMutArrayUnboxed
499
real0m0.720s
user0m0.571s
sys0m0.019s

HASKELL - list
time ./list
499
real0m1.845s
user0m1.770s
sys0m0.064s


Can anyone suggest a faster version (using whatever data structure)? I
like Haskell very much but I still have to figure out if the slowness of
some code is due to my lack of knowledge or to some intrinsic limitation of
the language (or libraries).

By the way, sorry for the poor quality of the code, I am not a computer
scientist.


---


---
//compile with
//g++ -o arrayC arrayC.cc
#include stdio.h
#include  math.h



int main()
{
  int array[51];

  for (int i=0;i=50;i++)
{
array[i]=(19*i+23)%911;
}
  int tmp=0;
  for (int cnt=0;cnt12;cnt++)
{
  for (int x=0;x=25;x++)
{
  tmp=array[50-x];
  array[50-x]=array[x];
  array[x]=tmp;
}
}
  int result=0;
  for (int i=0;i=50;i++)
{
  result=result+(array[i]%911);
}
  result=result % 911;
  printf(%d,result);
  return 0;
}

-


-
-- compile with
-- ghc --make -o list list.hs
module Main
where

testArray = [ (19*i+23) `mod` 911 |i - [0..50]]

sumArrayMod =  foldl (\x y - (y+x) `mod` 911) 0

main = print $ sumArrayMod$
   reverse$ reverse$ reverse$ reverse$
   reverse$ reverse$ reverse$ reverse$
   reverse$ reverse$ reverse$ reverse$
   reverse$ reverse$ reverse$ reverse$
   testArray


-


-
-- compile with
-- ghc --make -o IOMutArrayUnboxed IOMutArrayUnboxed.hs
module Main
where

import Monad
import Data.Array.IO
import Data.Array.MArray
import Data.Array.Unboxed

total, semiTotal ::Int
total= 50
semiTotal=25


testArray :: IO (IOUArray Int Int)
testArray = newListArray (0,total)  [(19*i+23) `mod` 911 |i - [0..total]]


reverseArray :: IOUArray Int Int - IO ()
reverseArray arr = mapM_  (\i - do oldi - readArray arr i
oldj - readArray arr (total-i)
writeArray arr i oldj
writeArray arr (total-i) oldi)
   [0..semiTotal]

sumArrayMod :: IOUArray Int Int - IO Int
sumArrayMod arr = foldM (\s i - do x - readArray arr i
 

Re: [Haskell] Haskell fast (?) arrays

2007-05-01 Thread Donald Bruce Stewart
federico.squartini:
 Sorry, I was very silly!
 
 This is the correct version of the program using the doFromto loop.
 And it runs fast! I hope there are no further mistakes.
 Thanks Axel.
 
 time ./IOMutUnbUnsafe
 499
 real  0m0.708s
 user  0m0.573s
 sys   0m0.008s

Here's an improved version, using Foreign.Marshal.Array. I spent about 2
minutes inspecting the core, as well.

Before, with your IOUArray version:

$ time ./T
499
./T  1.46s user 0.02s system 97% cpu 1.515 total

with the new version:

$ time ./S
499
./S  1.15s user 0.01s system 99% cpu 1.168 total

Here's the source, its more idiomatic high-perf Haskell, I'd argue.

Cheers,
  Don




{-# OPTIONS -O2 -optc-O -optc-march=pentium4 -fbang-patterns #-}

import Control.Monad
import Foreign.Marshal.Array
import Foreign

total :: Int
total = 51

type Arr = Ptr Int

testArray :: IO Arr
testArray = do
u - mallocArray total :: IO Arr
forM_ [0 .. total] $ \i - pokeElemOff u i ((19*i+23) `mod` 911)
return u

reverseArray :: Arr - Int - Int - IO ()
reverseArray !p !i !j
| i  j = do
x - peekElemOff p i
y - peekElemOff p j
pokeElemOff p i y
pokeElemOff p j x
reverseArray p (i+1) (j-1)
| otherwise = return ()

sumArrayMod :: Arr - Int - Int - IO Int
sumArrayMod !p !s !i
| i  total = do
x - peekElemOff p i
sumArrayMod p ((s + x) `rem` 911) (i+1)
| otherwise = return s

main :: IO ()
main = do
a - testArray
replicateM_ 120 (reverseArray a 0 (total-1))
print = sumArrayMod a 0 0

___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell