Re: [Haskell-cafe] poor perfomance of indexU in uvector package
On Sun, Nov 15, 2009 at 8:59 PM, Don Stewart d...@galois.com wrote: alexey.skladnoy: I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2 You're using the streamed version when its not fusing. Use the non-streaming direct implementation exported from Data.Array.Vector.UArr This is really an API bug, but I've not had time to sanitize the use. Probably this should be stated more explicitly in documentation. This is _very_ unexpected and confusing behaviour. Also I don't quite understand nature of bug. Is this missing export or wrong function is exported. And is streamed version of idexU useful and in which way? On Sun, Nov 15, 2009 at 9:11 PM, Thomas DuBuisson thomas.dubuis...@gmail.com wrote: The documentation explicitly says indexU is O(n) - no need for so much testing to rediscover that fact. When I needed a contiguous block of values in UArr, I just relied on sliceU to acquire the block and performed a foldU. Problems begin when you need non-contiguous block. Easiest way to so is indexing. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
On 16/11/2009, at 22:46, Alexey Khudyakov wrote: Problems begin when you need non-contiguous block. Easiest way to so is indexing. FWIW, this operation is called backpermute and is probably exported as bpermute in uvector. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] poor perfomance of indexU in uvector package
Hello This post meant to be literate haskell. I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2 Code below converts square 2D array to list of 1D arrays. Summation of array contents is done in force evaluation import Control.Monad import Control.Monad.ST import Data.Array.Vector import System arr :: Int - UArr Double arr n = toU $ map fromIntegral [1 .. n*n] This is fastest function. It slice arrays along another direction and used mainly as upper bound of speed sliceY :: Int - UArr Double - [UArr Double] sliceY n a = map (\i - sliceU a (i*n) n) [0 .. n-1] Naive implementation using lists and index lookup. 2.15 second for 200*200 array sliceXlist :: Int - UArr Double - [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y - indexU a (x + y*n)) [0 .. n-1] Similar implementation in ST monad and it uses indexU too. 2.14 seconds for 200*200 array sliceXst :: Int - UArr Double - [UArr Double] sliceXst n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr - newMU n forM_ [0 .. n-1] $ \y - writeMU arr y (indexU a (y*n + x)) unsafeFreezeAllMU arr This implementation avoids use of indexU by copying entire 2D array into mutable array and using it for lookup. Surprisingly it outperform previsious implementations for sufficiently big n 1.19 seconds for 200*200 array sliceXcopy :: Int - UArr Double - [UArr Double] sliceXcopy n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr - newMU n cp - newMU (n*n) copyMU cp 0 a forM_ [0 .. n-1] $ \y - writeMU arr y = readMU cp (y*n + x) unsafeFreezeAllMU arr This is another implementation with lists which convert whole array to list and picks appropriate element it. It is fastest implementation so far. 0.039 seconds for 200*200 array sliceXlistfast :: Int - UArr Double - [UArr Double] sliceXlistfast n a = map mkSlice [0 .. n-1] where takeEvery n [] = [] takeEvery n (x:xs) = x : takeEvery n (drop (n-1) xs) mkSlice x = toU $ takeEvery n . drop x $ fromU a main :: IO () main = do [str,a] - getArgs let n = read str case a of y- print $ sum $ map sumU (sliceY n (arr n)) list - print $ sum $ map sumU (sliceXlist n (arr n)) lf - print $ sum $ map sumU (sliceXlistfast n (arr n)) st - print $ sum $ map sumU (sliceXst n (arr n)) copy - print $ sum $ map sumU (sliceXcopy n (arr n)) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
On Sun, Nov 15, 2009 at 03:00:34PM +0300, Alexey Khudyakov wrote: Naive implementation using lists and index lookup. 2.15 second for 200*200 array sliceXlist :: Int - UArr Double - [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y - indexU a (x + y*n)) [0 .. n-1] Have you tried something like mkSlice x = mapU (\y - indexU a (x + y*n)) $ enumFromToU 0 (n-1) I guess it should be a lot faster :). Also, I would recomend using criterion. Another implementation you may try is a' = mapU (\(i :*: x) - (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) - i == j) a' HTH, -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
On Sun, Nov 15, 2009 at 5:50 PM, Felipe Lessa felipe.le...@gmail.com wrote: On Sun, Nov 15, 2009 at 03:00:34PM +0300, Alexey Khudyakov wrote: Naive implementation using lists and index lookup. 2.15 second for 200*200 array sliceXlist :: Int - UArr Double - [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y - indexU a (x + y*n)) [0 .. n-1] Have you tried something like mkSlice x = mapU (\y - indexU a (x + y*n)) $ enumFromToU 0 (n-1) I guess it should be a lot faster :). No it doesn't. There is no significant difference between two variant above. I think any program which uses indexU will be slowed to crawl. Seems like a bug for me. Another implementation you may try is a' = mapU (\(i :*: x) - (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) - i == j) a' This one is fastest so far Also, I would recomend using criterion. I tried to do so.. But it depends on gtk2hs and it is too difficult to install ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
Also, I would recomend using criterion. I tried to do so.. But it depends on gtk2hs and it is too difficult to install You can install with the flag to skip gtk2hs installation. i.e. Try `cabal install criterion -f-chart` -~nwn ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
alexey.skladnoy: Hello This post meant to be literate haskell. I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2 You're using the streamed version when its not fusing. Use the non-streaming direct implementation exported from Data.Array.Vector.UArr This is really an API bug, but I've not had time to sanitize the use. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
On Sun, Nov 15, 2009 at 06:16:03PM +0300, Alexey Khudyakov wrote: Another implementation you may try is a' = mapU (\(i :*: x) - (i `mod` n) :*: x) (indexedU a) mkSlice j = fstU $ filterU (\(i :*: x) - i == j) a' This one is fastest so far Nice! Just for the record, of course I meant 'sndU' :). Thanks god Haskell is statically typed and that error should be caught rather easily. -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] poor perfomance of indexU in uvector package
The documentation explicitly says indexU is O(n) - no need for so much testing to rediscover that fact. When I needed a contiguous block of values in UArr, I just relied on sliceU to acquire the block and performed a foldU. Thomas On Sun, Nov 15, 2009 at 4:00 AM, Alexey Khudyakov alexey.sklad...@gmail.com wrote: Hello This post meant to be literate haskell. I found that perfomace of indexU is very poor and it is not fast O(1) operation which is very surprising. Here is some benchmarcking I've done. Everything compiled with -O2 Code below converts square 2D array to list of 1D arrays. Summation of array contents is done in force evaluation import Control.Monad import Control.Monad.ST import Data.Array.Vector import System arr :: Int - UArr Double arr n = toU $ map fromIntegral [1 .. n*n] This is fastest function. It slice arrays along another direction and used mainly as upper bound of speed sliceY :: Int - UArr Double - [UArr Double] sliceY n a = map (\i - sliceU a (i*n) n) [0 .. n-1] Naive implementation using lists and index lookup. 2.15 second for 200*200 array sliceXlist :: Int - UArr Double - [UArr Double] sliceXlist n a = map mkSlice [0 .. n-1] where mkSlice x = toU $ map (\y - indexU a (x + y*n)) [0 .. n-1] Similar implementation in ST monad and it uses indexU too. 2.14 seconds for 200*200 array sliceXst :: Int - UArr Double - [UArr Double] sliceXst n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr - newMU n forM_ [0 .. n-1] $ \y - writeMU arr y (indexU a (y*n + x)) unsafeFreezeAllMU arr This implementation avoids use of indexU by copying entire 2D array into mutable array and using it for lookup. Surprisingly it outperform previsious implementations for sufficiently big n 1.19 seconds for 200*200 array sliceXcopy :: Int - UArr Double - [UArr Double] sliceXcopy n a = map mkSlice [0 .. n-1] where mkSlice x = runST $ do arr - newMU n cp - newMU (n*n) copyMU cp 0 a forM_ [0 .. n-1] $ \y - writeMU arr y = readMU cp (y*n + x) unsafeFreezeAllMU arr This is another implementation with lists which convert whole array to list and picks appropriate element it. It is fastest implementation so far. 0.039 seconds for 200*200 array sliceXlistfast :: Int - UArr Double - [UArr Double] sliceXlistfast n a = map mkSlice [0 .. n-1] where takeEvery n [] = [] takeEvery n (x:xs) = x : takeEvery n (drop (n-1) xs) mkSlice x = toU $ takeEvery n . drop x $ fromU a main :: IO () main = do [str,a] - getArgs let n = read str case a of y - print $ sum $ map sumU (sliceY n (arr n)) list - print $ sum $ map sumU (sliceXlist n (arr n)) lf - print $ sum $ map sumU (sliceXlistfast n (arr n)) st - print $ sum $ map sumU (sliceXst n (arr n)) copy - print $ sum $ map sumU (sliceXcopy n (arr n)) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe