Re: [Haskell-cafe] ANN: psqueue-benchmarks - benchmarks of priority queue implementations
Does not compiles under ghc 7.6.2 Date: Sat, 13 Apr 2013 11:09:13 +0800 From: m...@nh2.me To: k...@iij.ad.jp CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] ANN: psqueue-benchmarks - benchmarks of priority queue implementations I actually found a (potential) problem with the GHC implementation. See here: https://github.com/nh2/psqueue-benchmarks/blob/db89731c5b4bdd2ff2ef81022a65f894036d8453/QueueBenchmark.hs#L44 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] my knucleotide fast on 64 bit but extremely slow on 32 bit?
Corrected, little bit slower on 64 bit but much faster on 32 bit version.(also made hashmap grow from small default size as it is becnh req)http://benchmarksgame.alioth.debian.org/u32q/program.php?test=knucleotidelang=ghcid=228 secs for 32 bit.http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotidelang=ghcid=225 secs for 64 bit Date: Thu, 28 Mar 2013 20:23:39 + Subject: Re: [Haskell-cafe] my knucleotide fast on 64 bit but extremely slow on 32 bit? From: don...@gmail.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org Int64 is emulated on 32 bit. So it is not as efficient by a long shot. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] my knucleotide fast on 64 bit but extremely slow on 32 bit?
I have posted previous knucleotide program, it is fast on 64 bit butvery slow on 32 bit.I cannot install 32 bit ghc to test it so I can only guess is thatcause is use of Int64 for hash and HashMap array indexing.What bothers me is that it that much slower , and I guessthat array indexing of 64 bit int where native index is 32 bitis culprit.Am I right? If I make Int index of array it will be fast on32 bit platform too?I cannot imagine that hashing to 64 bit int is culprit ratherarray indexing as I guess that every array accessrequires indirect access on 32 bit platform? 64 bit :http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotidelang=ghcid=119.80 secs32 bit:http://benchmarksgame.alioth.debian.org/u32q/program.php?test=knucleotidelang=ghcid=191.01 secs ;( I really like Haskell, but Im frustrated as how inpredictable it is.Im programming from 1983' but found Haskell as much more difficultto program efficiently than C++.In C++ whatever lousy program I write it performs decently,but not so in Haskell. It requires much deeper knowledge thanc++. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] my take at knucleotide
I have posted this version.Mad home grown HashMap and replaced IOref with Ptr.This made program twice as fast as current entry. {-# Language BangPatterns #-} The Computer Language Benchmarks Game-- http://benchmarksgame.alioth.debian.org/ Contributed by Branimir Maksimovic--import Data.Bitsimport Data.Charimport Data.Intimport Data.Listimport Data.Array.Baseimport Data.Array.Unboxedimport Data.Array.IOimport qualified Data.ByteString.Char8 as Simport Foreign.Ptrimport Foreign.Storableimport Foreign.Marshal.Allocimport Control.Concurrentimport Text.Printf main = dolet skip = dol - S.getLineif S.isPrefixOf (S.pack THREE) lthen return ()else skip skips - S.getContentslet content = S.filter ((/=) '\n') s;mapM_ (execute content) actionsdata Actions = I Int | S Stringactions = [I 1,I 2, S GGT,S GGTA,S GGTATT,S GGTAAATT,S GGTAAATTTATAGT]execute content (I i) = writeFrequencies content iexecute content (S s) = writeCount content s writeFrequencies :: S.ByteString - Int - IO ()writeFrequencies input size = doht - tcalculate input sizelst - Main.foldM (\lst (k,v)-do v' - peek vreturn $ (k,v'):lst) [] htlet sorted = sortBy (\(_,x) (_,y) - y `compare` x) lstsum = fromIntegral ((S.length input) + 1 - size)mapM_ (\(k,v)- doprintf %s %.3f\n (toString k) ((100 * (fromIntegral v)/sum)::Double)) sortedputChar '\n' writeCount :: S.ByteString - String - IO ()writeCount input string = do let size = length stringk = T (toNum (S.pack string) 0 size) sizeht - tcalculate input sizeres - Main.lookup ht kcase res of Nothing - putStrLn $ string ++ not found...Just v - do r - peek vprintf %d\t%s\n r string tcalculate :: S.ByteString - Int - IO HMtcalculate input size = dolet l = [0..63]actions = map (\i - (calculate input i size (length l))) lvars - mapM (\action - dovar - newEmptyMVar forkIO $ doanswer - action putMVar var answerreturn var) actionsresult - newTable :: IO HMresults - mapM takeMVar varsmapM_ (\ht - Main.foldM (\lst (k,v) - do res - Main.lookup lst kcase res ofNothing - dor1 - peek v r2 - mallocpoke r2 r1 Main.insert lst k r2Just v1 - dor1 - peek v1 r2 - peek vpoke v1 (r1+r2) return lst) result ht) resultsreturn result calculate :: S.ByteString - Int - Int - Int - IO HMcalculate input beg size incr = do!ht - newTable :: IO HMletcalculate' i | i = ((S.length input)+1 - size) = return ht | otherwise = do let k = T (toNum input i size) sizeres - Main.lookup ht k case res ofNothing - do!r - malloc poke r 1Main.insert ht k r Just v - do !r - peek vpoke v (r+1) calculate' (i+incr)calculate' beg toNum :: S.ByteString - Int - Int - Int64toNum s beg size = toNum' 0 size wheretoNum' v 0 = vtoNum' v i = toNum' ((v `shiftL` 2) .|. (toNumA `unsafeAt` (ord (S.index s (beg+i-1) (i-1) toString :: T - StringtoString (T v s) = toString' v swhere toString' v 0 = []toString' v i = case v..3 of 0 - 'A'1 - 'C'2 - 'T' 3 - 'G' : toString' (v `shiftR` 2) (i-1) toNumA :: UArray Int Int64toNumA = array (0,255) [(ord 'a',0),(ord 'c',1),(ord 't',2),(ord 'g',3),(ord 'A',0),(ord 'C',1),(ord 'T',2),(ord 'G',3)] data T = T !Int64 !Intinstance Eq T where(T a _) == (T b _) = a == bclass Hash h wherehash :: h - Int64instance Hash T wherehash (T a _) = a type HM = HashMap T (Ptr Int)data HashMap k v = HashMap !(IOArray Int64 [(k,v)])tsz = 4096newTable :: IO (HashMap k v)newTable = do!array - newArray (0,(tsz-1)) []return $ HashMap array lookup :: (Eq k, Hash k)=HashMap k v - k - IO (Maybe v)lookup (HashMap a) k = dolet h = hash k!lst - readArray a (h .. (tsz-1))let loop [] = return Nothingloop ((!k',!v):xs) | k /= k' = loop xs| otherwise = return $ Just vloop lst insert :: (Eq k, Hash k)=HashMap k v - k - v - IO ()insert (HashMap a) k v = dolet h = hash k!lst
Re: [Haskell-cafe] my take at knucleotide
Finally, I have made it ;)Trick was in more threads . For some reason if I run 64 (sweet spot) threads program runsfaster both with -threaded and without ;)Other trick is that I don't convert to uppercase (shaves second) rather pack nucleotidein 64 bit int.Program runs 30% faster multithreaded (scales better) than current entry, and consumes 50% less memory,and is shorter.If someone can see some improvement please post, otherwise I will contribute this program. {-# Language BangPatterns #-} The Computer Language Benchmarks Game-- http://benchmarksgame.alioth.debian.org/ Contributed by Branimir Maksimovic--import Data.Bitsimport Data.Charimport Data.Intimport Data.Listimport Data.IORefimport Data.Array.Unboxedimport Data.Array.Baseimport qualified Data.HashTable.IO as Himport Data.Hashableimport qualified Data.ByteString.Char8 as Simport Control.Concurrentimport Text.Printf main = dos - S.getContentslet(_,subs) = S.breakSubstring (S.pack THREE) scontent = (S.filter ((/=) '\n') . S.dropWhile ((/=) '\n')) subs mapM_ (execute content) actions data Actions = I Int | S Stringactions = [I 1,I 2, S GGT,S GGTA,S GGTATT,S GGTAAATT,S GGTAAATTTATAGT]execute content (I i) = writeFrequencies content iexecute content (S s) = writeCount content s writeFrequencies :: S.ByteString - Int - IO ()writeFrequencies input size = doht - tcalculate input sizelst - H.foldM (\lst (k,v)-do v' - readIORef vreturn $ insertBy (\(_,x) (_,y)-y `compare` x) (k,v') lst) [] htlet sum = fromIntegral ((S.length input) + 1 - size)mapM_ (\(k,v)- doprintf %s %.3f\n (toString k) ((100 * (fromIntegral v)/sum)::Double)) lstputChar '\n' writeCount :: S.ByteString - String - IO ()writeCount input string = do let size = length stringht - tcalculate input sizelet k = T (toNum (S.pack string) 0 size) sizeres - H.lookup ht kcase res of Nothing - putStrLn $ string ++ not found...Just v - do r - readIORef vprintf %d\t%s\n r string tcalculate :: S.ByteString - Int - IO HMtcalculate input size = dolet l = [0..63]actions = map (\i - (calculate input i size (length l))) lvars - mapM (\action - dovar - newEmptyMVar forkIO $ doanswer - action putMVar var answerreturn var) actionsresult - newTable :: IO HMresults - mapM takeMVar varsmapM_ (\ht - H.foldM (\lst (k,v) - do res - H.lookup lst k case res ofNothing - do r1 - readIORef v r2 - newIORef r1H.insert lst k r2 Just v1 - dor1 - readIORef v1r2 - readIORef v writeIORef v1 (r1+r2)return lst) result ht) resultsreturn result calculate :: S.ByteString - Int - Int - Int - IO HMcalculate input beg size incr = do!ht - newTable :: IO HMletcalculate' i | i = ((S.length input)+1 - size) = return ht | otherwise = do res - H.lookup ht kcase res ofNothing - do !r - newIORef 1H.insert ht k r Just v - do!r - readIORef v writeIORef v (r+1)calculate' (i+incr)where k = T (toNum input i size) sizecalculate' beg toNum :: S.ByteString - Int - Int - Int64toNum s beg size = toNum' 0 size wheretoNum' v 0 = vtoNum' v i = toNum' ((v `shiftL` 2) .|. (toNumA `unsafeAt` (ord (S.index s (beg+i-1) (i-1) toString :: T - StringtoString (T v s) = toString' v swhere toString' v 0 = []toString' v i = case v..3 of 0 - 'A'1 - 'C'2 - 'T' 3 - 'G' : toString' (v `shiftR` 2) (i-1) toNumA :: UArray Int Int64toNumA = array (0,255) [(ord 'a',0),(ord 'c',1),(ord 't',2),(ord 'g',3),(ord 'A',0),(ord 'C',1),(ord 'T',2),(ord 'G',3)] data T = T !Int64 !Intinstance Eq T where(T a _) == (T b _) = a == binstance Hashable T wherehashWithSalt _ (T a _) = fromIntegral a type HM = H.BasicHashTable T (IORef Int)newTable = H.new Date: Sun, 24 Mar 2013 20:12:57 +0100 Subject: Re: [Haskell-cafe] my take at knucleotide From: g...@gregorycollins.net To: bm...@hotmail.com CC: haskell-cafe@haskell.org What happens to performance if you compile and link with cabal install --constraint='hashable 1.2' ? G
[Haskell-cafe] my take at knucleotide
Hi, I have tried to implement knucleotide benchmark program this time:http://benchmarksgame.alioth.debian.org/u64q/performance.php?test=knucleotide Implementation is shorter (uses hashtable from hashtables package),but two time slower then current Haskell entry ( which is too low levelfor my taste :)).What is interesting is that if I try to place Int64 as a key tohash table, performance is even slower.Strange that dropping and taking from bytestring would befaster than packing string in 64 bit int and directly indexing. If someone can see something that can bring performance on parwith current haskell entry , I would post it , otherwise no point,except that program is shorter and not low level. {-# Language BangPatterns #-} The Computer Language Benchmarks Game-- http://benchmarksgame.alioth.debian.org/ Contributed by Branimir Maksimovic--import Data.Charimport Data.Listimport Data.IORefimport qualified Data.HashTable.IO as Himport qualified Data.ByteString.Char8 as Simport Control.Concurrentimport Text.Printf main = dos - S.getContentslet content = (S.map toUpper . S.concat . tail . dropWhile (\l-not $ S.isPrefixOf (S.pack THREE) l) . S.lines) smapM_ (execute content) actions data Actions = I Int | S Stringactions = [I 1,I 2, S GGT,S GGTA,S GGTATT,S GGTAAATT,S GGTAAATTTATAGT]execute content (I i) = writeFrequencies content iexecute content (S s) = writeCount content s writeFrequencies input size = doht - tcalculate input sizelst - H.foldM (\lst (k,v)-do v' - readIORef vreturn $ insertBy (\(_,x) (_,y)-y `compare` x) (k,v') lst) [] htlet sum = fromIntegral ((S.length input) + 1 - size)mapM_ (\(k,v)- doprintf %s %.3f\n (S.unpack k) ((100 * (fromIntegral v)/sum)::Double)) lstputChar '\n' writeCount input string = dolet size = length stringht - tcalculate input sizeres - H.lookup ht (S.pack string)case res of Nothing - putStrLn $ string ++ not found...Just v - dor - readIORef vprintf %d\t%s\n r (string::String) tcalculate input size = dolet l = [0..7]actions = map (\i - (calculate input i size (length l))) lvars - mapM (\action - do var - newEmptyMVarforkIO $ do answer - actionputMVar var answer return var) actionsresult - newTableresults - mapM takeMVar vars mapM_ (\ht - H.foldM (\lst (k,v) - do res - H.lookup lst kcase res of Nothing - dor1 - readIORef v r2 - newIORef r1 H.insert lst k r2Just v1 - do r1 - readIORef v1r2 - readIORef vwriteIORef v1 (r1+r2) return lst) result ht) resultsreturn resultcalculate input beg size incr = doht - newTableletcalculate' :: S.ByteString - Int - IO HashTablecalculate' str i | i = ((S.length input)+1 - size) = return ht | otherwise = dores - H.lookup ht kcase res ofNothing - do !r - newIORef 1H.insert ht k rJust v - do!r - readIORef vwriteIORef v (r+1)calculate' (S.drop incr str) (i+incr)where k = S.take size strcalculate' (S.drop beg input) beg type HashTable = H.BasicHashTable S.ByteString (IORef Int) newTable :: IO HashTablenewTable = H.new ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] my take at knucleotide
I have placed constraint on version of hashable, time is exactly same.bmaxa@maxa:~/shootout/knucleotide$ cabal list hashable* hashable Synopsis: A class for types that can be converted to a hash valueDefault available version: 1.2.0.5Installed versions: 1.1.2.5Homepage: http://github.com/tibbe/hashableLicense: BSD3 Date: Sun, 24 Mar 2013 20:12:57 +0100 Subject: Re: [Haskell-cafe] my take at knucleotide From: g...@gregorycollins.net To: bm...@hotmail.com CC: haskell-cafe@haskell.org What happens to performance if you compile and link with cabal install --constraint='hashable 1.2' ? G ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Streaming bytes and performance
Are you sure? I use ghc 7.6.2 (compiled with -O2) and without bang patternsfor 1million iterations it blows stack space.With bang patterns it runs in constant space , same as other version? bmaxa@maxa:~/haskell$ ./state +RTS -s5050 52,080 bytes allocated in the heap 3,512 bytes copied during GC 44,416 bytes maximum residency (1 sample(s)) 17,024 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 0 colls, 0 par0.00s0.00s 0.s0.s Gen 1 1 colls, 0 par0.00s0.00s 0.0001s0.0001s INITtime0.00s ( 0.00s elapsed) MUT time0.00s ( 0.00s elapsed) GC time0.00s ( 0.00s elapsed) EXITtime0.00s ( 0.00s elapsed) Total time0.00s ( 0.00s elapsed) %GC time 0.0% (6.2% elapsed) Alloc rate0 bytes per MUT second Productivity 100.0% of total user, 0.0% of total elapsed Date: Wed, 20 Mar 2013 08:04:01 +0200 From: to.darkan...@gmail.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Streaming bytes and performance On 03/20/2013 12:47 AM, Branimir Maksimovic wrote: Your problem is that main_6 thunks 'i' and 'a' . If you write (S6 !i !a) - get than there is no problem any more... Nope :( Unfortunately that doesn't change anything. Still allocating... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Streaming bytes and performance
To: haskell-cafe@haskell.org From: to.darkan...@gmail.com Date: Tue, 19 Mar 2013 23:27:09 +0200 Subject: Re: [Haskell-cafe] Streaming bytes and performance On 03/19/2013 10:49 PM, Konstantin Litvinenko wrote: {-# LANGUAGE BangPatterns #-} import Control.Monad.State.Strict data S6 = S6 !Int !Int main_6 = do let r = evalState go (S6 1 0) print r where go = do (S6 i a) - get if (i == 0) then return a else (put (S6 (i - 1) (a + i))) go main_7 = do let r = go (S6 1 0) print r where go (S6 i a) | i == 0 = a | otherwise = go $ S6 (i - 1) (a + i) main = main_6 main_6 doing constant allocations while main_7 run in constant space. Can you suggest something that improve situation? I don't want to manually unfold all my code that I want to be fast :(. Your problem is that main_6 thunks 'i' and 'a' .If you write (S6 !i !a) - getthan there is no problem any more... Correction - they both run in constant space, that's not a problem. The problem is main_6 doing constant allocation/destroying and main_7 doesn't. No main_6 does not runs in constant space if you dont use bang patterns... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] What is a Haskell way to implement flags?
In C usual way is to set some bit in integer variable by shifting or oring,and than check flag integer variable by anding with particular flag value.What is Haskell way? Thanks. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
ByteString gains most improvements as String must be converted o CStringfirst, internaly, in regex (this is warpper for libpcre), while ByteString not.libpcre is much faster than posix (I guess posix is also wrapper).Interface for libpcre is same as for Posix, there is no real effortin replacing it. Date: Tue, 12 Feb 2013 20:32:01 -0800 From: bri...@aracnet.com To: nicolasb...@gmail.com CC: bm...@hotmail.com; b...@redivi.com; haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] performance question On Tue, 12 Feb 2013 15:57:37 -0700 Nicolas Bock nicolasb...@gmail.com wrote: Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble. I was hoping Branimir could comment on how the improvements were allocated. how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ? how much can be attributed to using data.bytestring ? you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? Brian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Here is haskell version that is faster than python, almost as fast as c++.You need to install bytestring-lexing package for readDouble. bmaxa@maxa:~/haskell$ time ./printMatrixDecay - output.txtread 16384 matrix elements (128x128 = 16384)[0.00e0, 1.00e-8) = 0 (0.00%) 0[1.00e-8, 1.00e-7) = 0 (0.00%) 0[1.00e-7, 1.00e-6) = 0 (0.00%) 0[1.00e-6, 1.00e-5) = 0 (0.00%) 0[1.00e-5, 1.00e-4) = 1 (0.01%) 1[1.00e-4, 1.00e-3) = 17 (0.10%) 18[1.00e-3, 1.00e-2) = 155 (0.95%) 173[1.00e-2, 1.00e-1) = 1434 (8.75%) 1607[1.00e-1, 1.00e0) = 14777 (90.19%) 16384[1.00e0, 2.00e0) = 0 (0.00%) 16384 real0m0.031suser0m0.028ssys 0m0.000sbmaxa@maxa:~/haskell$ time ./printMatrixDecay.py - output.txt(-) read 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.00%) 1[1.00e-04, 1.00e-03) = 17 (0.00%) 18[1.00e-03, 1.00e-02) = 155 (0.00%) 173[1.00e-02, 1.00e-01) = 1434 (0.00%) 1607[1.00e-01, 1.00e+00) = 14777 (0.00%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.081suser0m0.080ssys 0m0.000s Program follows... import System.Environmentimport Text.Printfimport Text.Regex.PCREimport Data.Maybeimport Data.Array.IOimport Data.Array.Unboxedimport qualified Data.ByteString.Char8 as Bimport Data.ByteString.Lex.Double (readDouble) strataBounds :: UArray Int DoublestrataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ] newStrataCounts :: IO(IOUArray Int Int)newStrataCounts = newArray (bounds strataBounds) 0 main = dol - B.getContentslet a = B.lines lstrataCounts - newStrataCountsn - calculate strataCounts a 0let printStrataCounts :: IO ()printStrataCounts = dolet s = round $ sqrt (fromIntegral n::Double) :: Intprintf read %d matrix elements (%dx%d = %d)\n n s s nprintStrataCounts' 0 0 printStrataCounts' :: Int - Int - IO ()printStrataCounts' i total | i (snd $ bounds strataBounds) = docount - readArray strataCounts ilet p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double)printf [%1.2e, %1.2e) = %i (%1.2f%%) %i\n (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count)| otherwise = return () printStrataCounts calculate :: IOUArray Int Int - [B.ByteString] - Int - IO Intcalculate _ [] n = return ncalculate counts (l:ls) n = dolet a = case getAllTextSubmatches $ l =~ B.pack matrix.*= ([0-9eE.+-]+)$ :: [B.ByteString] of[_,v] - Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString))_ - Nothingb = (fst.fromJust.fromJust) aloop :: Int - IO()loop i| i (snd $ bounds strataBounds) = if (b = (strataBounds ! i)) (b (strataBounds ! (i+1)))then doc - readArray counts iwriteArray counts i (c+1) else loop (i+1)| otherwise = return ()if isNothing athen calculate counts ls nelse do loop 0calculate counts ls (n+1) From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecayreal0m2.655s user0m2.677ssys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445suser0m0.615ssys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance question
Heh, I have wrote c++ version and that is much faster than python ;) bmaxa@maxa:~/haskell$ time ./createMatrixDump.py -N 128 output.txt real0m0.041suser0m0.040ssys 0m0.000sbmaxa@maxa:~/haskell$ time ./printMatrixDecay.py - output.txt(-) read 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.00%) 1[1.00e-04, 1.00e-03) = 15 (0.00%) 16[1.00e-03, 1.00e-02) = 149 (0.00%) 165[1.00e-02, 1.00e-01) = 1425 (0.00%) 1590[1.00e-01, 1.00e+00) = 14794 (0.00%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.081suser0m0.072ssys 0m0.008sbmaxa@maxa:~/haskell$ time ./printMatrixDecay output.txtread 16384 matrix elements (128x128 = 16384)[0.00e+00, 1.00e-08) = 0 (0.00%) 0[1.00e-08, 1.00e-07) = 0 (0.00%) 0[1.00e-07, 1.00e-06) = 0 (0.00%) 0[1.00e-06, 1.00e-05) = 0 (0.00%) 0[1.00e-05, 1.00e-04) = 1 (0.01%) 1[1.00e-04, 1.00e-03) = 15 (0.09%) 16[1.00e-03, 1.00e-02) = 149 (0.91%) 165[1.00e-02, 1.00e-01) = 1425 (8.70%) 1590[1.00e-01, 1.00e+00) = 14794 (90.30%) 16384[1.00e+00, 2.00e+00) = 0 (0.00%) 16384 real0m0.018suser0m0.012ssys 0m0.004s unfortunately g++ does not have regex implemented yet so I used libpcre ... #include pcre.h#include sstream#include cstdio#include cmath#include iostream#include stdexcept#include vector template class Fvoid regex(const std::string in, const std::string pattern,int n,F f){int ovec[3*n],position;const char* error; int errorpos; pcre* pe = pcre_compile(pattern.c_str(),0,error,errorpos,0); if(!pe)throw std::runtime_error(error); pcre_extra* extra=pcre_study(pe,0,error); for(position = 0; pcre_exec(pe,extra,in.c_str(),in.size(),position,0,ovec,3*n)=0; position = ovec[1])f(position,ovec);f(position,ovec);pcre_free(extra); pcre_free(pe); } int main(){ std::ios::sync_with_stdio(false); std::ostringstream oss; oss std::cin.rdbuf(); const std::string in = oss.str(); std::vectordouble strataBounds = { 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 }; std::vectorint strataCounts(strataBounds.size()); unsigned N = 0; auto f = [](int position,int* ovec) {if(int(position) ovec[0])return;++N;double aij = 0.0;std::istringstream iss(in.substr(ovec[2],ovec[3]-ovec[2]));iss aij;aij=fabs(aij); for(unsigned i = 0; i strataBounds.size() - 1; ++i){ if(aij = strataBounds[i] aij strataBounds[i+1]) {++strataCounts[i]; break; }} }; regex(in,matrix.*= ([0-9.eE+-]+)\n,2,f); printf(read %d matrix elements (%dx%d = %d)\n,N,int(sqrt(N)),int(sqrt(N)),N); int total = 0; for(unsigned i = 0; i strataBounds.size()-1;++i) {total += strataCounts[i];printf([%1.2e, %1.2e) = %d (%1.2f%%) %d\n, strataBounds[i], strataBounds[i+1],strataCounts[i], 100*(double(strataCounts[i])/N), total); }} From: nicolasb...@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question Hi list, I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script. To be concrete: $ time ./createMatrixDump.py -N 128 | ./printMatrixDecayreal0m2.655s user0m2.677ssys 0m0.095s $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real0m0.445suser0m0.615ssys 0m0.032s The Haskell script was compiled with ghc --make printMatrixDecay.hs. Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell. Thanks already, nick ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] my Fasta is slow ;(
Thank you. Your entry is great. Faster than fortran entry!Dou you want to contribute at the site, or you want me to do it for you? Date: Thu, 27 Dec 2012 15:58:40 -0800 Subject: Re: [Haskell-cafe] my Fasta is slow ;( From: b...@serpentine.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org On Tue, Dec 18, 2012 at 12:42 PM, Branimir Maksimovic bm...@hotmail.com wrote: Seems to me that culprit is in function random as I have tested rest of codeand didn't found speed related problems. The problem with your original program was that it was not pure enough. Because you stored your PRNG state in an IORef, you forced the program to allocate and case-inspect boxed Ints in its inner loop. I refactored it slightly to make genRand and genRandom pure, and combined with using the LLVM back end, this doubled the program's performance, so that the Haskell program ran at the same speed as your C++ version. The next bottleneck was that your program was performing floating point arithmetic in the inner loop. I changed it to precompute a small lookup table, followed by only using integer arithmetic in the inner loop (the same technique used by the fastest C fasta program). This further improved performance: the new Haskell code is 40% faster than the C++ program, and only ~20% slower than the C program that currently tops the shootout chart. The Haskell source is a little over half the size of the C source. You can follow the work I did here: https://github.com/bos/shootout-fasta ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] my Fasta is slow ;(
This time I have tried fasta benchmark since current entries does notdisplay correct output.Program is copy of mine http://benchmarksgame.alioth.debian.org/u64q/program.php?test=fastalang=gppid=1c++ benchmark, but unfortunately executes more than twice time. Seems to me that culprit is in function random as I have tested rest of codeand didn't found speed related problems. bmaxa@maxa:~/shootout/fasta$ time ./fastahs 2500 /dev/null real0m5.262suser0m5.228ssys 0m0.020s bmaxa@maxa:~/shootout/fasta$ time ./fastacpp 2500 /dev/null real0m2.075suser0m2.056ssys 0m0.012s Since I am planning to contribute program, perhaps someone cansee a problem to speed it up at least around 3.5 secs which is speed of bench that display incorrect result (in 7.6.1). Program follows: {-# LANGUAGE BangPatterns #-}{- The Computer Language Benchmarks Game http://shootout.alioth.debian.org/ contributed by Branimir Maksimovic-} import System.Environmentimport System.IO.Unsafe import Data.IORefimport Data.Array.Unboxedimport Data.Array.Storableimport Data.Array.Baseimport Data.Word import Foreign.Ptrimport Foreign.C.Types type A = UArray Int Word8type B = StorableArray Int Word8type C = (UArray Int Word8,UArray Int Double) foreign import ccall unsafe stdio.h puts :: Ptr a - IO ()foreign import ccall unsafe string.h strlen :: Ptr a - IO CInt main :: IO () main = don - getArgs = readIO.head let !a = (listArray (0,(length alu)-1) $ map (fromIntegral. fromEnum) alu:: A)make ONE Homo sapiens alu (n*2) $ Main.repeat a (length alu)make TWO IUB ambiguity codes (n*3) $ random iubmake THREE Homo sapiens frequency (n*5) $ random homosapiens make :: String - String - Int - IO Word8 - IO (){-# INLINE make #-}make id desc n f = dolet lst = ++ id ++ ++ desca - (newListArray (0,length lst) $ map (fromIntegral. fromEnum) lst:: IO B) unsafeWrite a (length lst) 0pr amake' n 0where make' :: Int - Int - IO ()make' !n !i = dolet line = (unsafePerformIO $ newArray (0,60) 0 :: B)if n 0 then do!c - funsafeWrite line i c if i+1 = 60 then do pr linemake' (n-1) 0 else make' (n-1) (i+1)else do unsafeWrite line i 0l - len line if l /= 0then pr line else return () pr :: B - IO ()pr line = withStorableArray line (\ptr - puts ptr)len :: B - IO CIntlen line = withStorableArray line (\ptr - strlen ptr) repeat :: A - Int - IO Word8repeat xs !n = dolet v = unsafePerformIO $ newIORef 0!i - readIORef vif i+1 = nthen writeIORef v 0else writeIORef v (i+1)return $ xs `unsafeAt` i random :: C - IO Word8random (a,b) = do !rnd - randlet find :: Int - IO Word8find !i = let !c = a `unsafeAt` i!p = b `unsafeAt` i in if p = rndthen return celse find (i+1)find 0 rand :: IO Double{-# INLINE rand #-}rand = do!seed - readIORef lastlet newseed = (seed * ia + ic) `rem` imnewran = fromIntegral newseed * rimdrimd = 1.0 / (fromIntegral im)im, ia, ic :: Intim = 139968ia = 3877ic = 29573writeIORef last newseedreturn newranwhere last = unsafePerformIO $ newIORef 42 alu:: [Char]alu = GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG\ \GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA\ \CCAGCCTGGCCAACATGGTGAAAGTCTCTACTAT\ \ACATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA\ \GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG\ \AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC\ \AGCCTGGGCGACAGAGCGAGACTCCGTCTCA mkCum :: [(Char,Double)] - [(Word8,Double)]mkCum lst = map (\(c,p) - ((fromIntegral.fromEnum) c,p)) $ scanl1 (\(_,p) (c',p') - (c', p+p')) lst homosapiens, iub :: C iub' = mkCum [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02) ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02) ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)] homosapiens' = mkCum [('a',0.3029549426680),('c',0.1979883004921) ,('g',0.1975473066391),('t',0.3015094502008)] iub = (listArray (0, (length iub')-1) $ map fst iub',listArray (0, (length iub')-1) $ map snd iub') homosapiens = (listArray (0, (length homosapiens')-1) $ map fst homosapiens', listArray (0, (length homosapiens')-1) $ map snd homosapiens
Re: [Haskell-cafe] Help optimize fannkuch program
Here it is :http://shootout.alioth.debian.org/u64/program.php?test=fannkuchreduxlang=ghcid=4 Date: Mon, 3 Dec 2012 15:32:20 -0800 Subject: Re: [Haskell-cafe] Help optimize fannkuch program From: b...@serpentine.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org On Mon, Dec 3, 2012 at 11:18 AM, Branimir Maksimovic bm...@hotmail.com wrote: Thanks ! Should I contribute your version on shootout site? Do whatever you like with it. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Help optimize fannkuch program
Thanks. Your version is much faster.Yes, I have compiled with ghc --make -O2 -fllvm -optlo-O3 -optlo-constprop fannkuchredux4.hs(there is bug in ghc 7.4.2 regarding llvm 3.1 which is circumvented with constrprop) results: yours:bmaxa@maxa:~/shootout/fannkuchredux$ time ./fannkuchredux4 123968050Pfannkuchen(12) = 65 real0m39.200suser0m39.132ssys 0m0.044s mine:bmaxa@maxa:~/shootout/fannkuchredux$ time ./fannkuchredux 123968050Pfannkuchen(12) = 65 real0m50.784suser0m50.660ssys 0m0.092s Seems that you machine is faster than mine and somewhat better for executing mine version.Thanks ! Should I contribute your version on shootout site? Date: Mon, 3 Dec 2012 00:01:32 -0800 Subject: Re: [Haskell-cafe] Help optimize fannkuch program From: b...@serpentine.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org On Sun, Dec 2, 2012 at 3:12 PM, Branimir Maksimovic bm...@hotmail.com wrote: Well, playing with Haskell I have literally trasnlated my c++ program http://shootout.alioth.debian.org/u64q/program.php?test=fannkuchreduxlang=gppid=3 and got decent performance but not that good in comparisonwith c++ On my machine Haskell runs 52 secs while c++ 30 secs. Did you compile with -O2 -fllvm? On my machine: C++ 28 secMine -O2 -fllvm 37 secYours -O2 -fllvm 41 secMine -O2 48 secYours -O2 54 sec My version of your Haskell code is here: http://hpaste.org/78705 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Help optimize fannkuch program
Well, playing with Haskell I have literally trasnlated my c++ program http://shootout.alioth.debian.org/u64q/program.php?test=fannkuchreduxlang=gppid=3and got decent performance but not that good in comparisonwith c++ On my machine Haskell runs 52 secs while c++ 30 secs.(There is Haskell entry that is fastest but unfortunately does not runs on test machine is on par with c++http://shootout.alioth.debian.org/u64q/program.php?test=fannkuchreduxlang=ghcid=3)There is something which I have missing since programsare identical.Aa with previous entries you gurus here helped a lot in both helpand learning experience.I simply love Haskell ;)I plan to contribute this program as it is much faster than current runningentry http://shootout.alioth.debian.org/u64q/program.php?test=fannkuchreduxlang=ghcid=2even if it is multithreaded and my is not. This is program: {-# LANGUAGE CPP, BangPatterns #-}{- The Computer Language Benchmarks Game http://shootout.alioth.debian.org/ contributed by Branimir Maksimovic -} import System.Environmentimport Text.Printfimport Data.Bits import qualified Data.Vector.Unboxed.Mutable as VMimport qualified Data.Vector.Generic.Mutable as VGimport qualified Data.Vector.Unboxed as V main = do n - getArgs = readIO.head(checksum,maxflips) - fannkuch n printf %d\nPfannkuchen(%d) = %d\n checksum n maxflips fannkuch n = do !perm - V.unsafeThaw $ V.fromList [1..n] !tperm - VG.new n !cnt - VG.replicate n 0 let loop :: Int - Int - Int - IO(Int,Int)loop !c !m !pc = do !b - next_permutation perm n cnt if b == False then return (c,m) else do VM.unsafeCopy tperm perm!flips - count_flips tperm 0 loop (c + (if pc .. 1 == 0 then flips else -flips)) (max m flips) (pc+1) r - loop 0 0 1 return r next_permutation :: VM.IOVector Int - Int - VM.IOVector Int- IO(Bool)next_permutation !perm !n !cnt =do !i - loop 1 if(i = n) then return False else do !v - VM.unsafeRead cnt i VM.unsafeWrite cnt i (v+1) return True where loop :: Int - IO(Int) loop !i | i n = do !tmp - VM.unsafeRead perm 0let rotate :: Int - IO() rotate !j = if j = i then do VM.unsafeWrite perm i tmp return () else do !v - VM.unsafeRead perm (j+1) VM.unsafeWrite perm j v rotate (j+1) rotate 0 !v - VM.unsafeRead cnt i if v = i then do VM.unsafeWrite cnt i 0 loop (i+1) else return i | otherwise = return i count_flips :: VM.IOVector Int - Int - IO(Int)count_flips !tperm !flips = do !f - VM.unsafeRead tperm 0 if f == 1 then return flipselse do VG.reverse $ VM.unsafeSlice 0 f tperm count_flips tperm (flips+1) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to incrementally update list
I want to simulate some calculation that does that.For example n-body simulation.Anyway this is solved ;) Date: Fri, 30 Nov 2012 13:25:57 -0800 Subject: Re: [Haskell-cafe] How to incrementally update list From: kc1...@gmail.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org Why do you want to incrementally update this list a lot of times? The question would affect the answer you get; i.e. some context (non-monadically speaking). :D On Wed, Nov 28, 2012 at 3:43 AM, Branimir Maksimovic bm...@hotmail.com wrote: Problem is following short program: list = [1,2,3,4,5] advance l = map (\x - x+1) l run 0 s = s run n s = run (n-1) $ advance s main = do let s = run 5000 list putStrLn $ show s I want to incrementally update list lot of times, but don't know how to do this. Since Haskell does not have loops I have to use recursion, but problem is that recursive calls keep previous/state parameter leading to excessive stack.and memory usage. I don't know how to tell Haskell not to keep previous state rather to release so memory consumption becomes managable. Is there some solution to this problem as I think it is rather common? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- -- Regards, KC ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Why is boxed mutable array so slow?
I have made benchmark test inspired by http://lemire.me/blog/archives/2012/07/23/is-cc-worth-it/ What surprised me is that unboxed array is much faster than boxed array.Actually boxed array performance is on par with standard Haskell listwhich is very slow.All in all even unboxed array is about 10 times slower than Java version.I don't understand why is even unboxed array so slow.But! unboxed array consumes least amount of RAM.(warning, program consumes more than 3gb of ram) bmaxa@maxa:~/examples$ time ./Cumulboxed arraylast 262486571 seconds 4.972unboxed arraylast 262486571 seconds 0.776listlast 262486571 seconds 6.812 real0m13.086suser0m11.996ssys 0m1.080s -{-# LANGUAGE CPP, BangPatterns #-}import System.CPUTimeimport Text.Printfimport Data.Array.IOimport Data.Array.Baseimport Data.Intimport Control.DeepSeqimport System.Mem main :: IO()main = do (newArray_ (0,n'-1) :: IO(A)) = test boxed array performGC (newArray_ (0,n'-1) :: IO(B)) = test unboxed array performGC begin - getCPUTime printf list\nlast %d $ last $ force $ take n' $ sum' data'end - getCPUTime let diff = (fromIntegral (end - begin)) / (10^12) printf seconds %.3f\n (diff::Double) test s a = do putStrLn s begin - getCPUTime init' a partial_sum a end - getCPUTime let diff = (fromIntegral (end - begin)) / (10^12) last - readArray a (n'-1) printf last %d seconds %.3f\n last (diff::Double) n' :: Intn' = 50 * 1000 * 1000 type A = IOArray Int Int32type B = IOUArray Int Int32 init' a = do(_,n) - getBounds ainit a 0 n where init a k n | k n = return () | otherwise = dolet !x = fromIntegral $ k + k `div` 3 unsafeWrite a k x init a (k+1) n partial_sum a = do (_,n) - getBounds a k - unsafeRead a 0 ps a 1 n k where ps a i n s | i n = return () | otherwise = do k - unsafeRead a i let !l = fromIntegral $ s + k unsafeWrite a i l ps a (i+1) n l data' :: [Int32]data' = [k + k `div` 3 | k - [0..] ] sum' = scanl1 (+) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why is boxed mutable array so slow?
Wow that sped it up 5 times.I see that boxed Vector is 25% faster than IOArray.What is the difference and when to use Vector,when IOArray?Thanks! bmaxa@maxa:~/examples$ time ./Cumul +RTS -A1600Mboxed arraylast 262486571 seconds 1.196unboxed arraylast 262486571 seconds 0.748boxed vectorlast 262486571 seconds 0.908unboxed vectorlast 262486571 seconds 0.720 real0m3.805suser0m3.428ssys 0m0.372s Date: Sat, 1 Dec 2012 12:20:37 -0500 Subject: Re: [Haskell-cafe] Why is boxed mutable array so slow? From: don...@gmail.com To: bm...@hotmail.com CC: haskell-cafe@haskell.org The obvious difference between boxed and unboxed arrays is that the boxed arrays are full of pointers to heap allocated objects. This means you pay indirection to access the values, much more time in GC spent chasing pointers (though card marking helps), and generally do more allocation. Compare the GC stats below, for * Boxed vector: 88M bytes copied; 75% of time in GC, 0.472s * Unboxed vector: 11k bytes copied, 1.3% of time in GC, 0.077s So there's your main answer. The increased data density of unboxed arrays also helps a too. Now, you can help out the GC signifcantly by hinting at how much you're going to allocated in the youngest generation (see the ghc-gc-tune app for a methodical approach to this, though it needs updating to ghc 7 -- http://donsbot.wordpress.com/2010/07/05/ghc-gc-tune-tuning-haskell-gc-settings-for-fun-and-profit/ and http://stackoverflow.com/questions/3171922/ghcs-rts-options-for-garbage-collection ). Use the +RTS -A flag to set an initial youngest generation heap size to the size of your array, and watch the GC cost disappear. For our boxed vector, we'd use +RTS -A50M, resulting in: * Boxed vector: 8k copied, 1% of time in GC, 0.157s So not bad. 3x speedup through a RTS flag. -A is very useful if you are working with boxed, mutable arrays. For reference, there's a generic version below that specializes based on the vector type parameter. - {-# LANGUAGE BangPatterns #-} import System.CPUTime import Text.Printf import Data.Int import Control.DeepSeq import System.Mem import qualified Data.Vector.Mutable as V import qualified Data.Vector.Unboxed.Mutable as U import qualified Data.Vector.Generic.Mutable as G main :: IO() main = do -- (G.new n' :: IO (V.IOVector Int32)) = test' boxed vector -- performGC (G.new n' :: IO (U.IOVector Int32)) = test' unboxed vector performGC test' s a = do putStrLn s begin - getCPUTime init'' a partial_sum' a end - getCPUTime let diff = (fromIntegral (end - begin)) / (10**12) last - G.read a (n'-1) printf last %d seconds %.3f\n last (diff::Double) n' :: Int n' = 1000 * 1000 init'' !a = init 0 (n'-1) where init :: Int - Int - IO () init !k !n | k n = return () | otherwise = do let !x = fromIntegral $ k + k `div` 3 G.write a k x init (k+1) n partial_sum' !a = do k - G.read a 0 ps 1 (n'-1) k where ps :: Int - Int - Int32 - IO () ps i n s | i n = return () | otherwise = do k - G.read a i let !l = fromIntegral $ s + k G.write a i l ps (i+1) n l - $ time ./A +RTS -s boxed vector last 945735787 seconds 0.420 40,121,448 bytes allocated in the heap 88,355,272 bytes copied during GC 24,036,456 bytes maximum residency (6 sample(s)) 380,632 bytes maximum slop 54 MB total memory in use (0 MB lost due to fragmentation) %GC time 75.2% (75.9% elapsed) Alloc rate359,655,602 bytes per MUT second ./A +RTS -s 0.40s user 0.07s system 98% cpu 0.475 total $ time ./A +RTS -s unboxed vector last 945735787 seconds 0.080 4,113,568 bytes allocated in the heap 11,288 bytes copied during GC 4,003,256 bytes maximum residency (3 sample(s)) 182,856 bytes maximum slop 5 MB total memory in use (0 MB lost due to fragmentation) %GC time 1.3% (1.3% elapsed) Alloc rate51,416,660 bytes per MUT second ./A +RTS -s 0.08s user 0.01s system 98% cpu 0.088 total $ time ./A +RTS -A50M -s boxed vector last 945735787 seconds 0.127 40,121,504 bytes allocated in the heap 8,032 bytes copied during GC 44,704 bytes maximum residency (2 sample(s)) 20,832 bytes maximum slop 59 MB total memory in use (0 MB lost due to fragmentation) %GC time 1.0% (1.0% elapsed) Productivity 97.4% of total user, 99.6% of total elapsed ./A +RTS -A50M -s 0.10s user 0.05s system 97% cpu 0.157 total - On Sat, Dec 1, 2012 at 11:09 AM, Branimir Maksimovic bm...@hotmail.com wrote
Re: [Haskell-cafe] Why is boxed mutable array so slow?
boxed vector last 945735787 seconds 0.127 40,121,504 bytes allocated in the heap 8,032 bytes copied during GC 44,704 bytes maximum residency (2 sample(s)) 20,832 bytes maximum slop 59 MB total memory in use (0 MB lost due to fragmentation) %GC time 1.0% (1.0% elapsed) Productivity 97.4% of total user, 99.6% of total elapsed ./A +RTS -A50M -s 0.10s user 0.05s system 97% cpu 0.157 total - On Sat, Dec 1, 2012 at 11:09 AM, Branimir Maksimovic bm...@hotmail.com wrote: I have made benchmark test inspired by http://lemire.me/blog/archives/2012/07/23/is-cc-worth-it/ What surprised me is that unboxed array is much faster than boxed array. Actually boxed array performance is on par with standard Haskell list which is very slow. All in all even unboxed array is about 10 times slower than Java version. I don't understand why is even unboxed array so slow. But! unboxed array consumes least amount of RAM. (warning, program consumes more than 3gb of ram) bmaxa@maxa:~/examples$ time ./Cumul boxed array last 262486571 seconds 4.972 unboxed array last 262486571 seconds 0.776 list last 262486571 seconds 6.812 real0m13.086s user0m11.996s sys 0m1.080s - {-# LANGUAGE CPP, BangPatterns #-} import System.CPUTime import Text.Printf import Data.Array.IO import Data.Array.Base import Data.Int import Control.DeepSeq import System.Mem main :: IO() main = do (newArray_ (0,n'-1) :: IO(A)) = test boxed array performGC (newArray_ (0,n'-1) :: IO(B)) = test unboxed array performGC begin - getCPUTime printf list\nlast %d $ last $ force $ take n' $ sum' data' end - getCPUTime let diff = (fromIntegral (end - begin)) / (10^12) printf seconds %.3f\n (diff::Double) test s a = do putStrLn s begin - getCPUTime init' a partial_sum a end - getCPUTime let diff = (fromIntegral (end - begin)) / (10^12) last - readArray a (n'-1) printf last %d seconds %.3f\n last (diff::Double) n' :: Int n' = 50 * 1000 * 1000 type A = IOArray Int Int32 type B = IOUArray Int Int32 init' a = do (_,n) - getBounds a init a 0 n where init a k n | k n = return () | otherwise = do let !x = fromIntegral $ k + k `div` 3 unsafeWrite a k x init a (k+1) n partial_sum a = do (_,n) - getBounds a k - unsafeRead a 0 ps a 1 n k where ps a i n s | i n = return () | otherwise = do k - unsafeRead a i let !l = fromIntegral $ s + k unsafeWrite a i l ps a (i+1) n l data' :: [Int32] data' = [k + k `div` 3 | k - [0..] ] sum' = scanl1 (+) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why is boxed mutable array so slow?
Wow, now performance is on par with Java ;)So slow division was main problem, that and GC . Thanks! From: daniel.is.fisc...@googlemail.com To: haskell-cafe@haskell.org CC: bm...@hotmail.com Subject: Re: [Haskell-cafe] Why is boxed mutable array so slow? Date: Sat, 1 Dec 2012 21:12:29 +0100 On Samstag, 1. Dezember 2012, 16:09:05, Branimir Maksimovic wrote: All in all even unboxed array is about 10 times slower than Java version. I don't understand why is even unboxed array so slow. It's not the unboxed arrays that are slow. Your code has a couple of weak spots, and GHC's native code generator has a weakness that bites here. On my box, I don't quite have a 10× difference to my translation to Java, it's a bit less than 7× (0.82s vs 0.12s - I don't want to bring my box to its knees by running something that takes 3GB+ of RAM, so I run the unboxed array part only) with the LLVM backend and 8× (0.93s) with the native code generator. That's in the same ballpark, though. So what's the deal? Main.main_$s$wa1 [Occ=LoopBreaker] :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.State# GHC.Prim.RealWorld - GHC.Types.Int - GHC.Types.Int - GHC.Types.Int - ... Your loops carry boxed Ints around, that's always a bad sign. In this case it doesn't hurt too much, however, since these values are neither read nor substituted during the loop (they're first and last index of the array and number of elements). Additionally, they carry an IOUArray constructor around. That is unnecessary. Eliminating a couple of dead parameters init' a = do (_,n) - getBounds a let init k | k n = return () | otherwise = do let x = fromIntegral $ k + k `div` 3 unsafeWrite a k x init (k+1) init 0 partial_sum a = do (_,n) - getBounds a let ps i s | i n = return () | otherwise = do k - unsafeRead a i let l = s + k unsafeWrite a i l ps (i+1) l k - unsafeRead a 0 ps 1 k brings the time for the native code generator down to 0.82s, and for the LLVM backend the time remains the same. Next problem, you're using `div` for the division. `div` does some checking and potentially fixup (not here, since everything is non-negative) after the machine division because `div` is specified to satisfy a = (a `div` b) * b + (a `mod` b) with 0 = a `mod` b abs b. That is in itself slower than the pure machine division you get with quot. So let's see what we get with `quot`. 0.65s with the native code generator, and 0.13 with the LLVM backend. Whoops, what's that? The problem is, as can be seen by manually replacing k `quot` 3 with (k *2863311531) `shiftR` 33 (requires 64-bit Ints; equivalent in Java: k*28..1L 33), when the native backend, the LLVM backend and Java (as well as C) all take more or less the same time [well, the NCG is a bit slower than the other two, 0.11s, 0.11s, 0.14s], that division is a **very** slow operation. Java and LLVM know how to replace the division by the constant 3 with a mulitplication, a couple of shifts and an addition (since we never have negative numbers here, just one multiplication and shift suffice, but neither Java nor LLVM can do that on their own because it's not guaranteed by the type). The native code generator doesn't - not yet. So the programme spends the majority of the time dividing. The array reads and writes are on par with Java's (and, for that matter, C's). If you make the divisor a parameter instead of a compile time constant, the NCG is not affected at all, the LLVM backend gives you equal performance (it can't optimise a division by a divisor it doesn't know). Java is at an advantage there, after a while the JIT sees that it might be a good idea to optimise the division and so its time only trebles. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] How to incrementally update list
Problem is following short program:list = [1,2,3,4,5] advance l = map (\x - x+1) l run 0 s = srun n s = run (n-1) $ advance s main = dolet s = run 5000 listputStrLn $ show s I want to incrementally update list lot of times, but don't knowhow to do this.Since Haskell does not have loops I have to use recursion,but problem is that recursive calls keep previous/state parameterleading to excessive stack.and memory usage.I don't know how to tell Haskell not to keep previousstate rather to release so memory consumption becomesmanagable. Is there some solution to this problem as I think it is rathercommon? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to incrementally update list
Thank you very much! That solved it ;)I had to put explicit type signature in front of advance in order to compile From: cgae...@uwaterloo.ca Date: Wed, 28 Nov 2012 08:01:38 -0500 Subject: Re: [Haskell-cafe] How to incrementally update list To: edwards.b...@gmail.com CC: bm...@hotmail.com; haskell-cafe@haskell.org Here's a version that works: import Control.DeepSeq list = [1,2,3,4,5] advance l = force $ map (\x - x+1) l run 0 s = srun n s = run (n-1) $ advance s main = dolet s = run 5000 listputStrLn $ show s The problem is that you build of a huge chain of updates to the list. If we just commit each update as it happens, we'll use a constant amount of memory. Haskell's laziness is tricky to understand coming from imperative languages, but once you figure out its evaluation rules, you'll begin to see the elegance. Ηope this helps, - Clark On Wed, Nov 28, 2012 at 7:07 AM, Benjamin Edwards edwards.b...@gmail.com wrote: TCO + strictnesses annotations should take care of your problem. On 28 Nov 2012 11:44, Branimir Maksimovic bm...@hotmail.com wrote: Problem is following short program:list = [1,2,3,4,5] advance l = map (\x - x+1) l run 0 s = srun n s = run (n-1) $ advance s main = dolet s = run 5000 listputStrLn $ show s I want to incrementally update list lot of times, but don't knowhow to do this. Since Haskell does not have loops I have to use recursion,but problem is that recursive calls keep previous/state parameterleading to excessive stack.and memory usage.I don't know how to tell Haskell not to keep previous state rather to release so memory consumption becomesmanagable. Is there some solution to this problem as I think it is rathercommon? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Problem with benchmarking FFI calls with Criterion
I don't see such behavior neither.ubuntu 12.10, ghc 7.4.2. Perhaps this has to do with how malloc allocates /cachebehavior. If you try not to allocate array rather use existing one perhaps there would be no inconsistency?It looks to me that's about CPU cache performance. Branimir I'm using GHC 7.4.2 on x86_64 openSUSE Linux, kernel 2.6.37.6. Janek Dnia piątek, 23 listopada 2012, Edward Z. Yang napisał: Running the sample code on GHC 7.4.2, I don't see the one fast, rest slow behavior. What version of GHC are you running? Edward Excerpts from Janek S.'s message of Fri Nov 23 13:42:03 -0500 2012: What happens if you do the benchmark without unsafePerformIO involved? I removed unsafePerformIO, changed copy to have type Vector Double - IO (Vector Double) and modified benchmarks like this: bench C binding $ whnfIO (copy signal) I see no difference - one benchmark runs fast, remaining ones run slow. Janek Excerpts from Janek S.'s message of Fri Nov 23 10:44:15 -0500 2012: I am using Criterion library to benchmark C code called via FFI bindings and I've ran into a problem that looks like a bug. The first benchmark that uses FFI runs correctly, but subsequent benchmarks run much longer. I created demo code (about 50 lines, available at github: https://gist.github.com/4135698 ) in which C function copies a vector of doubles. I benchmark that function a couple of times. First run results in avarage time of about 17us, subsequent runs take about 45us. In my real code additional time was about 15us and it seemed to be a constant factor, not relative to correct run time. The surprising thing is that if my C function only allocates memory and does no copying: double* c_copy( double* inArr, int arrLen ) { double* outArr = malloc( arrLen * sizeof( double ) ); return outArr; } then all is well - all runs take similar amount of time. I also noticed that sometimes in my demo code all runs take about 45us, but this does not seem to happen in my real code - first run is always shorter. Does anyone have an idea what is going on? Janek ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] x86 code generation going wrong?
From: Chris Kuklewicz [EMAIL PROTECTED] To: haskell-cafe@haskell.org Subject: [Haskell-cafe] x86 code generation going wrong? Date: Sat, 07 Jan 2006 16:18:59 + Hello, I need to ask for some help to test x86 code generation. There is a factor of two runtime difference between the code I am benchmarking on my OS X powerbook G4 (ghc 6.4.1) and shootout's speed on a linux x86 machine (ghc 6.4.1). Could someone else running on x86 test the three versions pasted below before I think about submitting another one to the shootout? Here are the tests on P4 2.4 ghz and athlon 64 3000 linux test1-3 in respective order of appearance (note:OPTIONS didn't do anything I have to compile -O2 -fglasgow-exts explicitely, because I've got compile error for test3.hs ) [EMAIL PROTECTED] ~/haskell/myhaskell] $ time ./test1 sum-file-test-input 400 real0m3.550s user0m3.440s sys 0m0.080s [EMAIL PROTECTED] ~/haskell/myhaskell] $ time ./test2 sum-file-test-input 400 real0m3.708s user0m3.660s sys 0m0.060s [EMAIL PROTECTED] ~/haskell/myhaskell] $ time ./test3 sum-file-test-input 400 real0m3.678s user0m3.620s sys 0m0.050s This is on athlon64 3000 , linux : [EMAIL PROTECTED] ~]$ time ./test1 sum-file-test-input 400 real0m5.782s user0m5.724s sys 0m0.056s [EMAIL PROTECTED] ~]$ time ./test2 sum-file-test-input 400 real0m5.953s user0m5.900s sys 0m0.052s [EMAIL PROTECTED] ~]$ time ./test3 sum-file-test-input 400 real0m5.403s user0m5.332s sys 0m0.072s Greetings, Bane. To compile ghc --make filename.hs -o program To run cat input-file | time ./program where to save space, the gzip'd input file is at http://paradosso.mit.edu/~ckuklewicz/sum-file-test-input.gz - -- Original version {-# OPTIONS -O2 #-} import Char( ord ) main :: IO () main = getContents = print . accP 0 0 accP :: Int - Int - String - Int accP before this [] = before+this accP before this ('\n':xs) = accP (before+this) 0xs accP before this ('-' :xs) = accN before this xs accP before this (x :xs) = accP before (this*10+ord(x)-ord('0')) xs accN :: Int - Int - String - Int accN before this [] = before-this accN before this ('\n':xs) = accP (before-this) 0xs accN before this (x :xs) = accN before (this*10+ord(x)-ord('0')) xs - -- Faster on G4, 2x slower on x86 {-# OPTIONS -O2 -funbox-strict-fields #-} import GHC.Base data I = I !Int main = print . new (I 0) = getContents new (I i) [] = i new (I i) ('-':xs) = neg (I 0) xs where neg (I n) ('\n':xs) = new (I (i - n)) xs neg (I n) (x :xs) = neg (I (parse x + (10 * n))) xs new (I i) (x:xs) = pos (I (parse x)) xs where pos (I n) ('\n':xs) = new (I (i + n)) xs pos (I n) (x :xs) = pos (I (parse x + (10 * n))) xs parse c = ord c - ord '0' - -- Explicitly unboxed proposal, faster on G4 {-# OPTIONS -fglasgow-exts -O2 #-} import GHC.Base main = print . sumFile = getContents where sumFile = (\rest - newLine rest 0#) newLine [] rt = (I# rt) newLine ('-':rest) rt = negLine rest 0# where negLine ('\n':rest) soFar = newLine rest (rt -# soFar) negLine ( x :rest) soFar = negLine rest (d2i x +# (10# *# soFar)) newLine (x:rest) rt = posLine rest (d2i x) where posLine ('\n':rest) soFar = newLine rest (rt +# soFar) posLine ( x :rest) soFar = posLine rest (d2i x +# (10# *# soFar)) d2i (C# c) = (ord# c) -# z where z = ord# '0'# - Thanks, Chris ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] x86 code generation going wrong?
From: Chris Kuklewicz [EMAIL PROTECTED] To: [EMAIL PROTECTED], Haskell Cafe haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] x86 code generation going wrong? Date: Sun, 08 Jan 2006 20:33:57 + Brian Sniffen wrote: The first couldn't even complete on my 2.26 GHz Celeron! It's only got 512 MB of RAM, which may be part of the problem. I should not leak memory but it may be an optimization problem. Try explicitly using ghc -O2 -funbox-strict-fields. On p4. 2.4 ghz 512mb first example takes about 2.5 mb of ram. I've compiled explicitelly with -O2, because without optimisations it takes ridicilously large amount of RAM. I've changed {# OPTIONS to OPTIONS_GHC as Bulat sugested but still no effect. Options have to be specified on command line in order to work. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] x86 code generation going wrong?
From: [EMAIL PROTECTED] (Donald Bruce Stewart) To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], [EMAIL PROTECTED],haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] x86 code generation going wrong? Date: Mon, 9 Jan 2006 11:15:51 +1100 bmaxa: From: Chris Kuklewicz [EMAIL PROTECTED] To: [EMAIL PROTECTED], Haskell Cafe haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] x86 code generation going wrong? Date: Sun, 08 Jan 2006 20:33:57 + Brian Sniffen wrote: The first couldn't even complete on my 2.26 GHz Celeron! It's only got 512 MB of RAM, which may be part of the problem. I should not leak memory but it may be an optimization problem. Try explicitly using ghc -O2 -funbox-strict-fields. On p4. 2.4 ghz 512mb first example takes about 2.5 mb of ram. I've compiled explicitelly with -O2, because without optimisations it takes ridicilously large amount of RAM. I've changed {# OPTIONS to OPTIONS_GHC as Bulat sugested but still no effect. Options have to be specified on command line in order to work. Ensure that the {-# OPTIONS ... #-} lines is the *first* line of the file, and that no comments precede it. Aaah, I didn't knew that. Now this works, thanks! Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Shootout summary
This pidigit program is not mine, but original authors of algorithm. I've just added print function. It is idiomatic Haskell, pi is pure function that generates inifinite list of digits, and on two machinas I've tested p4 2.4 ghz and amd athlon 64 3000 it's about some small percentage ( 5%) faster then one submited for benchmark. Who knows on test machines could be some percentage slower, but what is important is that it is not optimised in any way and does not use monads, but is very fast. Greetings, Bane. From: [EMAIL PROTECTED] (Donald Bruce Stewart) To: Chris Kuklewicz [EMAIL PROTECTED] CC: Haskell Cafe haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: Shootout summary Date: Sat, 7 Jan 2006 16:10:06 +1100 dons: haskell: Summary of things entered and of things being worked on. Donald Bruce Stewart wrote: haskell: Simon Marlow wrote: Hi Chris, Rather than try to explain what I'm going on about, I decided to tweak the code a bit myself. My version is about 10% faster than yours, and doesn't use any explicit unboxery. I've put it in the wiki after your version. http://www.haskell.org/hawiki/ChameneosEntry Could someone upload this to the shootout? Cheers, Simon So no one else tries to submit this: I have just sent it to the shootout. Perhaps we should submit some of the other entires on the wiki too? -- Don I updated their wiki pages before, but the following submissions are in the pipeline. I am using https://alioth.debian.org/tracker/index.php?group_id=30402atid=411646 as a way to query the Category == Haskell GHC State == Any Order by == ID Descending On 6 Jan, I submitted: sum-file was submitted (using Simon style implicit unboxery) fasta was submitted (using Simon style implicit unboxery and which should fix the memory leak) chameneos was submitted (where Simon made the unboxing implicit) On 5 Jan: pidigit was submitted (Branimir Maksimovic's) By the way, this pidigit seems to be several times slower than the one already submitted. My apologies to the author, it's only a few percent, not several times slower. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Re: Haskell Speed
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], haskell-cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Re: Haskell Speed Date: Fri, 30 Dec 2005 17:56:57 +0300 Hello Branimir, Friday, December 30, 2005, 3:44:26 AM, you wrote: BM myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0 i use the following hash function in my program: filenameHash = fromIntegral . foldl (\h c - h*37+(ord c)) 0 (it's written without explicit parameter in order to try to help GHC machinery better optimize this function) I've found that hasing function and anything that does calculation is fast enough (compared imported C function and Haskell, and no real difference, Haskell is fast regarding calculations) , problem is with memory leak. In this case hashing function have to be strong in order to avoid linear search. when memory leak with HashTable dissapears I will use some even better hash functions. BM All in all functional version consumes less memory and is twice as BM fast. IOArrays is second-class citizens in GHC/Haskell. they are scanned on _each_ GC, and this can substantially slow down program which uses large IOArrays. Hm, there is Hans Boehm GC for C and C++ and I have gcmalloc and gcmalloc_atomic. Why this isn;t present in Haskel? gcmalloc_atomic is usefull when allocating large arrays that does not contain any references/pointers. for your program, things will be not so bad because IOArray, used inside your HashTable, contains far less than million elements and because your program has more to do itself. but try to compare MUT time of functional and imperative version - may be your algorithm is really faster and only GC times makes this comparision unfair Hm that can be solved if hash_table use gcmaloc_atomic I guess. (If all execution times goes for GC scans). .. writing this message i thought that reducing number of GCs can speed up my program and your program too. so there is third variant - using IOArray, but with +RTS -A100m Wow, that option almost doubled speed of HashTable program (memory leak remains). Thank you, for enlighten me about GC. After this I can think of gc_add_scanrange,gc_remove_scanrange and gcmalloc_atomic functions as they are now common to other languages? We need low level GC interface in order to produce fast programs. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Haskell Speed
From: Isaac Gouy [EMAIL PROTECTED] To: haskell-cafe@haskell.org Subject: [Haskell-cafe] Re: Haskell Speed Date: Thu, 29 Dec 2005 13:00:15 -0800 (PST) --- Isaac Gouy [EMAIL PROTECTED] wrote: We'll be happy to also show a Haskell program that uses Data.HashTable - first, someone needs to contribute that program. Someone did: k-nucleotide Haskell GHC #2 http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelang=all To comment some observation on this program. Most of the pressure now is on Data.HashTable. I've susspected such large memory usage on substring from array conversions, so mad version with data MyString = MakeMyStrinf { buf :: Ptr Char, size :: Int } and there was no substrings in map or anywhere else, but memory consumption remains. So after eliminating inserts and updates into HashTable memory was ok. Finally I've realized that updates into hash table actually increase memory usage to large extent instead to keep memory same on average. So I guess this is bug in HashTable? Second in this test, hash function needs to be very strong, as even with following I got longest chain of 16 elements. myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0 where f c m = f'' $ f' (ord c + m) f' m = m + (m `shiftL` 10) f'' m = m `xor` (m `shiftR` 6) ff' m = m + (m `shiftL` 3) ff'' m = m `xor` (m `shiftR` 11) ff''' m = m + (m `shiftL` 15) Default hashString has longestChain of 18 elements. Perhaps someone can explain if such a memory leaks from HashTable updates are normal or are bugs? All in all functional version consumes less memory and is twice as fast. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Haskell Speed
From: Isaac Gouy [EMAIL PROTECTED] To: haskell-cafe@haskell.org Subject: [Haskell-cafe] Re: Haskell Speed Date: Tue, 27 Dec 2005 20:12:01 -0800 (PST) Branimir Maksimovic wrote: Of course, first example uses [String] instead of Data.HashTable as other languages do. Imagine C program does not use hash,rather list, how it will perform:) And the author comments his program -- This is a purely functional solution to the problem. -- An alternative which keeps a mutable table of occurences would -- be faster. Yes, I saw that but that does not qualifies for benchmark with hash tables. I mean program is ok, but this is aplles to oranges. We'll be happy to also show a Haskell program that uses Data.HashTable - first, someone needs to contribute that program. Ok , i've attached hash table version. As I'm Haskell newbie I suppose there is someone which can do much better. I didn't look further after that. Ideal - you may criticize without the risk that others will criticize what you do. Attachment follows, you can cricize my poor Haskell skills as much you want :) This version si actually D program trnslated to Haskell + file parsing code from original Haskell program. Now wou can compare no matter how slow it is :) Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ knucleotide.hs Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] binary IO
From: Joel Reymont [EMAIL PROTECTED] To: Tomasz Zielonka [EMAIL PROTECTED] CC: Jeremy Shaw [EMAIL PROTECTED],haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] binary IO Date: Tue, 27 Dec 2005 09:18:54 + Tomasz, Try http://wagerlabs.com/timeleak.tgz. See the Killer pickler combinators thread as well. My desired goal is to have 4k bots (threads?) running at the same time. At, say, 1k/s per bot I figure something like 4Mb/s round-trip. Each bot cannot spend more than a couple of seconds on pickling/ unpickling. I'm not even close to hitting that goal even reading from a file. I'm getting delays of 4s+ with just 100-200 bots reading from a file and even less than that in a networked environment. The more bots I run the higher the delays, to the point of frequent delays of 10s+. The kicker is that some packets come in at 4k compressed with Zlib but become something like 50k uncompressed and then expand to a list of 500+ records, etc. I have C++ concurrent server that performs 2600 reqs/sec on about 500 connections and dual Xeon 2.8Ghz, but no pickling /unpickling, just short text. Has sepparate IO threads that divide descriptor sets (num descs / IO thread) and worker threads as number of CPU's * 2, no locking of shared queue. So with 4k connections I guess that would be maximum 2k requests on *dual* box per second, without pickling / unpickling, just short textual protocol and simple services. I think that you will get hard time even with C to achieve your goal. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Haskell Speed
Paul Moore wrote: On 25 Dec 2005 12:24:38 +0100, Peter Simons [EMAIL PROTECTED] wrote: Paul Moore writes: It would be interesting to see standalone code for wcIOB (where you're allowed to assume that any helpers you need, like your block IO library, are available from the standard library). This would help in comparing the obviousness of the two approaches. A simple version of the program -- which doesn't need any 3rd party modules to compile -- is attached below. My guess is that this approach to I/O is quite obvious, too, if you have experience with system programming in C. Hmm, I can't honestly believe that you feel that your code is as obvious as the original. I'm not unfamiliar with monads and state, and I know C well, but it took me a significant amount of time to decipher your code (even knowing what it was intended to do), whereas I knew what the original was doing instantly. IMHO, the main point of the example in the article is that wc :: String - (Int, Int, Int) wc file = ( length (lines file) , length (words file) , length file ) is a crapy word-counting algorithm. Dunno. It's certainly not a bad (executable!) definition of the problem. My point is that Haskell allows me to write *very* clear executable pseudocode, but that code is not a good starting point for writing production-quality code. this program counts two times length of lists of strings formed by lines, and words and third time counts again length of file. This is not just word counting program, it creates two additional lists, which are not used anywhere but to count :) While it is certainly expressive, in terms of programing is pointless. No one would write such a code for word counting. Here is what I would write in Haskell, same logic as in C++ (i don;t know standard lib ): module Main where import IO import Char main = do s - hGetContents stdin putStrLn $ show $ wc s wc :: String - (Int , Int , Int) wc strs = wc' strs (0,0,0) where wc' [] res = res wc' (s:str) (lns, wrds, lngth ) | s == '\n' = wc' str (lns+1,wrds, lngth+1) | isAlpha s = wc'' str (lns, wrds+1,lngth+1) | otherwise = wc' str (lns,wrds, lngth+1) wc'' [] res = res wc'' (s:str) (lns,wrds,lngth) = if isAlphaNum s then wc'' str (lns,wrds,lngth+1) else wc' str (lns,wrds, lngth+1) Greetings, Bane. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Haskell vs. Erlang for heavy-duty network apps (wasRe: Haskel
From: Joel Reymont [EMAIL PROTECTED] To: Bulat Ziganshin [EMAIL PROTECTED] CC: Peter Simons [EMAIL PROTECTED], haskell-cafe@haskell.org Subject: [Haskell-cafe] Haskell vs. Erlang for heavy-duty network apps (wasRe: Haskell Speed) Date: Sun, 25 Dec 2005 12:20:38 + On Dec 25, 2005, at 10:13 AM, Bulat Ziganshin wrote: Hello Joel, [...] so i think that your problems is due to bad design decisions caused by lack of experience. two weeks ago when you skipped my suggestions about improving this design and answered that you will use systematic approach, i foresee that you will fail and say that Haskell is a bad language Yes and no. The systematic approach that I used was profiling the serialization code and tweaking all that I could. I saved my profiling reports after each run and tracked the changes that I made. I will blog about it after Simon M. comes back and suggests how to squeeze the last bit out of it. Regardless of this, it looks to me like I could easily have around 4Mb of network traffic per second with about 4k threads and complicated nested structures to serialize and deserialize. Trying to tackle far less data suggests to me that it's not gonna happen. So I will try to take this as far as I can in Haskell, once I have the heavy artillery to back me up. If the results are good then I will use them in later applications of the same nature but in the meantime I'm rewriting this particular app in Erlang. Sounds familiar ?:) http://www.jetcafe.org/~npc/doc/euc00-sendmail.html Similar experience with Erlang about 5 years ago :0) Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 23 Dec 2005 11:32:01 +0300 Hello Branimir, Wednesday, December 21, 2005, 10:18:43 AM, you wrote: try to add {-# NOINLINE replace #-} to both programs and repeat comparision BM These are tests: BM No optimisations (no -O): NOINLINE just prevents RunTimeCompilation (see wiki page for details), so this way you will test speed of replace on previously unknown string. disabling optimization says nothing about real speed of optimized program, which searches for the many different strings I got it. These tests were with NOINLINE in both cases but I didn;t saw any speed difference in results as actually replace (straight) and searchReplace (KMP) is just called for two differnet strings. Perhaps if I call that for long list of short patterns patterns on short string, test would display different results (INLINE wouldn't help). I'll try that next. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Killer pickler combinators (was Time leak)
From: Joel Reymont [EMAIL PROTECTED] To: Haskell-Cafe Cafe haskell-cafe@haskell.org Subject: [Haskell-cafe] Killer pickler combinators (was Time leak) Date: Wed, 21 Dec 2005 02:39:43 + The original paper is at http://research.microsoft.com/ ~akenn/fun/ picklercombinators.pdf My adaptation is at http://wagerlabs.com/timeleak.tgz. This is a full repro case, data included. The issue is that with 1000 threads my serialization is taking a few seconds. Inserting a delay or passing in trace (see README and code) gets serialization time down to 0-1s, the way it should be. What gives? Try it for yourself and please let me know! This has easy explanation. I am learning haskell and your programs are great of that. In this code: do TOD time1 _ - getClockTime (kind, ix1) - unpickle puCmdType ptr ix TOD time2 _ - getClockTime (cmd', _) - unpickle (puCommand kind) ptr ix1 TOD time3 _ - getClockTime you get time multiple times. So if you lock whole operation nothing else will be executed while in this code and therefore you don;t have timeouts. But, without lock more threads you have, more time have to wait for operation to finish and therefore timeouts. Since I see that Haskell executes all this in single thread, lock just ensures that your operasion will not be interrupted, thereferore not time outs. But if you measure say 5000 reads cumulative time, you'll get same problem again. Let's say you have more then one worker thread and multiple CPU's. Only then situation will be better. Perhaps you'll get somewhat better perfromance with OS context switch, but not to avail, it is humpered with same problem. You need more CPU-s and worker threads in order to service such large number of tasks. Just measure how much requests can be serviced in reasonable time and limit on that. For single cpu lock will be ok, but for multiple CPU's you have to somehow say Haskell to spawn multiple workers (more then one). I would be glad to know how to tell run time to use more then one worker thread. Greetings, Bane. Thanks, Joel -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Bulat Ziganshin [EMAIL PROTECTED], Haskell-Cafe@haskell.org KMP is O(m) while straightforward is O(m*n). Where m is the length of the input and n is the length of the searched-for pattern, I think? Yes. But these are worst-case complexities, I believe, ordinarily, straightforward will be O(m), too. Yes, those are worst cases for both algorithms. O(m) for KMP, O(m*n) for straightforward. My test favors straightforward, in any other case KMP wins by order of magnitude. Can you give example tests? Any example that has long search pattern say (many a's followed by b ) and searched string has many partial matches (many a's). Particularly, any example which exhibits O(m*n) or close to, case for straightforward search. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Killer pickler combinators (was Time leak)
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Killer pickler combinators (was Time leak) Date: Wed, 21 Dec 2005 14:51:42 + I'm not sure I buy this. Again, this helps: {-# NOINLINE lock #-} lock :: MVar () lock = unsafePerformIO $ newMVar () trace s = withMVar lock $ const $ putStrLn s and then in read_ cmd - read h trace trace is called _after_ all the timings in read so it should not affect the timings. It does not affects timings directly , but indirectly. You have putStrLn which performs some work and that is point of serialization. Try no op instead of putStrLn and you should get timeouts again. i'm sure you'll get even better operation timings if you lock around whole timing operation instead. This will be valid only for single CPU only. In this way you actually get same throughoutput, with or without lock. It' just you have to measure cummulative reads/sec which will be same with/without lock for single CPU/worker thread. Again, only way to improve performance is to use more then one CPU/worker thread. Greetings, Bane. You could basically say that the lock is at the end of read, after all the unpickling has been done. The other interesting thing is that replacing trace with delay _ = threadDelay 1 does not solve the issue. On Dec 21, 2005, at 2:33 PM, Branimir Maksimovic wrote: In this code: do TOD time1 _ - getClockTime (kind, ix1) - unpickle puCmdType ptr ix TOD time2 _ - getClockTime (cmd', _) - unpickle (puCommand kind) ptr ix1 TOD time3 _ - getClockTime you get time multiple times. So if you lock whole operation nothing else will be executed while in this code and therefore you don;t have timeouts. But, without lock more threads you have, more time have to wait for operation to finish and therefore timeouts. -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Substring replacements
I've finally performed test on amd64 and result is a same as on intel. KMP always wins. So KMP is best suited for non indexed strings and I guess should be used in library as prefered search/replace method. This test favors straightforward search. [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.783s user0m10.769s sys 0m0.016s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.769s user0m11.741s sys 0m0.028s [EMAIL PROTECTED] myhaskell]$ uname -a Linux devel64.office.kom 2.6.14-skas3-v8.2 #2 Fri Nov 11 21:19:36 CET 2005 x86_64 x86_64 x86_64 GNU/Linux Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 20 Dec 2005 23:55:22 +0300 Hello Branimir, Tuesday, December 20, 2005, 9:48:48 PM, you wrote: BM I've finally performed test on amd64 and result is a same as on intel. BM KMP always wins. So KMP is best suited for non indexed strings BM and I guess should be used in library as prefered search/replace method. BM This test favors straightforward search. i'm 90% sure that straightforward method must be faster for one-time searches. KMP is O(m) while straightforward is O(m*n). your test may give better results with KMP algorithm just because you repeat the same search many times and it was automatically run-time compiled as described on the wiki page about KMP My test favors straightforward, in any other case KMP wins by order of magnitude. I think that straightfoirward is better then KMP with optimisations off is due more complex program. try to add {-# NOINLINE replace #-} to both programs and repeat comparision These are tests: No optimisations (no -O): Intel hyperthreaded,windows $ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m34.766s user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m14.719s user0m0.031s sys 0m0.000s AMD 64 bit: [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real1m58.066s user1m57.939s sys 0m0.128s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m41.565s user0m41.527s sys 0m0.040s with optimisations (-O): Intel hyperthreaded,windows $ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m8.625s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.735s user0m0.015s sys 0m0.000s AMD 64 bit, linux: [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.546s user0m10.529s sys 0m0.016s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.796s user0m11.785s sys 0m0.012s Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 16 Dec 2005 16:51:32 +0300 Hello Branimir, Friday, December 16, 2005, 5:36:47 AM, you wrote: BM I've also performed tests on dual Xeon linux box and results are just to let you know - GHC don't uses pentium4 hyperthreading, multiple cpus or multiple cores in these tests only way to make ghc using multiple processors is to use 6.5 beta version, compile with -smp and explicitly fork several threads You are right. I've double checked on linux there is just one thread executing and there is not such a big difference between KMP and straightforward search. Just about 10% KMP is faster with my test, but still faster. I've checked both SMP and non SMP linux (Intel). Hyperthreading effect is on windows only, I guess, as there are visible three threads per test process. I have one amd64 near (I'll check that one too, as soon as admin sets up account for me on that machine). Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 16 Dec 2005 16:51:32 +0300 Hello Branimir, Friday, December 16, 2005, 5:36:47 AM, you wrote: BM I've also performed tests on dual Xeon linux box and results are just to let you know - GHC don't uses pentium4 hyperthreading, multiple cpus or multiple cores in these tests Oh yes it does. I clearly see multiple threads that take unequal percentage on each virtual CPU. I guess that other thread is garbage collector thread. In case of hyperthreading, speed is gained by reduced memory latency by 30-60 % only way to make ghc using multiple processors is to use 6.5 beta version, compile with -smp and explicitly fork several threads This is not the case as I see. On windows search replace test programs spawn 3 threads and on linux I'm not sure, but I've checked program that calls Haskell from C++ and GHC spawns additional thread, which is not my thread, that also performs something constantly, and I didn't spawn any thread from Haskell. So hyperthreading helps as it helps to optimise when several threads accesses memory as is in this test case. I can't see any other explanation why KMP search is slower on AMD 20% , but faster on Intel 30%, then straightforward search with my test. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
This is what I got for BM. Performance dissapoints as BM is really suited for indexed strings like arrays.It mainly operates on indexes. This is simple BM, as I didn't want to go for more complex variant,becauses takes and drops and recalculation of next position is too pricey for non indexed structure. So, clear winner is KMP for non indexed strings. There is also finite automaton algorithm but this works well if search strings are precompiled, so I'll implement it only for education purposes. I hope my Haskell improves as I've learned how to reduce number of paramaters. searchReplaceBM :: String - String - String - String searchReplaceBM _ str = str searchReplaceBM sr rp str = searchReplace str where table :: UArray Int Int table = array (0,255) ([(i,0) | i - [0..255]] ++ proc sr 1) proc [] _ = [] proc (s:st) i = (ord s,i):proc st (i+1) len = length sr rsrch = reverse sr searchReplace str | null remaining = if found then rp else passed |found = rp ++ searchReplace remaining | otherwise = passed ++ searchReplace remaining where (passed,remaining,found) = searchReplace' str searchReplace' str = if j == 0 then (,drop len str,True) else failed where failed = case drop (j-1) str of [] - (str,,False) (c:_) - (take sk str, drop sk str, False) where md = j - table ! ord c sk = if md 0 then md else 1 j = srch rsrch (reverse $ take len str) len where srch _ = 0 srch _ l = l srch (s:str) (s':str') l | s == s' = srch str str' (l-1) | otherwise = l Greetings, Bane. From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED], [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 01:39:57 + From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 + From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP. Greetings,Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 21:07:11 +0100 Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie: From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 + From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP. Greetings,Bane. Well, I also thought that all the jumping around in Boyer-Moore wasn't too good (after each shift we must bite off a chunk from the remaining input, pushing that onto the stack, which costs something). But I gave it a try today and here's what I came up with: import Data.List (tails) import Data.Map (Map) import qualified Data.Map as Map import Data.Array.Unboxed searchRep :: String - String - String - String searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str where len = length src len1 = len-1 pat :: UArray Int Char pat = listArray (0,len1) src ch = pat!len1 badChar :: Map Char Int badChar = Map.fromList $ zip src [0 .. ] getBc c = case Map.lookup c badChar of Just n - n Nothing - -1 suffs :: UArray Int Int suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs where crs = reverse src pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys pr n _ _ = n bmGs0 :: UArray Int Int bmGs0 = array (0,len1) [(j,k) | (k,k') - zip (tail $! help) help, j - [k' .. k-1]] help = [k | k - [0 .. len], k == len || suffs!k == len-k] bmGs :: UArray Int Int bmGs = bmGs0 // [(len1-suffs!k,k) | k - [len1,len-2 .. 1]] run by = reverse by run by (c:cs) | c == ch = process (c:by) cs | otherwise = run (c:by) cs roll n xs ys | n = 0 = (xs, ys) roll n xs (y:ys) = roll (n-1) (y:xs) ys roll _ xs = (xs, ) walk n = (n,) walk n st@(c:cs) | n 0 = (n,st) | c == pat!n = walk (n-1) cs | otherwise = (n,st) process con left | i 0 = reverse pass ++ rp ++ run left | otherwise = {- bye ++ -} run ncon nleft where (i,pass) = walk len1 con d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass)) -- bye = reverse $! drop (len-d) con (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left it's not as fast as KMP for the tests, but not too bad. Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a match (if any), we'd be better off relieving our memory with 'bye', I think. Any improvements are welcome, certainly some of you can do much better. It is fast on my machine except that you are using Map to lookup for badChar which is O(log n). I;ve placed this instead: badChar :: UArray Int Int badChar = array (0,255) ([(i,-1) | i - [0..255]] ++ proc src 0) proc [] _ = [] proc (s:st) i = (ord s,i):proc st (i+1) getBc c = badChar ! ord c which gaved it significant boost, O(1) lookup. Now it's faster then brute force method but 10% slower then KMP with my test. I've also performed tests on dual Xeon linux box and results are proportionally the same as on my intel windows box. KMP wins again 10% better then BM and 20-30% better then straightforward search, which means that KMP is well suited for non indexed strings. Cheers, Daniel P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat fussy. Yes, BM is for indexed structures. Greetings, Bane
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 13 Dec 2005 11:23:29 +0100 After seeing that your program is fastest (I've also tried one from http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not that good in converting to search replace?) I've decided to try with Rabin-Karp algorithm. This algorithm performs same operation as straightforward search, but compares hashes instead of chars. With ability to rotate hash (remove first, add next) characters there is also optimisation, that hash is calculated only for single next character rather again for whole substring. Unfortunatelly on my machine it is very cheap to compare characters so with my test hashing overweights character compare, except in your test when hash searching is faster then straightforward search. This is best I can write in terms of performance and readability. I've tried with getFst that returns Maybe but it was slower so I decided to return '\0' in case that argument is empty list, which renders '\0' unusable, but then I really doubt that 0 will be used in strings. -- Rabin-Karp string search algorithm, it is very effective in searching of set -- of patterns of length n on same string -- this program is for single pattern search, but can be crafted -- for multiple patterns of length m hSearchReplace :: String - String - String - String hSearchReplace sr rp xs | not (null remaining) = found ++ rp ++ hSearchReplace sr rp (drop (length sr) remaining) | otherwise = found where (found,remaining) = hSearch sr xs hSearch :: String - String - (String,String) hSearch sr xs = hSearch' sr xs hcmp where hsrch = hash sr hcmp = hash $ take ls xs cmp = take ls xs ls = length sr hSearch' [] xs _ _= (xs,[]) hSearch' sr [] _ fndFail = (reverse fndFail,[]) hSearch' srch xxs@(x:xs) hcmps fndFail = if hsrch == hcmps then if isPrefixOf srch xxs then (reverse fndFail,xxs) else searchAgain else searchAgain where searchAgain = hSearch' srch xs (hashRotate (getFst xxs) (getFst nextxxs) (ls-1) hcmps) (x:fndFail) nextxxs = drop ls xxs getFst :: String - Char getFst [] = '\0' getFst (a:as) = a hash :: String - Int hash str = hash' str (length str - 1) where hash' :: String - Int - Int hash' [] _ = 0 hash' (s:str) pow = (101 ^ pow) *(fromEnum s) + hash' str (pow-1) hashRotate :: Char - Char - Int - Int - Int hashRotate cout cin pow hsh = (hsh - ((101 ^ pow) * (fromEnum cout)))*101 + (fromEnum cin) Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 17:10:20 +0100 I think that's because on your machine Bulat's version have better perfromance with CPU cache. I don;t know but now your version is 25% faster with my test on P4 hyperthreaded. E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with computers. I think that i've figure it now. Hyperthreading is hardware CPU feature that single CPU core can speed up execution of two running threads. For example if one thread uses integer unit and other FP unit CPU executes that in parallel. But that's not important or significant. What is interestenting is memory latency. If one thread peeks and pokes around memory for , say 1 unit of time, with usual CPU two thread will execute 2 units of time. Hyperthreaded (I'm talking about intel implementation) CPU will execute that in 1.4 points of time giving 60% boost in terms of speed. I've tested some assembler and C program that launches two threads each roaming over memory to anulate impact of cache. What is noticable is that two threads have 60% less memory latency constantly then single thread. That means if single thread for each out of cache memory access waits 300-400 CPU cycles, two threads wait 60% less. Now what has that to do with our programs as they are single threaded? I think it's garbage collection. Our programs run with garbage collector in background and you feel that burden by 20% as your program probably pushes garbage collector to work more than Bulat's version. On hyperthreaded CPU impact of garbage collection is reduced by a factor of 30-60 % resulting in your program being 30% faster on my machine. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. However, I don't see how to (efficiently) do a multiple pattern search with KMP, so there -- if all patterns have the same length, otherwise I don't see -- Rabin-Karp would probably be the method of choice. Yes, this algorithm can search in parallel patterns of same length. Different search patterns have to be searched same way as with KMP. I tuned it up somewhat: import Data.List (isPrefixOf) import Data.Char (ord) -- using ord instead of fromEnum oddly makes it -- faster for my test, but slower for yours, but only a whiff. Wow, on my machine your version of Rabin-Karp gives 30% boost to my test. This helps me learn Haskell , too . Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 + From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP. Greetings,Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: [EMAIL PROTECTED] To: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:25:19 -0500 G'day all. Quoting Branimir Maksimovic [EMAIL PROTECTED]: After seeing that your program is fastest (I've also tried one from http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not that good in converting to search replace?) You probably did it right, but you could post your version to the list if you want me to take a look. Oh, here it is but just don;t laugh :) I've hacked with unsafePerformIO as I din't know how to remove IO from match any other way. searchReplaceKMP :: String-String-String - String searchReplaceKMP sr rp s | not (null remaining) = found++rp ++ searchReplaceKMP sr rp remaining | otherwise = found where (found,remaining) = unsafePerformIO $ matchKMP sr s matchKMP :: (Monad m, Eq a) = [a] - ([a] - m ([a],[a])) matchKMP [] = error Can't match empty list matchKMP xs = matchfunc [] where matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs)) dofail = \ps xs - case xs of [] - fail can't match (y:ys) - matchfunc (y:ps) ys type PartialMatchFunc m a = [a] - [a] - m ([a], [a]) makeMatchFunc :: (Monad m, Eq a) = [PartialMatchFunc m a] - [(a, Int)] - PartialMatchFunc m a makeMatchFunc prev [] = \ps xs - return (reverse (drop ((length prev)-1) ps), xs) makeMatchFunc prev ((x,failstate):ms) = thisf where mf = makeMatchFunc (thisf:prev) ms failcont = prev !! (length prev - failstate - 1) thisf = \ps xs - case xs of [] - fail can't match (y:ys) - if (x == y) then mf (y:ps) ys else failcont ps xs overlap :: (Eq a) = [a] - [Int] overlap str = overlap' [0] str where overlap' prev [] = reverse prev overlap' prev (x:xs) = let get_o o | o = 1 || str !! (o-2) == x = o | otherwise = get_o (1 + prev !! (length prev - o + 1)) in overlap' (get_o (head prev + 1):prev) xs -- These are timings (it's performance is about the same as Rabin-Karp): $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie searchr.exe: user error (can't match) real0m22.187s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long True Done real0m8.110s user0m0.031s sys 0m0.016s When I wrote the RunTimeCompilation code, it wasn't intended to be a shining example of efficiency, merely an illustration. Remember that it's doing TWO things: compiling the pattern to code, and then performing the search. The compilation phase is likely to be much slower than the search, so the speedup (if any!) would only be realised the SECOND time that you searched a string using the same pattern. (Assuming you re-used the compiled match code, of course!) Oh, that explaines it. Actually this has to be converted to searchReplace in order to be fast, but I don;t know how (yet) as your program is pretty complicated to my humble Haskell skills. I think that your technique can be usefull with Aho-Corasick algorithm as it first constructs finite automaton from tree, then performs search. So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run time compilation, but this is too advanced for me for now. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 13 Dec 2005 11:23:29 +0100 Am Montag, 12. Dezember 2005 16:28 schrieben Sie: From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today: Sorry, but Prelude SearchRep searchReplace abaaba ## abababaaba abababaaba I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. I've corrected this with adjusting rollback position. if rollBack is null then search for rollback starts at second character if not starts at same as searhed character because I skip what was searched. That's all. Though I'm not so sure now when I read this. Still not working: *New searchReplace abababc # ababababababc ababababababc *New searchReplace1 abababc # ababababababc ababababababc Yes, perhaps you've missed another post of mine. I've noticed that problem when pattern repeats more then 2 times and gave up because now whatever I do, your version is always fastest. So the question is, can we find a cheap test to decide whether to use KMP or Bulat's version? Just interleave string with search hits with one with no seacrh (that means partial too) hits, and your version will gain in speed. More partial matches and full search matches Bulat's version will gain in speed. Longer search strings, your version will have gains. In real world situation your KMP will always be fastest on average. I like that we are not using C arrays as then we have advantage of lazyness and save on memory usage. C++ program will be faster on shorter strings but on this large strings will loose due memory latency. and with your test, both programs are very fast. Greetings, Bane. On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% faster than my KMP on your test -- btw, I unboxed the pat array, which gave a bit of extra speed, but not much. I think that's because on your machine Bulat's version have better perfromance with CPU cache. I don;t know but now your version is 25% faster with my test on P4 hyperthreaded. your new version: $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m8.734s user0m0.015s sys 0m0.000s Bulat's version: [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.734s user0m0.015s sys 0m0.015s 3 secs difference now. And apologies to Sebastian Sylvan, I also included an unboxed version of bord, built from the boxed version, and that sped things further up -- not much, again, but there it is. On my machine you got another 10-15% of boost with unboxed arrays. I wonder about this difference, -10% on one system and +20% on another system, ist that normal? Different caching schemes on CPU's perhaps? different memory latencies? hyperthreading helps your version? more code and data, perhaps because of that it pays the price on your machine? Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Mon, 12 Dec 2005 09:00:18 + The app is multi-threaded but uses lightweight threads (unbound). So that means user space threads, in comparison to kernel space threads? If this is so then all of what I said still stands, because userspace threads are executed concurrently too, just that more user threads share same lwp's on Solaris for example, if I understand correctly, taht is user space thread is not bound to single lwp. On linux all threads are kernel space threads. Not sure about windows though. If GHC implements user space threads that would be great, but that does not helps with your problems. Greetings, Bane. On Dec 12, 2005, at 4:24 AM, Branimir Maksimovic wrote: If your app is single threaded you should be ok. But then nothing is executed concurrently? why locking at all then? You wouldn;t have problems with deadlocks and signals if single threaded without locking. Now, I m really puzzled. -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 10:31:49 +0100 Am Montag, 12. Dezember 2005 01:34 schrieben Sie: On 12/12/05, Daniel Fischer [EMAIL PROTECTED] wrote: Okay, I have looked up KMP and implemented it. Seems to work -- my first use of QuickCheck, too. It's slower than Bulat's and Tomasz' for Branimir's test :-(, but really fast for my test. Undoubtedly, one can still tune it. Perhaps by using unboxed arrays... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862 I'm afraid, unboxed arrays are out of the question, because bord is incrementally produced :-( Working very long test2: loop No worrie your test is now fastest with both your and mine test. I;ve forgot to change working function in your test:0) mine test: your program is is srchrep.exe [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m14.344s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.672s your program is almost 1.5 secs faster then Bulat's user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m12.016s user0m0.015s sys 0m0.015s now your test: [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long True Done real0m0.312s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working very long False Done real0m12.516s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working very long True Done real0m0.375s yours is less then second as mine but is fastest in both tests user0m0.015s sys 0m0.015s I don;t know how you get lesser numbers with mine test, but on this machine your KMP algorithm performs best. Greetings ,Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 13:07:29 +0100 Sorry, but Prelude SearchRep searchReplace abaaba ## abababaaba abababaaba I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. It failed because I didn;t adjusted search string for rollBack when previous rollBack is not null. this is corrected version: (with your changes it looks much better) --- searchReplace :: String-String-String - String searchReplace _ xs = xs searchReplace sr rp xs = searchr sr rp xs where searchr :: String-String-String-String-String - String searchr _ _ _ _ = searchr sr rp xs retB rollB | found = rp ++ searchr sr rp rema ret roll | otherwise = reverse (proc ++ rollB) ++ searchr sr rp rema ret roll where (found, proc, rema, ret, roll) = searchr' sr sr (reverse retB ++ xs) rollB searchr' src@(s:sr) src'@(s':sr') xs soFar rollB = searchr'' (drop (length rollB) src) src' xs soFar (not (null rollB),,) s searchr'' _ xs fnd _ _ = (True,fnd,xs,,) searchr'' _ _ fnd (_,ret,roll) _ = (False,ret++roll++fnd,,,) searchr'' src@(s:sr) src'@(s':sr') xxs@(x:xs) soFar (cnt,ret,roll) c | s == x = if s' == x null ret cnt then searchr'' sr sr' xs soFar (True, , x:roll) c else if null ret null roll then searchr'' sr src' xs (x:soFar) (True, , ) c else searchr'' sr src' xs soFar (True, x:roll++ret, ) c | otherwise = if null roll null ret then if c == x then (False, soFar, xxs, , ) else let (from, pre) = break (==c) xs in (False, reverse from ++ x:soFar, pre, , ) else if s'/=x then if null ret then (False, (x:roll) ++ soFar, xs,,) else (False, soFar, xxs,ret,) else if null ret then (False, soFar, xs, , x:roll) else (False, soFar, xxs, ret, ) However it is significantly slower then previous ugly version: searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs where searchr :: String-String-String-String-String - String searchr [] _ xs _ _ = xs searchr _ _ [] _ _ = [] searchr sr rp xs retBack rollBack | isFound $ fnd rollBack = rp ++ searchr sr rp (remaining $ fnd rollBack ) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) | otherwise = reverse ((processed $ fnd rollBack) ++ rollBack) ++ searchr sr rp (remaining $ fnd rollBack) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) where fnd = searchr' sr sr (reverse retBack ++ xs) isFound = fst . fst remaining = snd . snd . fst getRollBack = snd . snd getRetBack = fst . snd processed = fst . snd . fst searchr' :: String-String-String-String-String - ((Bool,(String,String)),(String,String)) searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack = searchr'' (drop (length rollBack) srch) srch' xs fndSoFar (not (isEmpty rollBack),,) sr searchr'' :: String-String-String-String-(Bool,String,String)-Char - ((Bool,(String,String)),(String,String)) searchr'' [] _ xs fnd _ _ = ((True,(fnd,xs)),(,)) searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ rollBack ++ fnd,[])),(,)) searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar (cnt,retBack,rollBack) s | sr == x = if cnt sr' == x isEmpty retBack then searchr'' srs srs' xs fndSoFar (True,,x:rollBack) s else if not (isEmpty retBack) || not (isEmpty rollBack) then searchr'' srs srch' xs fndSoFar (True,(x:rollBack) ++ retBack,) s else searchr'' srs srch' xs (x:fndSoFar) (True,,) s | otherwise = if isEmpty rollBack isEmpty retBack then if s == x
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today: Sorry, but Prelude SearchRep searchReplace abaaba ## abababaaba abababaaba I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. I've corrected this with adjusting rollback position. if rollBack is null then search for rollback starts at second character if not starts at same as searhed character because I skip what was searched. That's all. Though I'm not so sure now when I read this. So the question is, can we find a cheap test to decide whether to use KMP or Bulat's version? In real world situation your KMP will always be fastest on average. I like that we are not using C arrays as then we have advantage of lazyness and save on memory usage. C++ program will be faster on shorter strings but on this large strings will loose due memory latency. and with your test, both programs are very fast. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today: Sorry, but Prelude SearchRep searchReplace abaaba ## abababaaba abababaaba I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. let src = concat (replicate n abc) ++ d let str = concat (replicate (n+k) abc) ++ d then searchReplace src Success! str will work correctly iff k is congruent to 0 or 1 modulo (n+1). Oh, yes this seems the problem for searchr :( I have to look for efficient way in order to circumvent repeated searches. But since your KMP is fastest of all now, I am considering if there is any point now to correct this. And searchr ugly version that I've posted has a bug (not present in MyBane pretty version) . should be : else if sr'/=x Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] RE: Substring replacements (was: Differences inoptimisiation
From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED], [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: RE: [Haskell-cafe] RE: Substring replacements (was: Differences inoptimisiation Date: Sun, 11 Dec 2005 07:29:46 + I've found one remaining bug, and this is corrected version. Ah, I've forgot to include important optimisation, and patched around something else :) No wonder it was slow with normal test: --- searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs where searchr :: String-String-String-String - String searchr [] _ xs _ = xs searchr _ _ [] _ = [] searchr sr rp xs rollBack | fst $ fst $ fnd rollBack = rp ++ searchr sr rp (snd $ snd $ fst $ fnd rollBack ) ( snd $ fnd rollBack) | otherwise = reverse ((fst $ snd $ fst $ fnd rollBack) ++ rollBack) ++ searchr sr rp (snd $ snd $ fst $ fnd rollBack) ( snd $ fnd rollBack) where fnd = searchr' sr xs searchr' :: String-String-String-String - ((Bool,(String,String)),String) searchr' (sr:srs) xs fndSoFar rollBack = searchr'' (drop (length rollBack) (sr:srs)) xs fndSoFar (False,False,) sr searchr'' :: String-String-String-(Bool,Bool,String)-Char - ((Bool,(String,String)),String) searchr'' [] xs fnd _ _ = ((True,(fnd,xs)),) searchr'' _ [] fnd (_,_,rollBack) _ = ((False,(fnd,[])),rollBack) searchr'' (sr:srs) (x:xs) fndSoFar (cnt,f,rollBack) s | sr == x = if cnt (f || s == x) then searchr'' srs xs fndSoFar (True,True,x:rollBack) s else searchr'' srs xs (x:fndSoFar) (True,False,) s | otherwise = if not f then if s == x then ((False,(fndSoFar,x:xs)),) else ((False,searchr''' s xs (x:fndSoFar)),) else ((False,(fndSoFar, x:xs)),rollBack) searchr''' :: Char-String-String - (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr (x:xs) fndSoFar | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,x:xs) --- _ Don't just search. Find. Check out the new MSN Search! http://search.msn.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Differences in optimisiation with interactive andcompiled mo
From: Daniel Fischer [EMAIL PROTECTED] To: Bulat Ziganshin [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive andcompiled mo Date: Sun, 11 Dec 2005 14:59:55 +0100 Am Samstag, 10. Dezember 2005 22:42 schrieb Bulat Ziganshin: Hello Branimir, Saturday, December 10, 2005, 8:29:09 PM, you wrote: Can you check this version? and this: replace from to = repl where repl s | Just remainder - start_from from s = to ++ repl remainder repl (c:cs) = c : repl cs repl [] = [] start_from (x:xs) (y:ys) | x==y = start_from xs ys start_from [] str= Just str start_from _ _ = Nothing This is the fastest, even without type signatures (those give a wee bit of extra speed). your test (unlikely in real scenario): $ time replace1 Working very long False Done real0m12.531s user0m0.015s sys 0m0.000s my test is not any more 0.25 secs with your test, becasue I've sacrifised that for my test speed :) [EMAIL PROTECTED] ~/tutorial $ time searchr Working very long True Done real0m4.000s user0m0.031s sys 0m0.000s my test: [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 replace1.hs --make -o replace1.exe Chasing modules from: replace1.hs Compiling Main ( replace1.hs, replace1.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time replace1 Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.718s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m13.921s user0m0.015s sys 0m0.015s Have you any idea why using a pattern guard is faster (not much, but consistently) than the equivalent case-expression? Probably because either then function is better inlined, or there is optimised tail recursion. In any way ghc very well optimises recursion. I didn't notice any difference in speed whether function signature is tagged only for strings or is polymorphic. So we can freely make polymorphic signatures. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] RE: Substring replacements (was: Differences in optimisiation
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] RE: Substring replacements (was: Differences in optimisiation ...) Date: Sun, 11 Dec 2005 14:03:45 +0300 Hello Branimir, Sunday, December 11, 2005, 5:19:22 AM, you wrote: BM After seeing your test, I've implemented full KMP algorithm, which BM is blazingly fast with your test. It is slower in mine test due excessive are you seen http://haskell.org/hawiki/RunTimeCompilation ? Yes, that's the next step I will take. can you formulate conditions when straightforward algorithm will be better and when KMP algorithm is preferred? Startighforward is better when search string is relatively short and there are lot of matches or partial matches within searched string. Exellent example when one is faster then the other is my test when your algorithm is faster and Danilel's test where KMP excels. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Haskell Cafe haskell-cafe@haskell.org Subject: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 16:33:36 + I looked at the scheduler source code and it appears that GHC goes to wait for signals when a deadlock is detected and there's nothing else to do. It still does not explain where the signal comes from when I'm away from the keyboard. This is not signal, it is result from call to pause() . #if !defined(RTS_SUPPORTS_THREADS) void awaitUserSignals(void) { while (!signals_pending() !interrupted) { pause(); this is where it stops and waits for signals } } #endif you have to look elsewhere as this is normal behavior. Strange is that you are using threaded run time (I guess ) but this function is defined only for single threaded variant. This I implied from #if !defined(RTS_SUPPORTS_THREADS) Greetings, Bane. On Dec 11, 2005, at 4:10 PM, Joel Reymont wrote: (gdb) where #0 0x90006068 in syscall () #1 0x9004420c in sigpause () #2 0x001791b8 in awaitUserSignals () at Signals.c:256 #3 0x0012e1a8 in schedule (mainThread=0x1300360, initialCapability=0x0) at Schedule.c:518 [...] My program is currently stuck here. The man pages say that sigpause will only terminate by being interrupted and EINTR will be the errno. EINTR is signal 2, the same one that I'm trapping and the one sent when ^C is pressed. -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
I've got two versions: HSrts_thr and HSrts_thr_p I don't know what's second for? but there is only one with debug in it's name. So I'm not much of a help here. Greetings, Bane. From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 17:00:50 + Linking ... /usr/bin/ld: can't locate file for: -lHSrts_thr_debug collect2: ld returned 1 exit status How do I get a threaded+debug runtime? On Dec 11, 2005, at 4:50 PM, Branimir Maksimovic wrote: Strange is that you are using threaded run time (I guess ) but this function is defined only for single threaded variant. This I implied from #if !defined(RTS_SUPPORTS_THREADS) -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 16:56:23 + On Dec 11, 2005, at 4:50 PM, Branimir Maksimovic wrote: This is not signal, it is result from call to pause() . [...] you have to look elsewhere as this is normal behavior. You are saying that triggering my ^C handler randomly is normal behavior? I understand why it goes to wait for signals but it still does not explain where the signal itself is coming from. I'm saying that neither is this result of signal, nor stack trace shows that any signal handler is called. It just shows call to await and await calls pause. That's all Greetings, Bane. Strange is that you are using threaded run time (I guess ) but this function is defined only for single threaded variant. This I implied from #if !defined(RTS_SUPPORTS_THREADS) I'm not using a threaded runtime in this case. It appears that -debug and -threaded are incompatible as I get an error about a mixed debug/ threaded runtime library not being available. I compile with -debug so that I can run +RTS -Ds to check for deadlocks. Thanks, Joel -- http://wagerlabs.com/ _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
Though I frorgot to add that deadlock can be caused by signal somewhere else (eg other thread) if signal handler eg locks mutex internally or calls some other non asynchronous safe functions like locking functions. This is likely scenario, if you doubt at signal handlers but I don;t know the details . Deadlocks can be caused by other things, not neccessarily signals. Greetings, Bane. From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 17:28:54 + From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 16:56:23 + On Dec 11, 2005, at 4:50 PM, Branimir Maksimovic wrote: This is not signal, it is result from call to pause() . [...] you have to look elsewhere as this is normal behavior. You are saying that triggering my ^C handler randomly is normal behavior? I understand why it goes to wait for signals but it still does not explain where the signal itself is coming from. I'm saying that neither is this result of signal, nor stack trace shows that any signal handler is called. It just shows call to await and await calls pause. That's all Greetings, Bane. Strange is that you are using threaded run time (I guess ) but this function is defined only for single threaded variant. This I implied from #if !defined(RTS_SUPPORTS_THREADS) I'm not using a threaded runtime in this case. It appears that -debug and -threaded are incompatible as I get an error about a mixed debug/ threaded runtime library not being available. I compile with -debug so that I can run +RTS -Ds to check for deadlocks. Thanks, Joel -- http://wagerlabs.com/ _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe _ Don't just search. Find. Check out the new MSN Search! http://search.msn.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 18:25:47 + Understood. But I'm printing things in the signal handler to show that it was triggered. And I trigger it when ^C is pressed (well, one more signal): initSnippets :: IO () initSnippets = do initSSL installHandler sigPIPE Ignore Nothing flip mapM_ [sigTERM, sigINT] $ \sig - do installHandler sig (handler sig) Nothing where handler sig = Catch $ do trace_ $ Signal ++ show sig + + caught. trace_ Broadcasting Quit... broadcast (ForcedQuit :: Event ()) This way I know what the signal was that triggered the handler and I can tell that it was triggered. The deadlock is somewhere else because the handler is not being tripped. After seeing this only I can tell that for example in C++ one can't cout clog cerr or post some event via synchronized event queue or condition variable from signal handler. All of that would result in ghosts and goblins in program. Actually one can't do much at all in signal handlers in multithreaded environment, cause they don;t like each other. If you wan;t to trap ^C then I advise that you give up signal handlers and dedicate one thread to read keyboard events then post those keyboard events like you do from signal handler. That is ignore all signals, but fatal ones in which case you will just abort program (perhaps try some cleanup, if at all possible from signal handler) Hope this helps. Greetings, Bane. The issue of the signal handler being tripped by a phantom ^C (signal 2) is another issue entirely. On Dec 11, 2005, at 6:10 PM, Branimir Maksimovic wrote: Though I frorgot to add that deadlock can be caused by signal somewhere else (eg other thread) if signal handler eg locks mutex internally or calls some other non asynchronous safe functions like locking functions. This is likely scenario, if you doubt at signal handlers but I don;t know the details . Deadlocks can be caused by other things, not neccessarily signals. -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 19:09:20 + What I do works so I don't see any reason to do it otherwise. Oh, I 've seen to many programs with undefined behavior floating around that appears to work.:0) Problem with mt programs is that they just appear to work but in havy load situation those errors show once a while. You have only two choices. Either to clean up code or to live with it. You didn;t properly initialize SSL as per documention and you have problem with signal handlers. Both things will work 99% of the time with spurious crashes on occasion. Now, it might work by luck and chance, by some ghc magic or otherwise, but it does work and causes me no problems. Not when I press ^C and everything shuts down cleanly. My issues are 1) A phantom sigINT that gets sent to me out of nowhere and This should be enough reason to scan for keyboard events instead. There is no guarantee that SIGINT would be sent only by keyboard. 2) A deadlock somewhere in my program that I'm trying to troubleshoot The code: type Child a = (MVar (), TMVar (ClockTime, (Event a)), MVar ThreadId) {-# NOINLINE children #-} children :: MVar [Child a] children = unsafePerformIO $ newMVar [] Just to comment about memory allocation and signals. Suppose your malloc locks internally. Let's say thread in the background performs malloc, malloc grabs the lock. Signal is raised , your handler is called , which calls malloc and boom deadlock. If malloc does implement lock free algorithm internaly you don't have a problem, but that's unlikely. broadcast :: Show a = Event a - IO () broadcast event = withMVar children $ \cs - mapM_ (post event) cs where post event (_, mbx, tmv) = do tid - readMVar tmv trace_ $ broadcast: Sending ++ show event ++ to ++ show tid time - getClockTime atomically $ putTMVar mbx (time, event) this is lock? same situation as with malloc. thread locks, then signal arrives , then handler, then deadlock. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Substring replacements (was: Differences inoptimisiation
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: Substring replacements (was: Differences inoptimisiation Date: Sun, 11 Dec 2005 18:12:12 +0100 Unfortunately: Prelude SearchRep searchReplace aabaabba iii aabaabaabbaa aabaabaabb I've solved this case Prelude SearchRep searchReplace abaaba - abaaabaaba abaaabaab This one is cleaned up, but searching have to be adjusted for false positive patterns. So I need some more time, as this isn't quick fix. Seemingly, your algorithm assumes that the last component of the result of search'' is the beginning of the searched for pattern reversed -- which needn't be. Yes. That is the problem. Now I have to compare if it fits with searched string. One comment on style (I like it in general): IMHO, the use of nested pairs and combinations of fst, snd is not very readable, using triples/quadruples and providing your own accessor-functions (e.g. fst3, thd4) would improve that -- it might have an impact on performance, though, that would require a test or an answer from somebody more knowledgeable. And -- I'm not sure whether that is indeed so -- if you have an argument pattern (x:xs) which may be directly returned, as in fun (x:xs) | even x = ([x],xs) | otherwise = ([],x:xs) the list would have to be reconstructed from its head and tail, which could be avoided by using an as-pattern fun xxs@(x:xs) | even x = ([x],xs) | otherwise = ([],xxs), however, that wouldn't be significant unless it happens really often and the compiler might optimise it away anyway. Thank you! This really helps . I 'll clean up my mess a bit. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 23:31:44 + Allright, I _am_ convinced. How do I ready ^C from the keyboard??? If this is some daemon program you can't. Perhaps that should be a daemon. Just make console client that will read commands from keyboard and send to your program. In single threaded client you can handle ^C if you like in signal handler without problem. That can be gui program if you like , but console one should be enough. You can implement eg: status, start, pause, quit and so. For now quit will be just fine, later you can add more commands. Make one listener thread in your program for such connections and that's it. Or just use telnet (this isn't safe though), but you can restrict connections. And don't forget to mask SIGINT :) Greetings, Bane. On Dec 11, 2005, at 10:02 PM, Branimir Maksimovic wrote: This should be enough reason to scan for keyboard events instead. There is no guarantee that SIGINT would be sent only by keyboard. -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Sun, 11 Dec 2005 23:52:19 + On Dec 11, 2005, at 10:02 PM, Branimir Maksimovic wrote: This does not work for ^C. Can it actually be done? Of course I can just read q but that would be too simple :-). Perhaps you can implement this in Haskell? dedicate single thread to just handle ^C signal? this is how you should do it properly, but I would go with console client anyway: http://www.scit.wlv.ac.uk/cgi-bin/mansec?3T+thr_sigsetmask Greetings, Bane. Thanks, Joel -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Substring replacements
From: Daniel Fischer [EMAIL PROTECTED] To: Haskell-Cafe@haskell.org Subject: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 01:14:37 +0100 Okay, I have looked up KMP and implemented it. Seems to work -- my first use of QuickCheck, too. It's slower than Bulat's and Tomasz' for Branimir's test :-(, but really fast for my test. Strange I got completelly different results: [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working very long True Done real0m16.407s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 srchrep.hs --make -o srchrep.exe Chasing modules from: srchrep.hs Compiling Main ( srchrep.hs, srchrep.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.156s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.672s user0m0.015s sys 0m0.015s Now your version is fastest according to my machine, but it is not faster with your test it's slower in compariton to replace1. I've corrected my code so it is fastest with your test,still less then a second, but slowest with mine. Checked with your fail tests and compared results of these 2 tests. Now should be ok. I maintan now two lists one for successes and other for failures. I also prettified code a bit . searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs where searchr :: String-String-String-String-String - String searchr [] _ xs _ _ = xs searchr _ _ [] _ _ = [] searchr sr rp xs retBack rollBack | isFound $ fnd rollBack = rp ++ searchr sr rp (remaining $ fnd rollBack ) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) | otherwise = reverse ((processed $ fnd rollBack) ++ rollBack) ++ searchr sr rp (remaining $ fnd rollBack) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) where fnd = searchr' sr sr (reverse retBack ++ xs) isFound = fst . fst remaining = snd . snd . fst getRollBack = snd . snd getRetBack = fst . snd processed = fst . snd . fst searchr' :: String-String-String-String-String - ((Bool,(String,String)),(String,String)) searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack = searchr'' (drop (length rollBack) srch) srch' xs fndSoFar (False,,) sr searchr'' :: String-String-String-String-(Bool,String,String)-Char - ((Bool,(String,String)),(String,String)) searchr'' [] _ xs fnd _ _ = ((True,(fnd,xs)),(,)) searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ rollBack ++ fnd,[])),(,)) searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar (cnt,retBack,rollBack) s | sr == x = if cnt sr' == x isEmpty retBack then searchr'' srs srs' xs fndSoFar (True,,x:rollBack) s else if not (isEmpty retBack) || not (isEmpty rollBack) then searchr'' srs srch' xs fndSoFar (True,(x:rollBack) ++ retBack,) s else searchr'' srs srch' xs (x:fndSoFar) (True,,) s | otherwise = if isEmpty rollBack isEmpty retBack then if s == x then ((False,(fndSoFar,xxs)),(,)) else ((False,searchr''' s xs (x:fndSoFar)),(,)) else if sr' == x isEmpty retBack then ((False,(fndSoFar, xs)), (retBack,x:rollBack)) else ((False,(fndSoFar, xxs)), (retBack,rollBack)) searchr''' :: Char-String-String - (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,xxs) isEmpty [] = True isEmpty (a:as) = False --- Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX Date: Mon, 12 Dec 2005 02:28:54 + My client _is_ single-threaded, I do not use bound (OS) threads at all. Does this shed any light on why my OpenSSL stuff is working as well as my signal handler? ;-) If your app is single threaded you should be ok. But then nothing is executed concurrently? why locking at all then? You wouldn;t have problems with deadlocks and signals if single threaded without locking. Now, I m really puzzled. Greetings, Bane. On Dec 12, 2005, at 12:21 AM, Branimir Maksimovic wrote: In single threaded client you can handle ^C if you like in signal handler without problem. That can be gui program if you like , but console one should be enough. You can implement eg: status, start, pause, quit and so. For now quit will be just fine, later you can add more commands. Make one listener thread in your program for such connections and that's it. -- http://wagerlabs.com/ _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Differences in optimisiation with interactive and compiled mo
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: Differences in optimisiation with interactive and compiled mo Date: Sat, 10 Dec 2005 15:11:31 +0100 Am Donnerstag, 8. Dezember 2005 19:17 schrieb Branimir Maksimovic: From: Henning Thielemann [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mode Date: Thu, 8 Dec 2005 18:38:45 +0100 (MET) On Thu, 8 Dec 2005, Branimir Maksimovic wrote: program performs search replace on a String http://www.haskell.org/pipermail/haskell-cafe/2005-April/009692.html This is nice and ellegant but example search replace program runs more then 50% faster with my implementation. Greetings, Bane. That's probably because Lemmih's is polymorphic. Yesterday evening, I cooked up my own version Then Lemmih's is a bit faster than mine (a bit slower, if compiled for profiling) which is still a bit faster than yours (and, profiling, yours is significantly slower than the others. I've fixed function signatures for strings only. this is my test: $ ghc -fglasgow-exts -O2 srchrep.hs --make -o srchrep.exe Chasing modules from: srchrep.hs Compiling Main ( srchrep.hs, srchrep.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 replace.hs --make -o replace.exe Chasing modules from: replace.hs Compiling Main ( replace.hs, replace.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time ./searchr. searchr.exe searchr.hi searchr.hs searchr.o [EMAIL PROTECTED] ~/tutorial $ time ./searchr.exe Working:seaseasearch replace able seaseaseasearch baker seaseasearch charlie True Done real0m12.547s user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time ./replace.exe Working:seaseasearch replace able seaseaseasearch baker seaseasearch charlie True Done real0m21.078s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time ./srchrep.exe Working:seaseasearch replace able seaseaseasearch baker seaseasearch charlie True Done real0m12.188s user0m0.015s sys 0m0.000s Your version seems fastest. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo
From: Tomasz Zielonka [EMAIL PROTECTED] To: Henning Thielemann [EMAIL PROTECTED] CC: Branimir Maksimovic [EMAIL PROTECTED], haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mode Date: Sat, 10 Dec 2005 15:36:57 +0100 On Sat, Dec 10, 2005 at 03:29:49PM +0100, Tomasz Zielonka wrote: On Sat, Dec 10, 2005 at 03:24:56PM +0100, Tomasz Zielonka wrote: *SearchRepl replace ab ba ab ba It also shows that your implementation is not lazy, so it couldn't be used for infinite lists. In some situations, even for short patterns, it just has to check the whole input list to produce the first element of output list. Here is my implementation replace src dst = repl where repl input = dst ++ repl (drop (length src) input) repl (x:xs) = x : repl xs repl [] = [] which should be infinite list friendly. Nice code. But it takes lot of ram (1GB is not enough )and can't execute my test. Other versions don't take that much ram. Actually just 2mb each but yours somehow didn't work well with optimiser. I don't know why. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo
From: Tomasz Zielonka [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo Date: Sat, 10 Dec 2005 18:14:58 +0100 On Sat, Dec 10, 2005 at 04:14:20PM +, Branimir Maksimovic wrote: Nice code. But incorrect. I have broken it when refactoring :-/ Here is the correct version: replace2 src dst = repl where repl input | src `isPrefixOf` input = dst ++ repl (drop (length src) input) repl (x:xs) = x : repl xs repl [] = [] But it takes lot of ram (1GB is not enough )and can't execute my test. Can you check this version? It's ok now; 2 megs like other versions. It's just about 1.5 seconds slower then mine version and Daniels version is a bit faster the mine. [EMAIL PROTECTED] ~/tutorial $ time ./replace1.exe Working:seaseasearch replace able seaseaseasearch baker seaseasearch charlie True Done real0m14.140s user0m0.015s sys 0m0.000s Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Differences in optimisiation with interactive and comp
From: Bulat Ziganshin [EMAIL PROTECTED] Reply-To: Bulat Ziganshin [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], [EMAIL PROTECTED],haskell-cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo Date: Sun, 11 Dec 2005 00:42:24 +0300 Hello Branimir, Saturday, December 10, 2005, 8:29:09 PM, you wrote: Can you check this version? and this: replace from to = repl where repl s | Just remainder - start_from from s = to ++ repl remainder repl (c:cs) = c : repl cs repl [] = [] start_from (x:xs) (y:ys) | x==y = start_from xs ys start_from [] str= Just str start_from _ _ = Nothing This one is fastest,not much, but is. So here it goes: your version, then Daniel's, then mine. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo Date: Sat, 10 Dec 2005 23:56:28 +0100 Am Samstag, 10. Dezember 2005 18:29 schrieb Branimir Maksimovic: From: Tomasz Zielonka [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: [EMAIL PROTECTED], haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo Date: Sat, 10 Dec 2005 18:14:58 +0100 On Sat, Dec 10, 2005 at 04:14:20PM +, Branimir Maksimovic wrote: Nice code. But incorrect. I have broken it when refactoring :-/ Here is the correct version: replace2 src dst = repl where repl input | src `isPrefixOf` input = dst ++ repl (drop (length src) input) repl (x:xs) = x : repl xs repl [] = [] But it takes lot of ram (1GB is not enough )and can't execute my test. Can you check this version? It's ok now; 2 megs like other versions. It's just about 1.5 seconds slower then mine version and Daniels version is a bit faster the mine. [EMAIL PROTECTED] ~/tutorial $ time ./replace1.exe Working:seaseasearch replace able seaseaseasearch baker seaseasearch charlie True Done real0m14.140s user0m0.015s sys 0m0.000s Greetings, Bane. On my thingy, Tomasz' version is a bit faster than my version of the same algorithm for seasea..., and a bit slower for ... and this algorithm is definitely the fastest submitted. Odd that your timings are different (in order) -- maybe it's something about Linux vs. Windows? Well, I have same results on linux , though I just compared two versions. I think that it because I uise -O2 flag. mine version does benefit with it (perhaps less temporaries and more inlines? ) What's the difference between -O2 and -O because I see noticable difference with my version? Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] RE: Substring replacements (was: Differences in optimisiation ...)
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Substring replacements (was: Differences in optimisiation ...) Date: Sat, 10 Dec 2005 22:56:10 +0100 and if you try it on main2 :: IO () main2 = let src = replicate 1000 'r' dst = # str = replicate 999 'r' ++ 'c': replicate 1000 'r' out = replace src dst $ concat $ replicate 500 str out1 = replace src dst $ concat $ replicate 501 str in do putStrLn $ Working very long putStrLn $ show (out == out1) ++ \nDone you'll see a real difference. I'm not sure, why your algorithm pays a so much higher penalty, though. Maybe, it'll be faster if you make searchr' c local functions? I'll try. Well, this is on my machine with your setup and -O2 flag: $ time replace.exe Working very long False Done real0m31.828s user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long False Done real0m37.531s user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working very long False Done real0m18.047s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working very long False Done real0m12.531s user0m0.015s sys 0m0.000s replace1 is Bulat's newest algorithm. It is really incredibly fastest with this setup. Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] RE: Substring replacements (was: Differences in optimisiation ...)
After seeing your test, I've implemented full KMP algorithm, which is blazingly fast with your test. It is slower in mine test due excessive temporaries I guess, but perhaps you can help me to make it better as I'm just Haskell newbie. You can see that by my code :0) Especially I'm clumsy with passing parameters around. main :: IO () main =let src = replicate 1000 'r' dst = # str = replicate 999 'r' ++ 'c': replicate 1000 'r' out = searchReplace src dst $ concat $ replicate 500 str out1 = searchReplace src dst $ concat $ replicate 501 str in do putStrLn $ Working very long putStrLn $ show (out == out1) ++ \nDone --- searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs searchr :: String-String-String-String - String searchr [] _ xs _ = xs searchr _ _ [] _ = [] searchr sr rp xs rollBack | fst $ fst fnd = rp ++ searchr sr rp (snd $ snd $ fst fnd) (snd fnd) | otherwise = reverse ((fst $ snd $ fst $ fnd) ++ rollBack) ++ searchr sr rp (snd $ snd $ fst fnd) (snd fnd) where fnd = searchr' (drop (length rollBack) sr) xs searchr' :: String-String-String - ((Bool,(String,String)),String) searchr' (sr:srs) xs fndSoFar = searchr'' (sr:srs) xs fndSoFar (False,False,) sr searchr'' :: String-String-String-(Bool,Bool,String)-Char - ((Bool,(String,String)),String) searchr'' [] xs fnd _ _ = ((True,(fnd,xs)),) searchr'' _ [] fnd _ _ = ((False,(fnd,[])),) searchr'' (sr:srs) (x:xs) fndSoFar (cnt,f,rollBack) s | sr == x = if cnt (f || s == x) then searchr'' srs xs fndSoFar (True,True,x:rollBack) s else searchr'' srs xs (x:fndSoFar) (True,False,) s | otherwise = if not f then ((False,searchr''' s (x:xs) fndSoFar),) else ((False,(fndSoFar,x:xs)),rollBack) searchr''' :: Char-String-String - (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr (x:xs) fndSoFar | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,x:xs) --- Optimiser works extremilly well with this version in combination with your test: $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long False Done real0m0.250s user0m0.031s sys 0m0.000s Wow, just 0.25 seconds! No c++ program can approach near that! Perhaps I have bug somewhere but I've compared results with yours searchrep and seems same. Greetings, Bane. From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Substring replacements (was: Differences in optimisiation ...) Date: Sat, 10 Dec 2005 22:56:10 +0100 Am Samstag, 10. Dezember 2005 02:51 schrieben Sie: From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: Differences in optimisiation with interactive and compiled mo Date: Fri, 9 Dec 2005 23:27:00 +0100 Still doesn't work, though: *Main searchr hahal jupp hahahalala hahahalala The problem is that the string to replace may contain a repeated pattern and the pattern that begins the actual occurence might be consumed before a failure is detected. Yes, I've corrected it. Now it is just 25% faster and that is only with -O2 flag. Here is whole thing, I hope there are no more bugs left :) : None that sprang to my eyes. However, on my machine, yours is not faster than Lemmih's. Now, using the new Strings, I get the following times: -O2 -O1 no opt Lemmih's: 38.9 sec38.9 sec76.7 sec Yours : 40.1 sec 41.5 sec 131.1 sec Mine : 32.9 sec 33.1 sec82.8 sec. However, there's a problem with Lemmih's replace: *Main searchr ababcab ### ababcababcabab ###abcab *Main replace ababcab ### ababcababcabab ababc###ab due to the fact that Lemmih's version scans the input from right to left (that's easily changed by a few reverses, though -- but costly for long inputs), more serious is Prelude Main replace ja aja jjja ajajajajajajaja. The fastest -- and nicely simple above -- that I could come up with is replace :: String - String - String - String replace _ _ = replace _ str = str replace src dst inp = process inp where n = length src process = process st@(c:cs) | src `isPrefixOf` st = dst ++ process (drop n st
RE: [Haskell-cafe] RE: Substring replacements (was: Differences inoptimisiation
I've found one remaining bug, and this is corrected version. Now it is fastest with your test (still 0.25 seconds), but undoubtly slowest with mine:0) But I crafted this test to be really rigorous to mine implementation. Lot of replaces, repated patterns and so. In real world situtaion it will perform much better, I hope. so here it is: --- main :: IO () main =let src = replicate 1000 'r' dst = # str = replicate 999 'r' ++ 'c': replicate 1000 'r' out = searchReplace src dst $ concat $ replicate 500 str out1 = searchReplace src dst $ concat $ replicate 500 str in do putStrLn $ Working very long putStrLn $ show (out == out1) ++ \nDone --- searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs where searchr :: String-String-String-String - String searchr [] _ xs _ = xs searchr _ _ [] _ = [] searchr sr rp xs rollBack | fst $ fst $ fnd = rp ++ searchr sr rp (snd $ snd $ fst $ fnd ) ( snd fnd ) | otherwise = reverse ((fst $ snd $ fst $ fnd ) ++ rollBack) ++ searchr sr rp (snd $ snd $ fst $ fnd) ( snd fnd) where fnd = searchr' (drop (length rollBack) sr) xs searchr' :: String-String-String - ((Bool,(String,String)),String) searchr' (sr:srs) xs fndSoFar = searchr'' (sr:srs) xs fndSoFar (False,False,) sr searchr'' :: String-String-String-(Bool,Bool,String)-Char - ((Bool,(String,String)),String) searchr'' [] xs fnd _ _ = ((True,(fnd,xs)),) searchr'' _ [] fnd (_,_,rollBack) _ = ((False,(fnd,[])),rollBack) searchr'' (sr:srs) (x:xs) fndSoFar (cnt,f,rollBack) s | sr == x = if cnt (f || s == x) then searchr'' srs xs fndSoFar (True,True,x:rollBack) s else searchr'' srs xs (x:fndSoFar) (True,False,) s | otherwise = if not f then if s == x then ((False,(fndSoFar,x:xs)),) else ((False,searchr''' s xs (x:fndSoFar)),) else if s == x getFst rollBack == s then ((False,(fndSoFar, xs)),x:rollBack) else ((False,(fndSoFar,x:xs)),rollBack) searchr''' :: Char-String-String - (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr (x:xs) fndSoFar -- | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,x:xs) getFst (a:as) = a; --- From: Branimir Maksimovic [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: [Haskell-cafe] RE: Substring replacements (was: Differences inoptimisiation ...) Date: Sun, 11 Dec 2005 02:19:22 + After seeing your test, I've implemented full KMP algorithm, which is blazingly fast with your test. It is slower in mine test due excessive temporaries I guess, but perhaps you can help me to make it better as I'm just Haskell newbie. You can see that by my code :0) Especially I'm clumsy with passing parameters around. main :: IO () main =let src = replicate 1000 'r' dst = # str = replicate 999 'r' ++ 'c': replicate 1000 'r' out = searchReplace src dst $ concat $ replicate 500 str out1 = searchReplace src dst $ concat $ replicate 501 str in do putStrLn $ Working very long putStrLn $ show (out == out1) ++ \nDone --- searchReplace :: String-String-String - String searchReplace sr rp xs = searchr sr rp xs searchr :: String-String-String-String - String searchr [] _ xs _ = xs searchr _ _ [] _ = [] searchr sr rp xs rollBack | fst $ fst fnd = rp ++ searchr sr rp (snd $ snd $ fst fnd) (snd fnd) | otherwise = reverse ((fst $ snd $ fst $ fnd) ++ rollBack) ++ searchr sr rp (snd $ snd $ fst fnd) (snd fnd) where fnd = searchr' (drop (length rollBack) sr) xs searchr' :: String-String-String - ((Bool,(String,String)),String) searchr' (sr:srs) xs fndSoFar = searchr'' (sr:srs) xs fndSoFar (False,False,) sr searchr'' :: String-String-String-(Bool,Bool,String)-Char - ((Bool,(String,String)),String) searchr'' [] xs fnd _ _ = ((True,(fnd,xs)),) searchr'' _ [] fnd _ _ = ((False,(fnd
[Haskell-cafe] Re: Differences in optimisiation with interactive and compiled mo
From: Daniel Fischer [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: Differences in optimisiation with interactive and compiled mo Date: Fri, 9 Dec 2005 23:27:00 +0100 Still doesn't work, though: *Main searchr hahal jupp hahahalala hahahalala The problem is that the string to replace may contain a repeated pattern and the pattern that begins the actual occurence might be consumed before a failure is detected. Yes, I've corrected it. Now it is just 25% faster and that is only with -O2 flag. Here is whole thing, I hope there are no more bugs left :) : module Main where import IO import List main = do hSetBuffering stdout LineBuffering let sr = seasearch rp = replace str= able seaseaseasearch baker ssseasearch charlie out = searchr sr rp (take (100*(length str)) $ cycle str) out1 = replace sr rp (take (100*(length str)) $ cycle str) putStrLn $ Working: ++ sr ++ ++ rp ++ ++ str putStrLn $ (show (out == out1)) ++ \nDone\n {- search replace able search baker search charlie -} --- --infinite xs = xs ++ infinite xs searchr :: String-String-String - String searchr [] _ xs = xs --searchr _ [] xs = xs searchr _ _ [] = [] searchr sr rp xs | fst fnd = rp ++ searchr sr rp (snd $ snd fnd) | otherwise = (reverse $ fst $ snd fnd) ++ searchr sr rp (snd $ snd fnd) where fnd = searchr' sr xs searchr' :: String-String-String - (Bool,(String,String)) searchr' (sr:srs) xs fndSoFar = searchr'' (sr:srs) xs fndSoFar (False,False,) sr searchr'' :: String-String-String-(Bool,Bool,String)-Char - (Bool,(String,String)) searchr'' [] xs fnd _ _ = (True,(fnd,xs)) searchr'' _ [] fnd _ _ = (False,(fnd,[])) searchr'' (sr:srs) (x:xs) fndSoFar (cnt,f,rollBack) s | sr == x = if cnt (f || s == x) then searchr'' srs xs fndSoFar (True,True,x:rollBack) s else searchr'' srs xs (x:fndSoFar) (True,f,rollBack) s | otherwise = if not f then (False,searchr''' s (x:xs) fndSoFar) else (False,(fndSoFar,(reverse rollBack)++(x:xs))) searchr''' :: Char-String-String - (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr (x:xs) fndSoFar | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,x:xs) --- replace :: forall a. (Eq a) = [a] - [a] - [a] - [a] replace src dst = foldr (\x xs - let y=x:xs in if isPrefixOf src y then dst ++ drop (length src) y else y) [] And is *Main searchr bla remove bla bla remove bla bla really intended? Originaly yes, but I've changed that now. Greetings, Bane. Cheers, Daniel Am Freitag, 9. Dezember 2005 10:24 schrieb Branimir Maksimovic: From: Henning Thielemann [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mo Date: Fri, 9 Dec 2005 09:23:53 +0100 (MET) On Thu, 8 Dec 2005, Branimir Maksimovic wrote: From: Henning Thielemann [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Differences in optimisiation with interactive and compiled mode Date: Thu, 8 Dec 2005 18:38:45 +0100 (MET) On Thu, 8 Dec 2005, Branimir Maksimovic wrote: program performs search replace on a String http://www.haskell.org/pipermail/haskell-cafe/2005-April/009692.html This is nice and ellegant but example search replace program runs more then 50% faster with my implementation. Is this intended: *SearchReplace searchr ha lo hha hha ? thanks, this is a bug. I over optimised it :) that should be : searchr'' (sr:srs) (x:xs) fndSoFar s | sr == x = searchr'' srs xs (x:fndSoFar) s | otherwise = (False,searchr''' s | (x:xs) fndSoFar) instead of searchr'' (sr:srs) (x:xs) fndSoFar s | sr == x = searchr'' srs xs xxs s | otherwise = (False,searchr''' s xs xxs) where xxs = x:fndSoFar Just to say my algorithm takes some optimisation opportunities. For example if search replace able search baker search charlie then it will run much faster then if able search baker search charlie Worst case is repetitive first mathing character, but than it is fast as normal implementation. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click
Re: [Haskell-cafe] Mixing C++ and Haskell, OpenSSL thread safety, and using mmap
From: Joel Reymont [EMAIL PROTECTED] To: Branimir Maksimovic [EMAIL PROTECTED] CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Mixing C++ and Haskell, OpenSSL thread safety, and using mmap Date: Thu, 8 Dec 2005 09:21:08 + I use OpenSSL in a heavily threaded environment. It works without extra locking. I do not use bound (OS) threads, though. If code executes concurrently that means you have a problem with OpenSSL for sure. Probably it works now because SSL calls are not concurrent or so, but I wouldn't risk about it as I am sure that you would have problems with that if calls to SSL functions are concurrent. Also, I think that original problem of gethostbyname just hides real SSL problem as you've probably locked around that too, but you can't be really sure. so either lock around every SSL call with global mutex or set locks in C module then call Haskell or setup callbacks from Haskell, whichever way you prefer. Greetings, Bane. On Dec 8, 2005, at 7:06 AM, Branimir Maksimovic wrote: First I want to say about OpenSSL thread safety. It is not thread safe by default. Who wants to import and use OpenSLL functions with FFI, have to set locking hooks for it, or else spurious crashes with useless stack trace will result. Higher level of concurrency, more likely crash will happen. -- http://wagerlabs.com/ _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe