Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
On 27 September 2011 01:07, Nicu Ionita nicu.ion...@acons.at wrote: I wonder why the transformers library does not use this kind of state monad definition. One disadvantage of ContT and I guess any CPS based monad transformer is that they interact badly with exception handling functions like catch. See [1] for a bug that was caused because of using catch in ContT: Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2]. Regards, Bas [1] http://thread.gmane.org/gmane.comp.lang.haskell.cafe/76262/ [2] http://hackage.haskell.org/package/monad-control ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
Interesting, so what have you used to get that speedup? A monad stack of ContT and State (*)? Just the Cont monad? (*) If so, were you using the strict version of State? Would it be possible to see the differences between the 2 versions of you code? 2011/9/27 Nicu Ionita nicu.ion...@acons.at Hello list, Starting from this emails (http://web.archiveorange.com/** archive/v/nDNOvSM4JT3GJRSjOm9Phttp://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P **) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on). (Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.) Even more amazingly is that I could program it although I cannot really understand the Cont ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast. I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it. Nicu __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
Bas van Dijk v.dijk@gmail.com wrote: Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2]. Is that even possible? I tried hard to come up with just a MonadFix instance for CPS-based monads, and I failed. I would think that MonadTransControl is just as hard, if not even harder. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
On 28 September 2011 14:25, Ertugrul Soeylemez e...@ertes.de wrote: Bas van Dijk v.dijk@gmail.com wrote: Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2]. Is that even possible? I once tried and failed so I believe it's not possible. Bas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
Well, you can get something close with the help of IORefs, but I forgot the details. I believe this is the paper that explains it: Value recursion in the continuation monad by Magnus Carlsson http://www.carlssonia.org/ogi/mdo-callcc.pdf On 28 September 2011 15:15, Bas van Dijk v.dijk@gmail.com wrote: On 28 September 2011 14:25, Ertugrul Soeylemez e...@ertes.de wrote: Bas van Dijk v.dijk@gmail.com wrote: Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2]. Is that even possible? I once tried and failed so I believe it's not possible. Bas ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Push the envelope. Watch it bend. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
Am 28.09.2011 14:05, schrieb Yves Parès: Interesting, so what have you used to get that speedup? A monad stack of ContT and State (*)? Just the Cont monad? This is a module with a state monad transformer that I used before (the name STPlus is misleading - and sorry for the long email): {-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances #-} module Search.SearchMonad ( STPlus, return, (=), get, put, gets, modify, lift, liftIO, runSearch, execSearch ) where import Control.Monad import Control.Monad.State hiding (lift, gets, modify) newtype STPlus s m a = STPlus { runSTPlus :: s - m (a, s) } {-# INLINE runSTPlus #-} instance Monad m = Monad (STPlus s m) where {-# INLINE return #-} return v = STPlus (\s - return (v, s)) {-# INLINE (=) #-} (=)= bindSTPlus {-# INLINE bindSTPlus #-} bindSTPlus :: Monad m = STPlus s m a - (a - STPlus s m b) - STPlus s m b bindSTPlus ms f = STPlus $ \s - case runSTPlus ms s of m - m = \(v', s') - case f v' of fv - runSTPlus fv s' instance Monad m = MonadState s (STPlus s m) where {-# INLINE get #-} get = STPlus $ \s - return (s, s) {-# INLINE put #-} put s = STPlus $ \_ - return ((), s) instance MonadIO m = MonadIO (STPlus s m) where {-# INLINE liftIO #-} liftIO = lift . liftIO runSearch :: Monad m = STPlus s m a - s - m (a, s) runSearch = runSTPlus execSearch ms s = liftM snd $ runSearch ms s {-# INLINE lift #-} lift :: Monad m = m a - STPlus s m a lift m = STPlus $ \s - m = \v - return (v, s) {-# INLINE gets #-} gets :: Monad m = (s - a) - STPlus s m a -- gets f = STPlus $ \s - return (f s, s) gets f = STPlus $ \s - case f s of fs - return (fs, s) {-# INLINE modify #-} modify :: Monad m = (s - s) - STPlus s m () modify f = STPlus $ \s - case f s of fs - return ((), fs) And this is how the module looks now: {-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances #-} module Search.SearchMonadCPS ( STPlus, return, (=), get, put, gets, modify, lift, liftIO, runSearch, execSearch ) where import Control.Monad import Control.Monad.State hiding (lift, gets, modify) newtype STPlus r s m a = STPlus { runSTPlus :: s - (a - s - m r) - m r } instance Monad (STPlus r s m) where return a = STPlus $ \s k - k a s c = f = STPlus $ \s0 k - runSTPlus c s0 $ \a s1 - runSTPlus (f a) s1 k instance MonadState s (STPlus r s m) where get = STPlus $ \s k - k s s put s = STPlus $ \_ k - k () s instance MonadIO m = MonadIO (STPlus r s m) where {-# INLINE liftIO #-} liftIO = lift . liftIO runSearch :: Monad m = STPlus (a, s) s m a - s - m (a, s) runSearch c s = runSTPlus c s $ \a s0 - return (a, s0) execSearch ms s = liftM snd $ runSearch ms s {-# INLINE lift #-} lift :: Monad m = m a - STPlus r s m a lift m = STPlus $ \s k - m = \a - k a s {-# INLINE gets #-} gets :: Monad m = (s - a) - STPlus r s m a gets f = STPlus $ \s k - k (f s) s {-# INLINE modify #-} modify :: Monad m = (s - s) - STPlus r s m () modify f = STPlus $ \s k - k () (f s) And then I have (in different modules): Client code (starting an PV search to a given depth): type CtxIO = ReaderT Context IO bestMoveCont :: Int - MyState - Maybe Int - [Move] - [Move] - CtxIO IterResult bestMoveCont ... = do ... ((sc, path, rmvsf), statf) - runSearch (alphaBeta abc) stati ... Search framework: class Monad m = Node m where staticVal :: m Int -- static evaluation of a node materVal :: m Int -- material evaluation (for prune purpose) genEdges :: Int - Int - Bool - m ([Move], [Move]) -- generate all legal edges genTactEdges :: m [Move] -- generate all edges in tactical positions ... type Search m a = forall r. STPlus r PVState m a alphaBeta :: Node m = ABControl - m (Int, [Move], [Move]) alphaBeta abc = do let !d = maxdepth abc rmvs = Alt $ rootmvs abc lpv = Seq $ lastpv abc searchReduced a b = pvRootSearch a b d lpv rmvs True searchFull= pvRootSearch salpha0 sbeta0 d lpv rmvs False r - if useAspirWin ... pvRootSearch :: Node m = Int - Int - Int - Seq Move - Alt Move - Bool - Search m (Int, Seq Move, Alt Move) ... And then the chess specific implementation of the game state in another module: type Game r m = STPlus r MyState m ... instance CtxMon m = Node (Game r m) where staticVal = staticVal0 materVal = materVal0 genEdges = genMoves ... genMoves :: CtxMon m = Int - Int - Bool - Game r m ([Move], [Move]) genMoves depth absdp pv = do Nicu (*) If so, were you using the strict version of State? Would it be possible to see the differences between the 2 versions of you code? 2011/9/27 Nicu Ionita nicu.ion...@acons.at mailto:nicu.ion...@acons.at Hello list, Starting from this emails
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
Am 28.09.2011 02:35, schrieb Ryan Ingram: My guess is that Cont plays really nicely with GHC's inliner, so things that end up looking like return x = \y - ... get optimized really well return x = f -- inline = = ContState $ \s0 k - runCS (return x) s0 $ \a s1 - runCS (f a) s1 k -- inline return = ContState $ \s0 k - runCS (ContState $ \s2 k2 - k2 x s2) s0 $ \a s1 - runCS (f a) s1 k -- runCS record selector = ContState $ \s0 k - (\s2 k2 - k2 x s2) s0 $ \a s1 - runCS (f a) s1 k -- beta = ContState $ \s0 k - (\k2 - k2 x s0) $ \a s1 - runCS (f a) s1 k -- beta = ContState $ \s0 k - (\a s1 - runCS (f a) s1 k) x s0 -- beta = ContState $ \s0 k - runCS (f x) s0 k and then further inlining of f can take place. I was even thinking - and this would have been the next idea to try if I couldn't get your example code to run so fast - to define some rules for the state monad (transformer) to fuse such expressions like m = f = g = ... or even modify f = modify g = modify (g . f) and perhaps other variations, so that it would perhaps end up in some nice combination of f and g, avoiding the intermediate tuples, hopefully with better performance. But then I did not follow it, and I want to concentrate on further improvements with the new code. The way is still long, because the top engines (written in C or C++) can do about 10 mil nps on my machine :-) Nicu ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Much faster complex monad stack based on CPS state
My guess is that Cont plays really nicely with GHC's inliner, so things that end up looking like return x = \y - ... get optimized really well return x = f -- inline = = ContState $ \s0 k - runCS (return x) s0 $ \a s1 - runCS (f a) s1 k -- inline return = ContState $ \s0 k - runCS (ContState $ \s2 k2 - k2 x s2) s0 $ \a s1 - runCS (f a) s1 k -- runCS record selector = ContState $ \s0 k - (\s2 k2 - k2 x s2) s0 $ \a s1 - runCS (f a) s1 k -- beta = ContState $ \s0 k - (\k2 - k2 x s0) $ \a s1 - runCS (f a) s1 k -- beta = ContState $ \s0 k - (\a s1 - runCS (f a) s1 k) x s0 -- beta = ContState $ \s0 k - runCS (f x) s0 k and then further inlining of f can take place. On Mon, Sep 26, 2011 at 4:07 PM, Nicu Ionita nicu.ion...@acons.at wrote: Hello list, Starting from this emails (http://web.archiveorange.com/** archive/v/nDNOvSM4JT3GJRSjOm9Phttp://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P **) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on). (Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.) Even more amazingly is that I could program it although I cannot really understand the Cont ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast. I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it. Nicu __**_ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Much faster complex monad stack based on CPS state
Hello list, Starting from this emails (http://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on). (Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.) Even more amazingly is that I could program it although I cannot really understand the Cont ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast. I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it. Nicu ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe