I dug out the pre-compute-pattern-optimization C version: it is about twice as slow as the J version. OTOH, the compute-pattern-optimized C version is about 2-3X faster than the J version; it took some analysis and refactoring to achieve this, however, and it would be nice to focus on the application rather than the implementation.
On Tue, Apr 18, 2017 at 11:31 AM, Xiao-Yong Jin <[email protected]> wrote:

> > On Apr 18, 2017, at 9:23 AM, Michael Goodrich <[email protected]> wrote:
> >
> > Hi Henry,
> >
> > Thanks for your interest. I owe you some better information.
> >
> > First off, it's not really an apples-to-apples comparison, as the C version
> > is very mature, with some performance tricks designed to reduce calculations
> > to the bare minimum (e.g., do not recalculate matrices but instead do
> > selected in-place updates as necessary). This gave me a 5-6X speed
> > improvement in the C version.
>
> If you are only optimizing away unnecessary computations in C, your code
> will not really beat well-written J code. There is still a lot more you can
> do in C.
>
> > When I attempted to put the same in the J code, it ran SLOWER
> > than simply recalculating the entire matrix, although in many cases only a
> > column was actually updated, so I backed them out.
>
> J does reasonable in-place (avoiding allocating and copying) updates.
> Look them up and see if you can better employ those. In an interpreted,
> mostly functional language like J, you want to minimize reallocating arrays
> and moving whole arrays. Too much copying hurts more than extra
> floating-point operations.
>
> > This is a Markov Chain Monte Carlo Bayesian Artificial Neural Network
> > (Three-Layer Perceptron) application that in the test problem produces
> > about 1e5 chain states (not saving them but streaming them to another C
> > program), using a half dozen matrices, the largest of which (for this test
> > problem) is about 200x5.
>
> It's really small and fits in L1 cache. Naive C loops would have no problem.
> If you go larger than L2 cache, you will need to call dgemm or any
> cache-friendly block multiplication algorithm for better performance.
> > Another curiosity is that in the C version, using a (user-defined) sigmoid
> > vice 'tanh' as the nonlinear activation (on all matrix elements) expands
> > the run time by 1.75X. In the J version the same choice *reduces* run time
> > by about 20% over the 'tanh' J primitive (?).
> >
> >    sgmd =. monad : '1%(1+^-y)'
>
> In any interpreted language, primitives are always going to outperform
> composite functions.
>
> > As far as releasing code, this is the outgrowth of my dissertation work,
> > and I may hope someday to commercialize it, so I am reluctant to release
> > it. Pity - I know you can't be sure what I am doing in order to diagnose
> > the situation, but perhaps we can find a way to accomplish what you want
> > anyway by pursuing this together.
>
> I don't see a good MCMC code in J, so I'm also developing my own. Perhaps we
> can share that part at some point so you don't have to give away your baby
> neural network.

--
Dominus enim Iesus Ecclesiae Suae et,
-Michael

"Let your religion be less of a theory and more of a love affair." - G.K. Chesterton

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
