I dug out the C version from before the compute-pattern optimization - it is
about twice as slow as the J version.  OTOH, the compute-pattern-optimized C
version is about 2-3x faster than the J version; it took some analysis and
refactoring to achieve this, however, and it would be nice to focus on the
application rather than the implementation.

On Tue, Apr 18, 2017 at 11:31 AM, Xiao-Yong Jin <[email protected]>
wrote:

>
> > On Apr 18, 2017, at 9:23 AM, Michael Goodrich <
> [email protected]> wrote:
> >
> > Hi Henry,
> >
> > Thanks for your interest.  I owe you some better information.
> >
> > First off, it's not really an apples-to-apples comparison, as the C
> > version is very mature, with some performance tricks designed to reduce
> > calculations to the bare minimum (e.g., do not recalculate matrices but
> > instead do selected in-place updates as necessary).  This gave me a 5-6X
> > speed improvement in the C version.
>
> If you are only optimizing away unnecessary computations in C, your code
> will not really beat well-written J code.  There is still a lot more you
> can do in C.
>
> > When I attempted to do the same in the J code, it ran SLOWER than
> > simply recalculating the entire matrix, even though in many cases only
> > a column was actually updated, so I backed those changes out.
>
> J does reasonable in-place updates (avoiding allocating and copying).
> Look them up and see if you can better employ them.  In an interpreted,
> mostly functional language like J, you want to minimize reallocating
> arrays and moving whole arrays.  Too much copying hurts more than extra
> floating-point operations.
>
> > This is a Markov Chain Monte Carlo Bayesian Artificial Neural Network
> > (three-layer perceptron) application that, in the test problem,
> > produces about 1e5 chain states (not saving them but streaming them to
> > another C program), using a half dozen matrices, the largest of which
> > (for this test problem) is about 200x5.
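
The streaming could be as simple as the following pattern (a hypothetical
sketch of the idea, not the author's code): write each state to a pipe as
raw doubles and let the downstream C program read them as they arrive.

```c
#include <stdio.h>

/* Emit one chain state (a flat vector of n doubles) to a stream,
   e.g. stdout piped into the consumer program.  Returns 0 on success,
   -1 on a short write or flush failure. */
static int emit_state(FILE *out, const double *state, size_t n)
{
    if (fwrite(state, sizeof *state, n, out) != n)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```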
>
> It's really small and fits in L1 cache.  Naive C loops would have no
> problem.  If you go larger than L2 cache, you will need to call dgemm or
> another cache-friendly block multiplication algorithm for better
> performance.
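
At 200x5 a naive triple loop like the following is the sort of thing meant
here (a generic sketch, not code from the thread); past L2-sized operands
one would call BLAS dgemm or block the loops instead.

```c
/* Naive matrix multiply C = A * B, with A (n x k), B (k x m), C (n x m),
   all row-major.  Perfectly adequate while all three operands fit in
   cache; cache-unfriendly once B no longer does. */
static void matmul(int n, int k, int m,
                   double *C, const double *A, const double *B)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += A[i * k + p] * B[p * m + j];
            C[i * m + j] = s;
        }
}
```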
>
> >
> > Another curiosity is that in the C version, using a (user-defined)
> > sigmoid instead of 'tanh' as the nonlinear activation (on all matrix
> > elements) expands the run time by 1.75X.  In the J version, the same
> > choice *reduces* run time by about 20% relative to the 'tanh' J
> > primitive (?).
> >
> > sgmd =. monad : '1%(1+^-y)'
>
> In any interpreted language, primitives are always going to outperform
> composite functions.
>
> > As far as releasing code goes, this is the outgrowth of my dissertation
> > work, and I hope someday to commercialize it, so I am reluctant to
> > release it.  A pity - I know you can't be sure what I am doing in order
> > to diagnose the situation, but perhaps we can find a way to accomplish
> > what you want anyway by pursuing this together.
>
> I don't see a good MCMC code in J, so I'm also developing my own.
> Perhaps we can share that part at some point, so you don't have to give
> away your baby neural network.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>



-- 
Dominus enim Iesus Ecclesiae Suae et,

-Michael


"Let your religion be less of a theory and more of a love affair."
                                             - G.K. Chesterton