The improvement that comes from AVX itself is not that important. The major gain in the inner product comes from the algorithm having been tuned to keep the cache hot. SSE2 can achieve a similar improvement factor with the new code.
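To make "keeping the cache hot" concrete, here is a minimal sketch of a blocked (tiled) matrix product in C next to the plain triple loop. It is not the J engine's code: the names `matmul_naive` and `matmul_blocked` and the tile size `BLOCK` are assumptions made up for this illustration, and a real kernel would tune the tile size to the actual cache sizes.

```c
#include <stddef.h>

/* Hypothetical tile edge, chosen only for illustration. */
#define BLOCK 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: c = a * b, all n x n, row major.
   The inner loop walks b with stride n, which is unfriendly to the cache. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Blocked version: same arithmetic, done tile by tile so that a
   BLOCK x BLOCK tile of b is reused for every row in a tile of a
   while it is still likely to be in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = min_sz(ii + BLOCK, n);
                size_t k_end = min_sz(kk + BLOCK, n);
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop over a row of b and c. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

The inner loop is a plain unit-stride multiply-add, so a compiler can vectorize it with either SSE2 or AVX. The blocked version sums in a different order than the naive loop, so results on general floating-point data may differ in the last bits.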
BLAS is heavily optimized. It may well be running in multiple threads, so on my 4-core CPU it would run about 4x faster. I am still not sure where the remaining 5x improvement comes from. Offloading to the GPU in the i7?

On 21 Apr 2017, 11:37 pm, "Xiao-Yong Jin" wrote:

Thanks. It's interesting to see how far AVX has come along.
And the |: takes more than 20% of the time? I guess this is something that could be improved.

> On Apr 21, 2017, at 7:42 AM, bill lam wrote:
>
> Oops, the output from dgemm should be transposed to row major.
>
> dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
>
> mm=: 4 : 0
> k=. ,{.$x
> c=. (k,k)$1.5-1.5
> dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k
> |:c
> )
>
> 'A B'=:0?@$~2,,~4096
> echo timespacex'c1=: A+/ .*B'
> echo timespacex'c2=: A mm B'
> echo c1-:c2
>
> NB. avx
> load'dgemm.ijs'
> 19.4683 2.68438e8
> 1.11488 5.36873e8
> 1
>
> NB. j602
> 167.99789 2.684384e8
> 1.224063 5.369056e8
> 1
>
> The j806 version is already quite good.
>
> On Fri, 21 Apr 2017, bill lam wrote:
>> I tested J calling lapack for matrix multiplication with the
>> following script,
>>
>> NB. extern dgemm_(char * transa, char * transb, int * m, int * n, int * k,
>> NB. double * alpha, double * A, int * lda,
>> NB. double * B, int * ldb, double * beta,
>> NB. double * C, int * ldc);
>>
>> dgemm=: 'liblapack.so.3 dgemm_ > n *c *c *i *i *i *d *d *i *d *i *d *d *i'&cd
>>
>> mm=: 4 : 0
>> k=. ,{.$x
>> c=. (k,k)$1.5-1.5
>> dgemm (,'T');(,'T');k;k;k;(,2.5-1.5);x;k;y;k;(,1.5-1.5);c;k
>> c
>> )
>>
>> 'A B'=:0?@$~2,,~4096
>> echo timespacex'A+/ .*B'
>> echo timespacex'A mm B'
>>
>> The result was,
>> 19.3608 2.68437e8
>> 0.886447 2.68442e8
>>
>> Note that it needs to use an optimized version of blas, not the
>> reference blas.
>>
>> Apparently the blas used in julia is sub-optimal.
>>
>> On Tue, 18 Apr 2017, bill lam wrote:
>>> I think julia just calls blas.
>>>
>>> On Mon, 17 Apr 2017, Xiao-Yong Jin wrote:
>>>>
>>>>> On Apr 17, 2017, at 9:26 PM, Henry Rich wrote:
>>>>>
>>>>> If you have an implementation of +/ . * on double-precision floats that's faster than J 8.06, I would be obliged if you'd send me a copy of the source code.
>>>>
>>>> I'm sure your code is much faster than naive C loops. But somehow the matrix-matrix multiplication is much slower (10x) than that in julia (tested with a 3-year-old version).
>>>>
>>>> % julia
>>>>                _
>>>>    _       _ _(_)_     |  A fresh approach to technical computing
>>>>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>>>>    _ _   _| |_  __ _   |  Type "help()" to list help topics
>>>>   | | | | | | |/ _` |  |
>>>>   | | |_| | | | (_| |  |  Version 0.2.1 (2014-02-11 06:30 UTC)
>>>>  _/ |\__'_|_|_|\__'_|  |
>>>> |__/                   |  x86_64-linux-gnu
>>>>
>>>> julia> A=rand(4096,4096); B=rand(4096,4096);
>>>>
>>>> julia> @time A*B;
>>>> elapsed time: 2.260157127 seconds (149184640 bytes allocated)
>>>>
>>>> julia>
>>>> % jconsole
>>>>    JVERSION
>>>> Engine: j806/j64avx/linux
>>>> Beta-3: commercial/2017-04-10T17:51:14
>>>> Library: 8.06.02
>>>> Platform: Linux 64
>>>> Installer: J806 install
>>>> InstallPath: /nfs2/xjin/pkgs/j64-806
>>>> Contact: www.jsoftware.com
>>>>    'A B'=:0?@$~2,,~4096
>>>>    timespacex'A+/ .*B'
>>>> 23.8976 2.68437e8
>>>>    timespacex'A+/ .*B'
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>> --
>>> regards,
>>> ====================================================
>>> GPG key 1024D/4434BAB3 2008-08-24
>>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
