This looks really interesting. My first look at this was the mp_bench.ijs example. It produces (on my iMac):
┌────────┬────────┐
│step    │millis  │
├────────┼────────┤
│mp      │3260    │
├────────┼────────┤
│acreate │708     │
├────────┼────────┤
│bcreate │474     │
├────────┼────────┤
│matmul  │0       │
├────────┼────────┤
│sync    │3252    │
├────────┼────────┤
│get     │432     │
├────────┼────────┤
│aftot   │4867    │
├────────┼────────┤
│mp%aftot│0.669918│
└────────┴────────┘

That's certainly not flattering for ArrayFire. My take on it goes like this: acreate and bcreate spend significant time transposing their arguments, but once that is done, the matrix multiplication zips by. Then comes the synchronization step: a full 3 seconds. Is this elapsed time? Surely not actual processor time used? (Maybe my Mac was doing a backup when I set this in motion.) And then get, which undoes the transposition of the result, also takes significant time.

What if we sent regular J data (row-major) to ArrayFire, used the af_transpose function within ArrayFire to change columns to rows, did the af_matmul, and then finished up with another af_transpose on the result? That might be a lot quicker.

Can someone expand on the timing of af_sync?

If I were planning a machine learning ("ML") example from J, I think it would end up as an initial passing of data in, followed by a good deal of af_ processing, finishing with an af_sync and a return of results to J. The idea is simply to maximize the time spent with the fastest tool available. And that means expressing your ML logic in array terms; J should be good at that.

Mike Powell
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
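[Editor's sketch, not part of the original post.] There is a standard trick that can remove the transpose steps entirely rather than moving them onto the device: a row-major M×N buffer, handed untouched to a column-major engine, reads as the N×M transpose. Since (A·B)ᵀ = Bᵀ·Aᵀ, asking the column-major routine for the product with the operands swapped yields A·B laid out exactly as row-major data again. The following pure-Python sketch (plain nested lists, no ArrayFire) demonstrates the identity:

```python
# Illustration of the layout trick: a column-major engine can consume
# row-major buffers with no transposes, if the operand order is swapped.

def matmul(a, b):
    """Plain row-major matrix product on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

def transpose(m):
    return [list(col) for col in zip(*m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3, row-major
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2, row-major

# What a column-major engine "sees" in the raw row-major buffers:
A_seen = transpose(A)    # appears as 3 x 2
B_seen = transpose(B)    # appears as 2 x 3

# Swap the operand order: the engine computes B_seen . A_seen,
# which is (A . B) transposed -- i.e. A . B in row-major layout.
result_seen = matmul(B_seen, A_seen)
result = transpose(result_seen)   # reinterpret back as row-major

assert result == matmul(A, B)
print(result)   # [[58, 64], [139, 154]]
```

In ArrayFire terms, af_matmul also accepts af_mat_prop transpose flags on each operand, which may let the library fold the transpose into the multiply kernel instead of running a separate pass.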
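[Editor's sketch, not part of the original post.] The matmul-0-ms / sync-3252-ms split is characteristic of an asynchronous API: the multiply call only enqueues work on the device and returns at once, and af_sync then blocks until the queue drains, so the real compute time lands in the sync row. A toy simulation (invented FakeDevice class, standing in for any async device queue) of that accounting:

```python
# Why an async API reports ~0 ms for the launch and puts all the
# wall-clock time into sync(): the launch only enqueues the work.
import threading
import time

class FakeDevice:
    """Toy stand-in for an asynchronous device queue."""
    def __init__(self):
        self._pending = []

    def matmul(self):
        # Enqueue 0.2 s of "work" and return immediately,
        # like an asynchronous kernel launch.
        t = threading.Thread(target=time.sleep, args=(0.2,))
        t.start()
        self._pending.append(t)

    def sync(self):
        # Block until every queued operation has completed.
        for t in self._pending:
            t.join()
        self._pending.clear()

dev = FakeDevice()

t0 = time.perf_counter()
dev.matmul()
launch_ms = (time.perf_counter() - t0) * 1000   # near zero

t0 = time.perf_counter()
dev.sync()
sync_ms = (time.perf_counter() - t0) * 1000     # absorbs the real work

print(f"launch: {launch_ms:.1f} ms, sync: {sync_ms:.1f} ms")
```

On this reading, the 3.2 s under sync is elapsed wall-clock time spent waiting for the device, not extra work performed by the sync call itself.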