This is probably related to OpenBLAS, but it seems that tanh() is not 
multi-threaded in Julia, which rules out a considerable speed-up.
MATLAB, by comparison, does multi-thread it and gets something around a 3x 
speed-up over the single-threaded version.

For example,

  x = rand(100000,200);
  @time y = tanh(x);

yields:
  - 0.71 sec in Julia
  - 0.76 sec in MATLAB with -singleCompThread
  - 0.09 sec in MATLAB (multi-threaded by default)
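
To make the Julia number easier to reproduce, here is a minimal sketch of the same measurement with a warm-up call, so that compilation time is not counted in the result. The function name time_tanh and the smaller matrix size are my own choices for illustration, and the sketch uses the broadcast form tanh.(x), which is what current Julia versions expect for arrays.

```julia
# Time an elementwise tanh over a random matrix, excluding compile time.
# `time_tanh` is a hypothetical helper name, not part of any library.
function time_tanh(rows, cols)
    x = rand(rows, cols)
    y = tanh.(x)                    # warm-up call: compiles the broadcast kernel
    elapsed = @elapsed tanh.(x)     # second call: measures run time only
    return elapsed, y
end

# Smaller than the 100000x200 case above so the sketch runs quickly;
# pass (100_000, 200) to match the original measurement.
elapsed, y = time_tanh(1_000, 200)
println("tanh on ", size(y), " took ", elapsed, " s")
```
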

The good news is that Julia (with OpenBLAS) is competitive with the 
single-threaded MATLAB version. However, setting the environment variable 
OPENBLAS_NUM_THREADS has no effect on the timings, and I don't see any 
higher CPU usage in 'top' either.

Is there an override for OPENBLAS_NUM_THREADS in Julia? What am I missing?
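
For anyone wanting to check the same thing, a rough sketch of how the BLAS thread setting can be inspected from inside a session. It assumes Julia 1.6 or later, where BLAS.get_num_threads() is available in the LinearAlgebra standard library. Note that it only reports on the BLAS library itself and says nothing about whether tanh() is routed through it.

```julia
using LinearAlgebra

# What the environment variable is set to in this session, if anything.
env_setting = get(ENV, "OPENBLAS_NUM_THREADS", "unset")

# How many threads the BLAS library reports it is using.
# Assumption: Julia >= 1.6, where BLAS.get_num_threads() exists.
blas_threads = BLAS.get_num_threads()

println("OPENBLAS_NUM_THREADS = ", env_setting)
println("BLAS threads in use  = ", blas_threads)

# The count can also be changed from inside Julia.
BLAS.set_num_threads(1)
println("after set_num_threads(1): ", BLAS.get_num_threads())
```
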
