Hi Andrew,

On 23/06/2015 19:08, Andrew E. Davidson wrote:
> sorry if this has been asked many times before. (maybe this can be
> added to the FAQ?)
> 
> has anyone done any bench marking?

Yes.

> 
> The idea of having a math package that is implemented pure java is
> very attractive. My experience with machine learning is that java is
> very slow. To go fast you need to take advantage of assembler or
> libraries written in fortran or C. For example http://jblas.org/

It is not that simple, and in some cases, it can be slower ...

I could not find the benchmark I presented at several symposia in
2010, but here are some rough results.

The tests covered QR decomposition followed by solving an
A.X = B linear problem, with dense matrices. I ran them for dimensions
up to 4000x4000 if I remember correctly. The benchmark used the
same underlying algorithm (but obviously different implementations).
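
To make the kind of computation concrete, here is a minimal
pure-Java sketch of a QR-based solve of A.X = B using Householder
reflections. It is only an illustration, not the benchmarked code
and not the Apache Commons Math implementation; the class name
QrSolveSketch and the method solve are made up for this example,
and it assumes a square, non-singular matrix.

```java
final class QrSolveSketch {

    /**
     * Solves A x = b for a square, non-singular A with Householder QR.
     * The inputs are copied and left unmodified.
     */
    static double[] solve(double[][] a, double[] b) {
        int n = a.length;
        double[][] r = new double[n][];
        for (int i = 0; i < n; i++) {
            r[i] = a[i].clone();
        }
        double[] y = b.clone();

        for (int k = 0; k < n; k++) {
            // Euclidean norm of column k, from the diagonal downwards.
            double norm = 0.0;
            for (int i = k; i < n; i++) {
                norm = Math.hypot(norm, r[i][k]);
            }
            if (norm == 0.0) {
                // Only catches an exactly zero column, not near-singularity.
                throw new ArithmeticException("matrix is singular");
            }
            // Sign chosen opposite to r[k][k] to avoid cancellation.
            double alpha = r[k][k] > 0 ? -norm : norm;

            // Householder vector v = x - alpha * e_k (zero above row k).
            double[] v = new double[n];
            v[k] = r[k][k] - alpha;
            double vtv = v[k] * v[k];
            for (int i = k + 1; i < n; i++) {
                v[i] = r[i][k];
                vtv += v[i] * v[i];
            }

            // Apply H = I - 2 v v^T / (v^T v) to the remaining columns.
            for (int j = k; j < n; j++) {
                double dot = 0.0;
                for (int i = k; i < n; i++) {
                    dot += v[i] * r[i][j];
                }
                double f = 2.0 * dot / vtv;
                for (int i = k; i < n; i++) {
                    r[i][j] -= f * v[i];
                }
            }
            // Apply the same reflection to the right-hand side.
            double dot = 0.0;
            for (int i = k; i < n; i++) {
                dot += v[i] * y[i];
            }
            double f = 2.0 * dot / vtv;
            for (int i = k; i < n; i++) {
                y[i] -= f * v[i];
            }
        }

        // Back substitution on the upper triangle: R x = Q^T b.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int j = i + 1; j < n; j++) {
                s -= r[i][j] * x[j];
            }
            x[i] = s / r[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] x = solve(new double[][] {{2, 1}, {1, 3}}, new double[] {3, 5});
        System.out.printf("x = [%.6f, %.6f]%n", x[0], x[1]);
    }
}
```

A real benchmark would of course use much larger matrices and
a tuned implementation; this version favours readability.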

The results were, in order of increasing performance:

  - Numerical Recipes in fortran, non-optimized
  - Numerical Recipes in fortran, optimized
  - LAPACK with ATLAS as a BLAS implementation
    (almost no difference in non-optimized or optimized)
  - Apache Commons Math !

Well, we were only about 2% faster than LAPACK, and it was on only
one type of algorithm, on my machine. I was happy and in fact surprised,
I did not expect we could reach LAPACK performance. A more realistic
picture would require looking at other algorithms as well.

Answering your question for the general case is however more
difficult, and here I don't have real benchmarks, only some general
feelings. I would say that across different domains, the speed
differences that can be observed are typically a factor of 1.5 or 2
(Java being slower), which is a significant difference but clearly not
as large as most people think. In fact, many factors other than
language also change the speed by a factor of 1.5 or 2.

The lesson I learnt here is *not* that we are faster (for most
operations, I am sure we are slower), but rather that language is
only one factor for speed. Change the algorithm and you change the
speed. Change the compiler and you change the speed. Change the
optimizer and you change the speed. Change the human developer and
you change the speed. Change your computer for one that is only a
few months more recent and you change the speed ...

Attempting to use a Java-fortran native interface to gain speed is
almost always a bad idea. The reason is that the layer between
the languages is difficult to cross and really slow. You
will spend much of the time in this layer rather than in real
processing code. This is especially true for matrices: a
double[][] is not packed as a contiguous run of doubles in some
specified order after an initial pointer, so you often have to
copy between Java arrays (which are objects) and C or fortran
arrays, and you lose a lot of time doing copies.
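
To illustrate the copy step, here is a minimal sketch of what
has to happen before a double[][] can be handed to native code
that expects one contiguous block. The class and method names
(PackCopySketch, packRowMajor, packColumnMajor) are made up for
this example; column-major is the order fortran routines such
as LAPACK expect. No native call is made here, the sketch only
shows the packing and gives a rough idea of its cost.

```java
final class PackCopySketch {

    /** Copies a rectangular double[][] into one contiguous row-major block. */
    static double[] packRowMajor(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            if (m[i].length != cols) {
                throw new IllegalArgumentException("ragged matrix");
            }
            // Each row is a separate Java object, so it is copied on its own.
            System.arraycopy(m[i], 0, flat, i * cols, cols);
        }
        return flat;
    }

    /** Copies a rectangular double[][] into column-major (fortran) order. */
    static double[] packColumnMajor(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            if (m[i].length != cols) {
                throw new IllegalArgumentException("ragged matrix");
            }
            for (int j = 0; j < cols; j++) {
                // Element (i, j) lands at offset j * rows + i.
                flat[j * rows + i] = m[i][j];
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        int n = 1000;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = i + 0.001 * j;
            }
        }
        long start = System.nanoTime();
        double[] flat = packColumnMajor(m);
        long elapsed = System.nanoTime() - start;
        // A single un-warmed run: indicative only, not a real benchmark.
        System.out.println("packed " + flat.length + " doubles in "
                + elapsed / 1_000_000.0 + " ms");
    }
}
```

The same copy is needed again in the other direction to bring
results back into Java arrays.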

A Java-fortran native interface is useful, but not for speed
considerations. From my experience, it is more useful for
interfacing with libraries that have only one implementation
and that you cannot afford to port (because they are huge,
because they are highly domain specific and nobody else uses
them, because they have been validated and you cannot take
the risk of introducing a bug by porting them, because you don't
have the time, or because you don't have the money).

In my domain (space systems), we use Apache Commons Math and
some higher-level Java libraries extensively, and over the last
few years we have replaced many older fortran and C libraries.
In all cases, we are either as fast or much faster. This is mainly
because when developing these replacement libraries, we chose
different architectures, used newer algorithms, and used different
trade-offs between memory and processing than what was available
to engineers 20 or 30 years ago. For sure, if they were to develop
their libraries again in fortran today, they would also improve
their results. So what is important is what you can achieve at
the present time, using present algorithms and present languages.

If your work is really focused on linear algebra, there are
other Java libraries that are faster than Apache Commons Math
for this specific domain (some use a native interface, some don't).
Linear algebra is one of our weak points. Apache Commons Math
is a library with broad coverage, not one specialized in
linear algebra only.

So in summary, yes there have been some benchmarks. Yes
Java can be fast (and it can also be slow depending on how
well it is developed, just like all other languages).

best regards,
Luc

> 
> 
> Kind Regards
> 
> Andy
> 
> 
> 

