BTW, Thibaut, in the paper you mention, the MPI-based implementation beats Spark by at least a factor of 2 on inversion performance. That is kind of what I was saying, and in this case the algorithm doesn't seem to be as highly interconnected as, e.g., naive blockwise multiplication.
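
For readers unsure what "highly interconnected" means for naive blockwise multiplication, here is a minimal sketch. It is an illustration only and is not taken from the paper or from Mahout or Spark. It assumes square matrices split into a p x p grid of blocks and one task per output block; the function name `naive_block_multiply_fanout` is hypothetical. It only counts which input blocks each output block needs and does no arithmetic.

```python
from collections import Counter


def naive_block_multiply_fanout(p):
    """Count block dependencies for C[i][j] = sum_k A[i][k] * B[k][j].

    Assumes a p x p grid of blocks and one task per output block C[i][j].
    Returns (fanout, total), where fanout maps each input block to the
    number of output tasks that need it, and total is the number of
    (input block, output task) pairs, i.e. block fetches if nothing is
    co-located.
    """
    fanout = Counter()
    for i in range(p):
        for j in range(p):
            for k in range(p):
                # Task (i, j) needs block A[i][k] and block B[k][j].
                fanout[("A", i, k)] += 1
                fanout[("B", k, j)] += 1
    total = sum(fanout.values())
    return fanout, total


fanout, total = naive_block_multiply_fanout(4)
print(max(fanout.values()))  # how many tasks need any single input block
print(total)                 # total block fetches across all tasks
```

With a p x p grid, every input block is needed by p different output tasks, so the fetch count grows as 2 * p**3 while the number of input blocks grows only as 2 * p**2. That scattering is what the shuffle has to carry.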

On Thu, May 5, 2016 at 1:50 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> The mantra I keep hearing is that if someone needs matrix inversion then
> he/she must be doing something wrong. Not sure how true that is, but in all
> the cases I have encountered, people try to avoid matrix inversion one way
> or another.
>
> Re: libraries: Mahout is more about APIs now than any particular in-core
> library. Unfortunately, Mahout's in-memory operations are rooted in
> single-threaded Colt and are pretty slow at the moment. We are looking for
> ways of doing in-memory operations faster and integrating something better
> and native.
>
> However, the really limiting factor seems to be the Spark programming model
> and the effects it has on interconnected I/O problems with a high degree
> of scattering. Compare, for example, the performance you can get with the
> MKL MPI wrapper. If you are looking for performance of distributed algebra
> on CPUs, there are very few things that can compete with the MKL MPI wrapper.
>
> My personal opinion is that for as long as the problem fits in memory (and
> most of them do nowadays), no algorithm on Spark is going to beat Matlab in
> matrix multiplication and such, all things being equal, no matter how many
> cores the Spark cluster gets, on 1 Gbit networks. The same seems to be
> 10-fold true when comparing to GPU-based algorithms (case in point: BIDMach).
>
> On Thu, May 5, 2016 at 12:45 PM, thibaut <thibaut.gensol...@gmail.com>
> wrote:
>
>> My questions are:
>> - Is it better, for what we want to do, to use Mahout or Spark?
>
> Mahout at this point is better for declarative prototyping, as it contains
> a distributed optimizer and a compact expression DSL.
>
>> - I saw that you already have a distributed PCA. Do you have a really
>> efficient matrix inversion algorithm in Mahout?
>
> PCA underpinnings are described in detail in the "AM:Beyond MapReduce"
> book.
>
>> - How good is the linear algebra library compared to Matlab, for example?
>
> See my opinion above about algorithms on Spark. Yes, I did some
> benchmarking and digging around. Some things could be on par, but
> interconnected things are decidedly worse than single-node Matlab (in terms
> of speed).
>
>> Finally, our main concern about using Spark is the linear algebra
>> library that is used with Spark. And we were wondering how good the
>> Mahout one is?
>
> What do you mean specifically? Speed? As I said, the in-core speed is what
> one can expect from a Java-based implementation, but the in-core speed
> factor seems to be far overshadowed by I/O programming model issues in
> highly interconnected problems once a certain problem size is reached.
>
>> Thanking you in advance,
>>
>> Best regards.
>> Thibaut
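
On the remark quoted above that people usually avoid explicit matrix inversion: one common way to do that (an assumption here, since the thread does not say which approach is meant) is to solve the linear system A x = b directly rather than forming the inverse of A and multiplying. A minimal pure-Python sketch follows, using Gaussian elimination with partial pivoting. The function name `solve` and the sample data are illustrative only. Real code would call a LAPACK-backed library routine instead.

```python
def solve(a, b):
    """Solve a x = b for x without forming the inverse of a.

    a is a list of n rows of n numbers, b is a list of n numbers.
    Uses Gaussian elimination with partial pivoting, then back substitution.
    """
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


a = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x = solve(a, b)
print(x)
```

Solving for a single right-hand side this way does less work than computing the full inverse first and is generally better behaved numerically, which is one reason the "avoid inversion" advice keeps coming up.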