Mahout is considerably better at sparse operations and optimizations than
dense ones.

Beyond that, I would expect that you would do better with traditional math
libraries.

And, are you really trying to invert a matrix? The common maxim is that
this implies an error in your method, because inversion is O(n^3) and often
ill-conditioned to boot. Usually, an implicit form of inversion via a
decompositional representation is far better than a true inversion. For
large systems the situation is even more stark: numerical accuracy
limitations and noise in the original data make it impossible to do better
than an approximate AND implicit inverse, such as a limited-rank SVD.
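
To make the decompositional point concrete, here is a small Scala sketch.
It uses Breeze rather than Mahout, purely to illustrate the idea; the same
contrast holds in any linear algebra library.

  import breeze.linalg._

  // Toy system A x = b; small and well-conditioned purely for illustration.
  val a = DenseMatrix((4.0, 1.0), (1.0, 3.0))
  val b = DenseVector(1.0, 2.0)

  // Anti-pattern: materialize the explicit inverse, then multiply.
  // O(n^3) work, and it amplifies noise when A is ill-conditioned.
  val xViaInverse = inv(a) * b

  // Preferred: solve the system through a factorization; the inverse is
  // never formed, which is both cheaper and numerically safer.
  val xViaSolve = a \ b

  println(s"x via explicit inverse: $xViaInverse")
  println(s"x via factorization:    $xViaSolve")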




On Thu, May 5, 2016 at 1:56 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> BTW, Thibaut, in the paper you mention, the MPI-based implementation beats
> Spark by at least 2x on the performance of the inversion. Kinda what I was
> saying -- and in this case the algorithm doesn't seem to be as highly
> interconnected as, e.g., naive blockwise multiplication.
>
> On Thu, May 5, 2016 at 1:50 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > The mantra I keep hearing is that if someone needs matrix inversion, then
> > he/she must be doing something wrong. Not sure how true that is, but in
> > all cases I have encountered, people try to avoid matrix inversion one
> > way or another.
> >
> > Re: libraries: Mahout is more about APIs now than any particular in-core
> > library. Unfortunately, Mahout's in-memory operations are rooted in
> > single-threaded Colt and are pretty slow at the moment. We are looking
> > for ways of doing in-memory operations faster and integrating something
> > better and native.
> >
> > However, the really limiting factor seems to be the Spark programming
> > model and the effects it brings to interconnected I/O problems with a
> > high degree of scattering. Compare, for example, to the performance you
> > can get with an MKL MPI wrapper. If you are looking for performance of
> > distributed algebra on CPUs, there are very few things that can compete
> > with an MKL MPI wrapper.
> >
> > My personal opinion is that for as long as the problem fits in memory
> > (and most of them do nowadays), no algorithm on Spark is going to beat
> > Matlab in matrix multiplication and such, all things being equal, no
> > matter how many cores the Spark cluster gets, on 1-Gbit networks. The
> > same seems to be tenfold true when comparing to GPU-based algorithms
> > (case in point: BidMach).
> >
> > On Thu, May 5, 2016 at 12:45 PM, thibaut <thibaut.gensol...@gmail.com>
> > wrote:
> >
> >>
> >> My questions are:
> >> - For what we want to do, is it better to use Mahout or Spark?
> >>
> >
> > Mahout at this point is better for declarative prototyping, as it
> > contains a distributed optimizer and a compact expression DSL.
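
For a rough sense of what that expression DSL looks like, here is a minimal
Samsara sketch (package names follow the Mahout 0.10+ Scala/Spark bindings
docs and should be treated as approximate):

  import org.apache.mahout.math.Matrix
  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.drm.RLikeDrmOps._

  // Gramian A' A expressed declaratively: the distributed optimizer can
  // rewrite this into a single fused job, rather than physically
  // transposing A, before the expression is executed on Spark.
  def gramian(drmA: DrmLike[Int]): Matrix =
    (drmA.t %*% drmA).collect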
> >
> >> - I saw that you already have a distributed PCA. Do you have a really
> >> efficient matrix inversion algorithm in Mahout?
> >>
> > The PCA underpinnings are described in detail in the "AM: Beyond
> > MapReduce" book.
> >
> >> - How good is the linear algebra library compared to Matlab, for
> >> example?
> >>
> > See my opinion above about algorithms on Spark. Yes, I did some
> > benchmarking and digging around. Some things could be on par, but
> > interconnected things are decidedly worse than single-node Matlab (in
> > terms of speed).
> >
> >>
> >> Finally, our main concern for using Spark is about the linear algebra
> >> library that is used with Spark. And we were wondering how good the
> >> Mahout one is.
> >
> > What do you mean specifically? Speed? As I said, the in-core speed is
> > what one can expect from a Java-based implementation, but the in-core
> > speed factor seems to be far overshadowed by I/O programming-model issues
> > in highly interconnected problems once a certain problem size is reached.
> >
> >>
> >>
> >> Thanking you in advance,
> >>
> >> Best regards.
> >> Thibaut
> >
> >
> >
>
