Hi there,

I work in a research group at Michigan University (Mathematics), and we are looking to speed up some of the algorithms we use and have developed here by running them on distributed systems.
We were thinking about using Spark, but I recently came across Mahout and read about it. We use KNN and minimum spanning trees a lot here, and our main concern is the inversion of matrices (really, really big matrices). I found this paper, "Spark-based Large-scale Matrix Inversion for Big Data Processing": https://web.njit.edu/~ansari/papers/16IEEEAccess.pdf , which provides a really good method for dealing with the inversion issue.

My questions are:
- For what we want to do, is it better to use Mahout or Spark?
- I saw that you already have a distributed PCA. Do you have a really efficient matrix inversion algorithm in Mahout?
- How good is the linear algebra library compared to Matlab, for example?

Finally, our main concern about using Spark is the linear algebra library that it uses, and we were wondering how good the Mahout one is.

Thank you in advance,
Best regards,
Thibaut