Any reason why you can’t use the built-in linear regression, e.g. http://spark.apache.org/docs/latest/ml-classification-regression.html#regression or http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression?
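
If the built-in estimators turn out not to fit, here is a rough sketch of your option 2 (an iterative solver), so that neither A-transpose * A nor its inverse is ever formed. It is plain single-machine Java rather than Spark code, and it uses CGLS (conjugate gradient applied to the normal equations), which is my choice for illustration and not an MLlib API. The class SparseLeastSquares, the Entry type and the solve parameters are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative CGLS solver for min ||Ax - b|| with A given as sparse (i, j, value) entries. */
class SparseLeastSquares {

    /** One nonzero entry of A in coordinate form (hypothetical stand-in for MatrixEntry). */
    static final class Entry {
        final int i;
        final int j;
        final double value;

        Entry(int i, int j, double value) {
            this.i = i;
            this.j = j;
            this.value = value;
        }
    }

    /** Returns A * v, touching only the nonzero entries. */
    static double[] multiply(List<Entry> a, int numRows, double[] v) {
        double[] y = new double[numRows];
        for (Entry e : a) {
            y[e.i] += e.value * v[e.j];
        }
        return y;
    }

    /** Returns A-transpose * v without building the transpose. */
    static double[] multiplyTransposed(List<Entry> a, int numCols, double[] v) {
        double[] y = new double[numCols];
        for (Entry e : a) {
            y[e.j] += e.value * v[e.i];
        }
        return y;
    }

    static double dot(double[] u, double[] v) {
        double sum = 0.0;
        for (int k = 0; k < u.length; k++) {
            sum += u[k] * v[k];
        }
        return sum;
    }

    /** Iterates until ||A-transpose * residual|| <= tol or maxIter is reached. */
    static double[] solve(List<Entry> a, int numRows, int numCols,
                          double[] b, int maxIter, double tol) {
        double[] x = new double[numCols];                  // start from x = 0
        double[] r = b.clone();                            // residual b - A x
        double[] s = multiplyTransposed(a, numCols, r);    // A-transpose * residual
        double[] p = s.clone();                            // search direction
        double gamma = dot(s, s);
        for (int iter = 0; iter < maxIter && Math.sqrt(gamma) > tol; iter++) {
            double[] q = multiply(a, numRows, p);
            double qq = dot(q, q);
            if (qq == 0.0) {
                break;                                     // direction lies in the null space of A
            }
            double alpha = gamma / qq;
            for (int c = 0; c < numCols; c++) {
                x[c] += alpha * p[c];
            }
            for (int row = 0; row < numRows; row++) {
                r[row] -= alpha * q[row];
            }
            s = multiplyTransposed(a, numCols, r);
            double gammaNew = dot(s, s);
            double beta = gammaNew / gamma;
            for (int c = 0; c < numCols; c++) {
                p[c] = s[c] + beta * p[c];
            }
            gamma = gammaNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // A = [[1, 0], [0, 2], [1, 1]] and b = A * [1, 2], so x should come out close to [1, 2].
        List<Entry> a = new ArrayList<>();
        a.add(new Entry(0, 0, 1.0));
        a.add(new Entry(1, 1, 2.0));
        a.add(new Entry(2, 0, 1.0));
        a.add(new Entry(2, 1, 1.0));
        double[] x = solve(a, 3, 2, new double[] {1.0, 4.0, 3.0}, 50, 1e-12);
        System.out.println("x = [" + x[0] + ", " + x[1] + "]");
    }
}
```

Each iteration needs only one product with A and one with A-transpose, so in Spark both could in principle be written as passes over the RDD of MatrixEntry with x kept on the driver. That only works while x (one value per column of A) fits in driver memory, which is an assumption about your data.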
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 3 Nov 2016, at 16:08, im281 [via Apache Spark User List] <ml-node+s1001560n28006...@n3.nabble.com> wrote:
>
> I want to solve the linear regression problem using Spark with huge
> matrices:
>
> Ax = b
> using least squares:
> x = Inverse(A-transpose * A) * A-transpose * b
>
> The A matrix is a large sparse matrix (as is the b vector).
>
> I have pondered several solutions to the Ax = b problem, including:
>
> 1) directly solving the problem above, where the matrix is transposed,
> multiplied by itself, the inverse is taken, and the result is multiplied
> by A-transpose and then by b, which gives the solution vector x
>
> 2) an iterative solver (no need to take the inverse)
>
> My question is:
>
> What is the best way to solve this problem using the MLlib library, in Java,
> using RDDs and Spark?
>
> Is there any code as an example? Has anyone done this?
>
> The code to take in data represented as a coordinate matrix and perform
> transposition and multiplication is shown below, but I need to take the
> inverse if I use this strategy:
>
> //Read coordinate matrix from text or database
> JavaRDD<String> fileA = sc.textFile(file);
>
> //map text file with coordinate data (sparse matrix) to JavaRDD<MatrixEntry>
> JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() {
>     public MatrixEntry call(String x){
>         String[] indeceValue = x.split(",");
>         long i = Long.parseLong(indeceValue[0]);
>         long j = Long.parseLong(indeceValue[1]);
>         double value = Double.parseDouble(indeceValue[2]);
>         return new MatrixEntry(i, j, value);
>     }
> });
>
> //coordinate matrix from sparse data
> CoordinateMatrix cooMatrixA = new CoordinateMatrix(matrixA.rdd());
>
> //create block matrix
> BlockMatrix matA = cooMatrixA.toBlockMatrix();
>
> //create block matrix after matrix multiplication (square matrix)
> BlockMatrix ata = matA.transpose().multiply(matA);
>
> //print out the original dense matrix
> System.out.println(matA.toLocalMatrix().toString());
>
> //print out the transpose of the dense matrix
> System.out.println(matA.transpose().toLocalMatrix().toString());
>
> //print out the square matrix (after multiplication)
> System.out.println(ata.toLocalMatrix().toString());
>
> JavaRDD<MatrixEntry> entries = ata.toCoordinateMatrix().entries().toJavaRDD();