Is there any reason you can't use the built-in linear regression? For example:
http://spark.apache.org/docs/latest/ml-classification-regression.html#regression
or
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression
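
For comparison, option 2 in the quoted question (an iterative solver) can be sketched without any Spark dependency. The following is a minimal single-machine illustration in plain Java, not MLlib code: it runs conjugate gradient on the normal equations (CGLS) over a matrix stored as (i, j, value) triples, the same coordinate layout as the MatrixEntry data in the question. The class name SparseLeastSquares and its methods are hypothetical names chosen for this sketch, and it assumes A has full column rank and fits in memory on one machine.

```java
import java.util.Arrays;

/** Hypothetical helper: least squares for a sparse matrix held in coordinate (COO) form. */
final class SparseLeastSquares {
    private final int numRows;
    private final int numCols;
    private final int[] rowIdx;    // row index i of each nonzero entry
    private final int[] colIdx;    // column index j of each nonzero entry
    private final double[] values; // value of each nonzero entry

    SparseLeastSquares(int numRows, int numCols, int[] rowIdx, int[] colIdx, double[] values) {
        this.numRows = numRows;
        this.numCols = numCols;
        this.rowIdx = rowIdx;
        this.colIdx = colIdx;
        this.values = values;
    }

    /** Returns A * x. */
    double[] multiply(double[] x) {
        double[] y = new double[numRows];
        for (int k = 0; k < values.length; k++) {
            y[rowIdx[k]] += values[k] * x[colIdx[k]];
        }
        return y;
    }

    /** Returns A-transpose * y without materialising the transpose. */
    double[] multiplyTranspose(double[] y) {
        double[] z = new double[numCols];
        for (int k = 0; k < values.length; k++) {
            z[colIdx[k]] += values[k] * y[rowIdx[k]];
        }
        return z;
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * CGLS: conjugate gradient applied to A-transpose * A * x = A-transpose * b,
     * starting from x = 0. Stops when the squared norm of A-transpose * r
     * drops to the tolerance or below, or after maxIterations steps.
     */
    double[] solve(double[] b, int maxIterations, double tolerance) {
        double[] x = new double[numCols];
        double[] r = Arrays.copyOf(b, b.length); // residual b - A*x (x starts at 0)
        double[] s = multiplyTranspose(r);       // A-transpose * r
        double[] p = Arrays.copyOf(s, s.length); // search direction
        double gamma = dot(s, s);

        for (int iter = 0; iter < maxIterations && gamma > tolerance; iter++) {
            double[] q = multiply(p);
            double qq = dot(q, q);
            if (qq == 0.0) {
                break; // p lies in the null space of A, so no further progress is possible
            }
            double alpha = gamma / qq;
            for (int j = 0; j < numCols; j++) {
                x[j] += alpha * p[j];
            }
            for (int i = 0; i < numRows; i++) {
                r[i] -= alpha * q[i];
            }
            s = multiplyTranspose(r);
            double gammaNew = dot(s, s);
            double beta = gammaNew / gamma;
            for (int j = 0; j < numCols; j++) {
                p[j] = s[j] + beta * p[j];
            }
            gamma = gammaNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // A = [[1, 0], [0, 1], [1, 1]] given as (i, j, value) triples, b = [1, 2, 3]
        SparseLeastSquares a = new SparseLeastSquares(
                3, 2,
                new int[] {0, 1, 2, 2},
                new int[] {0, 1, 0, 1},
                new double[] {1.0, 1.0, 1.0, 1.0});
        double[] x = a.solve(new double[] {1.0, 2.0, 3.0}, 100, 1e-20);
        System.out.println(Arrays.toString(x));
    }
}
```

Only products with A and A-transpose are needed here, so neither A-transpose * A nor its inverse is ever formed. In a distributed setting those two products are the steps that would have to be parallelised.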

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 3 Nov 2016, at 16:08, im281 [via Apache Spark User List] 
> <ml-node+s1001560n28006...@n3.nabble.com> wrote:
> 
> I want to solve the linear regression problem using Spark with huge 
> matrices: 
> 
> Ax = b 
> using least squares: 
> x = Inverse(A-transpose * A) * A-transpose * b 
> 
> The A matrix is a large sparse matrix (as is the b vector). 
> 
> I have pondered several solutions to the Ax = b problem including: 
> 
> 1) directly solving the problem above where the matrix is transposed, 
> multiplied by itself, the inverse is taken and then multiplied by A-transpose 
> and then multiplied by b which will give the solution vector x 
> 
> 2) iterative solver (no need to take the inverse) 
> 
> My question is:
> 
> What is the best way to solve this problem using the MLlib libraries, in Java 
> and using RDDs and Spark? 
> 
> Is there any code as an example? Has anyone done this? 
> 
> 
> 
> 
> 
> The code to take in data represented as a coordinate matrix and perform 
> transposition and multiplication is shown below but I need to take the 
> inverse if I use this strategy: 
> 
>                 //Read coordinate matrix from text or database 
>                 JavaRDD<String> fileA = sc.textFile(file); 
> 
>                 //map text file with coordinate data (sparse matrix) to JavaRDD<MatrixEntry>
>                 JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() { 
>                     public MatrixEntry call(String x){ 
>                         String[] indeceValue = x.split(","); 
>                         long i = Long.parseLong(indeceValue[0]); 
>                         long j = Long.parseLong(indeceValue[1]); 
>                         double value = Double.parseDouble(indeceValue[2]); 
>                         return new MatrixEntry(i, j, value ); 
>                     } 
>                 }); 
> 
>                 //coordinate matrix from sparse data 
>                 CoordinateMatrix cooMatrixA = new CoordinateMatrix(matrixA.rdd()); 
> 
>                 //create block matrix 
>                 BlockMatrix matA = cooMatrixA.toBlockMatrix(); 
> 
>                 //create block matrix after matrix multiplication (square matrix) 
>                 BlockMatrix ata = matA.transpose().multiply(matA); 
> 
>                 //print out the original dense matrix 
>                 System.out.println(matA.toLocalMatrix().toString()); 
> 
>                 //print out the transpose of the dense matrix 
>                 System.out.println(matA.transpose().toLocalMatrix().toString()); 
> 
>                 //print out the square matrix (after multiplication) 
>                 System.out.println(ata.toLocalMatrix().toString()); 
> 
>                 JavaRDD<MatrixEntry> entries = ata.toCoordinateMatrix().entries().toJavaRDD(); 
> 
> 
> 





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mLIb-solving-linear-regression-with-sparse-inputs-tp28006p28007.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
