Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Do you have a small test case that can reproduce the out-of-memory error? I have also seen some errors in large-scale experiments but haven't managed to narrow them down.

Thanks
Shivaram

On Fri, Mar 13, 2015 at 6:20 AM, Jaonary Rabarisoa wrote:

> It runs faster, but there are some drawbacks. It seems to consume more
> memory. I get a java.lang.OutOfMemoryError: Java heap space error if I don't
> have enough partitions for a fixed amount of memory. With the older
> (ampcamp) implementation I didn't get it for the same data size.
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

It runs faster, but there are some drawbacks. It seems to consume more memory. I get a java.lang.OutOfMemoryError: Java heap space error if I don't have enough partitions for a fixed amount of memory. With the older (ampcamp) implementation I didn't get it for the same data size.

On Thu, Mar 12, 2015 at 11:36 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa wrote:
>
>> In fact, by activating netlib with native libraries it goes faster.
>
> Glad you got it to work! Better performance was one of the reasons we made
> the switch.
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa wrote:

> In fact, by activating netlib with native libraries it goes faster.

Glad you got it to work! Better performance was one of the reasons we made the switch.
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

In fact, after activating netlib with native libraries it runs faster.

Thanks

On Tue, Mar 10, 2015 at 7:03 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> There are a couple of differences between the ml-matrix implementation and
> the one used in AMPCamp:
>
> - I think the AMPCamp one uses JBLAS, which tends to ship native BLAS
>   libraries along with it. In ml-matrix we switched to using Breeze + Netlib
>   BLAS, which is faster but needs some setup [1] to pick up native libraries.
>   If native libraries are not found it falls back to a JVM implementation, so
>   that might explain the slowdown.
>
> - The other difference, if you are comparing the whole image pipeline, is
>   that I think the AMPCamp version used NormalEquations, which is around 2-3x
>   faster (just in terms of number of flops) compared to TSQR.
>
> [1] https://github.com/fommil/netlib-java#machine-optimised-system-libraries
>
> Thanks
> Shivaram
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

There are a couple of differences between the ml-matrix implementation and the one used in AMPCamp:

- I think the AMPCamp one uses JBLAS, which tends to ship native BLAS libraries along with it. In ml-matrix we switched to using Breeze + Netlib BLAS, which is faster but needs some setup [1] to pick up native libraries. If native libraries are not found it falls back to a JVM implementation, so that might explain the slowdown.

- The other difference, if you are comparing the whole image pipeline, is that I think the AMPCamp version used NormalEquations, which is around 2-3x faster (just in terms of number of flops) compared to TSQR.

[1] https://github.com/fommil/netlib-java#machine-optimised-system-libraries

Thanks
Shivaram

On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa wrote:

> I'm trying to play with the implementation of the least squares solver
> (Ax = b) in mlmatrix.TSQR, where A is a 5*1024 matrix and b a 5*10 matrix.
> It works, but I notice that it's 8 times slower than the implementation
> given in the latest ampcamp:
> http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
> As far as I know these two implementations come from the same basis.
> What is the difference between these two codes?
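On the flop-count remark in the reply above, a rough worked comparison may help. The figures below are the standard leading-order textbook counts for a dense m x n matrix with m >= n; they are not measurements of ml-matrix, and the symbols m and n are introduced here only for illustration.

```latex
\begin{align*}
\text{Normal equations (form } A^\top A \text{, then Cholesky):}\quad & mn^2 + \tfrac{1}{3}n^3 \\
\text{Householder QR:}\quad & 2mn^2 - \tfrac{2}{3}n^3 \\
\frac{\text{QR}}{\text{normal equations}} &= \frac{2mn^2 - \tfrac{2}{3}n^3}{mn^2 + \tfrac{1}{3}n^3} \;\longrightarrow\; 2 \quad (m \gg n)
\end{align*}
```

For tall-skinny matrices this gives the factor of about 2 at the low end of the quoted range. TSQR also performs extra QR steps to combine the per-partition R factors, which could plausibly account for the upper end, though that is a guess rather than something stated in the thread.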
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

I'm trying to play with the implementation of the least squares solver (Ax = b) in mlmatrix.TSQR, where A is a 5*1024 matrix and b a 5*10 matrix. It works, but I notice that it's 8 times slower than the implementation given in the latest ampcamp:
http://ampcamp.berkeley.edu/5/exercises/image-classification-with-pipelines.html
As far as I know these two implementations come from the same basis. What is the difference between these two codes?

On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> There are a couple of solvers that I've written that are part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are
> interested in porting them I'd be happy to review it.
>
> Thanks
> Shivaram
>
> [1] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Hi Jaonary,

The RowPartitionedMatrix is a special case of the BlockMatrix, where colsPerBlock = nCols. I hope that helps.

Burak

On Mar 6, 2015 9:13 AM, "Jaonary Rabarisoa" wrote:

> Hi Shivaram,
>
> Thank you for the link. I'm trying to figure out how I can port this to
> mllib. Maybe you can help me understand how the pieces fit together.
> Currently, in mllib there are different types of distributed matrix:
>
> BlockMatrix, CoordinateMatrix, IndexedRowMatrix and RowMatrix. Which one
> should correspond to RowPartitionedMatrix in ml-matrix?
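To make the "colsPerBlock = nCols" remark concrete, here is a small plain-Python sketch of how a grid partition degenerates into row stripes. It does not use the Spark BlockMatrix or ml-matrix APIs; `block_shapes` and its parameter names are hypothetical and only mimic the vocabulary used in the message.

```python
def block_shapes(n_rows, n_cols, rows_per_block, cols_per_block):
    """Return {(block_row, block_col): (height, width)} for a grid partition."""
    shapes = {}
    for bi, r0 in enumerate(range(0, n_rows, rows_per_block)):
        for bj, c0 in enumerate(range(0, n_cols, cols_per_block)):
            # Edge blocks may be smaller than the nominal block size.
            shapes[(bi, bj)] = (min(rows_per_block, n_rows - r0),
                                min(cols_per_block, n_cols - c0))
    return shapes


# General block layout: a 6 x 4 matrix cut into 2 x 2 blocks (3 x 2 grid).
general = block_shapes(6, 4, rows_per_block=2, cols_per_block=2)

# Row-partitioned layout: cols_per_block equals the number of columns,
# so every block is a full-width stripe of rows and there is one block column.
row_partitioned = block_shapes(6, 4, rows_per_block=2, cols_per_block=4)
```

With the second call every block spans all four columns, which is the layout the message describes for RowPartitionedMatrix.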
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Sections 3, 4, and 5 in http://www.netlib.org/lapack/lawnspdf/lawn204.pdf are a good reference.

Shivaram

On Mar 6, 2015 9:17 AM, "Jaonary Rabarisoa" wrote:

> Do you have a reference paper for the algorithm implemented in TSQR.scala?
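For readers who want the gist before opening the reference, the core TSQR idea can be sketched in a few lines of plain Python: factor each row block locally, stack the small R factors, and factor the stack once more. This is a minimal single-level sketch, not the code in TSQR.scala; the function names are hypothetical, only R is computed (no Q and no right-hand side), and it assumes every block has at least as many rows as columns and full column rank.

```python
def qr_r(M):
    """Return the upper-triangular R of M = QR via modified Gram-Schmidt.

    Assumes M has full column rank; the diagonal of R is positive.
    """
    rows, cols = len(M), len(M[0])
    V = [[M[i][j] for i in range(rows)] for j in range(cols)]  # columns of M
    R = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        R[j][j] = sum(v * v for v in V[j]) ** 0.5
        q = [v / R[j][j] for v in V[j]]
        for k in range(j + 1, cols):
            # Project the remaining columns against the new unit vector q.
            R[j][k] = sum(a * c for a, c in zip(q, V[k]))
            V[k] = [c - R[j][k] * a for a, c in zip(q, V[k])]
    return R


def tsqr_r(A, rows_per_block):
    """R factor of a tall-skinny A: per-block QRs, then one combining QR."""
    blocks = [A[i:i + rows_per_block] for i in range(0, len(A), rows_per_block)]
    stacked = [row for blk in blocks for row in qr_r(blk)]
    return qr_r(stacked)


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
     [7.0, 8.0], [9.0, 10.0], [11.0, 13.0]]
R_tsqr = tsqr_r(A, rows_per_block=3)
R_full = qr_r(A)
```

Because both routines normalize R to a positive diagonal, `R_tsqr` and `R_full` agree up to rounding. In a distributed setting each block would live on one partition and the combining step would typically be a reduction tree rather than a single stack.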
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Do you have a reference paper for the algorithm implemented in TSQR.scala?

On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> There are a couple of solvers that I've written that are part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are
> interested in porting them I'd be happy to review it.
>
> Thanks
> Shivaram
>
> [1] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

Hi Shivaram,

Thank you for the link. I'm trying to figure out how I can port this to mllib. Maybe you can help me understand how the pieces fit together. Currently, in mllib there are different types of distributed matrix:

BlockMatrix, CoordinateMatrix, IndexedRowMatrix and RowMatrix. Which one should correspond to RowPartitionedMatrix in ml-matrix?

On Tue, Mar 3, 2015 at 8:02 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> There are a couple of solvers that I've written that are part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are
> interested in porting them I'd be happy to review it.
>
> Thanks
> Shivaram
>
> [1] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

The minimization problem you're describing in the email title also looks like it could be solved using the RidgeRegression solver in MLlib, once you transform your DistributedMatrix into an RDD[LabeledPoint].

On Tue, Mar 3, 2015 at 11:02 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

> There are a couple of solvers that I've written that are part of the AMPLab
> ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are
> interested in porting them I'd be happy to review it.
>
> Thanks
> Shivaram
>
> [1] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
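As a reference point for what any ridge solver should return on this objective, the problem in the subject line has a closed form: setting the gradient to zero gives (A^T A + lambda * n * I) x = A^T b. The sketch below solves that system in plain Python for small dense inputs. It is not the MLlib or ml-matrix implementation; the function names are hypothetical, and it assumes n in the subject line means the number of rows of A, which the thread does not state.

```python
def solve(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    d = len(M)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]  # work on a copy
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            f = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (aug[i][d] - s) / aug[i][i]
    return x


def ridge_normal_equations(A, b, lam):
    """Minimize ||A x - b||^2 + lam * n * ||x||^2, with n = number of rows of A."""
    n, d = len(A), len(A[0])
    # Gram matrix A^T A with the ridge term lam * n added on the diagonal.
    G = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(d)]
         for i in range(d)]
    for i in range(d):
        G[i][i] += lam * n
    rhs = [sum(A[k][i] * b[k] for k in range(n)) for i in range(d)]  # A^T b
    return solve(G, rhs)


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x_plain = ridge_normal_equations(A, b, lam=0.0)   # ordinary least squares
x_ridge = ridge_normal_equations(A, b, lam=1.0)   # shrunk toward zero
```

With lam = 0 this reduces to ordinary least squares, and increasing lam shrinks the solution toward zero. Forming A^T A squares the condition number of the problem, which is one reason QR-based solvers such as TSQR are sometimes preferred despite the extra flops.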
Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2 ?

There are a couple of solvers that I've written that are part of the AMPLab ml-matrix repo [1,2]. These aren't part of MLlib yet, though, and if you are interested in porting them I'd be happy to review it.

Thanks
Shivaram

[1] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
[2] https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala

On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa wrote:

> Dear all,
>
> Is there a least squares solver based on DistributedMatrix that we can use
> out of the box in the current (or the master) version of Spark?
> It seems that the only least squares solver available in Spark is private
> to the recommender package.
>
> Cheers,
>
> Jao