[jira] [Comment Edited] (SPARK-11918) WLS can not resolve some kinds of equation

2016-09-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021787#comment-15021787
 ] 

Sean Owen edited comment on SPARK-11918 at 9/20/16 9:54 PM:


[~yanboliang] Yes, this is true in general of ill-conditioned problems. What are 
you proposing? To propagate the error from LAPACK in a different way? To check 
the condition number? Roughly speaking, the current behavior is correct in that 
there's no real answer here.

EDIT to my old comment: I don't think that's accurate. It's possible to return 
a 'best' answer in at least some cases that would trigger this problem, such as 
two identical features.
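As a minimal pure-Python sketch (illustrative only, not Spark code) of the two-identical-features case: the Gramian is singular, so a Cholesky-based solve fails, yet a minimum-norm least-squares answer exists, and gradient descent started from zero converges to it.

```python
# Two identical feature columns: X^T X is singular, so Cholesky-based
# normal equations fail, but a minimum-norm least-squares solution exists.
# Gradient descent started from zero converges to that minimum-norm solution.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # second column duplicates the first
y = [2.0, 4.0, 6.0]                        # y = 2 * x1, so any w1 + w2 = 2 fits

w = [0.0, 0.0]
lr = 0.01
for _ in range(2000):
    # gradient of 0.5 * ||Xw - y||^2 is X^T (Xw - y)
    resid = [sum(X[i][j] * w[j] for j in range(2)) - y[i] for i in range(3)]
    grad = [sum(X[i][j] * resid[i] for i in range(3)) for j in range(2)]
    w = [w[j] - lr * grad[j] for j in range(2)]

print(w)  # close to [1.0, 1.0], the minimum-norm solution among w1 + w2 = 2
```

Here w1 = w2 = 1 is the unique minimum-norm point on the line w1 + w2 = 2 of equally good fits, which is the kind of 'best' answer a solver could still return.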


was (Author: srowen):
[~yanboliang] Yes, this is true in general of ill-conditioned problems. What are 
you proposing? To propagate the error from LAPACK in a different way? To check 
the condition number? Roughly speaking, the current behavior is correct in that 
there's no real answer here.

> WLS can not resolve some kinds of equation
> --
>
> Key: SPARK-11918
> URL: https://issues.apache.org/jira/browse/SPARK-11918
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Priority: Minor
>  Labels: starter
> Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving 
> Linear Regression (when #features < 4096). But if the dataset is very 
> ill-conditioned (such as a 0/1 label used for classification while the 
> equation is underdetermined), WLS fails (but "l-bfgs" can still train and 
> produce a model). The failure is caused by the underlying lapack library 
> returning an error value during Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the 
> "normal" solver on the example dataset 
> (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>   at scala.Predef$.assert(Predef.scala:179)
>   at 
> org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>   at 
> org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>   at 
> org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>   at 
> org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11918) WLS can not resolve some kinds of equation

2016-02-01 Thread Imran Younus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127437#comment-15127437
 ] 

Imran Younus edited comment on SPARK-11918 at 2/2/16 2:12 AM:
--

Several columns in the given dataset contain only zeros. In this case, the data 
matrix is not full rank, so the Gramian matrix is singular and hence not 
invertible. The Cholesky decomposition fails in this case.

This will also happen if the standard deviation of more than one column is zero 
(even if the values themselves are not zero).

I think we should either catch this error in the code and exit with a warning 
message, or drop the columns with zero variance and continue with the algorithm.
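To make the failure mode above concrete, here is a small pure-Python sketch (illustrative only, not the Spark code path) of a plain Cholesky factorization rejecting a singular Gramian, analogous to LAPACK's dpotrf/dpotrs reporting a non-zero info value:

```python
import math

def cholesky(a):
    """Plain Cholesky factorization; raises ValueError when the matrix is
    not positive definite, analogous to LAPACK dpotrf returning info > 0."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("not positive definite at pivot %d" % i)
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Data matrix with an all-zero column -> Gramian X^T X is singular.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
gram = [[sum(X[k][i] * X[k][j] for k in range(3)) for j in range(2)]
        for i in range(2)]          # [[14.0, 0.0], [0.0, 0.0]]
try:
    cholesky(gram)
except ValueError as e:
    print("Cholesky failed:", e)
```

The zero pivot at the second diagonal entry is exactly the singular-Gramian situation described above.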


was (Author: iyounus):
Several columns in the given dataset contain only zeros. In this case, the data 
matrix is not full rank, so the Gramian matrix is singular and hence not 
invertible. The Cholesky decomposition fails in this case.

This will also happen if the standard deviation of more than one column is zero 
(even if the values themselves are not zero).

I think we should catch this error in the code and exit with a warning message.





[jira] [Comment Edited] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021729#comment-15021729
 ] 

Yanbo Liang edited comment on SPARK-11918 at 11/23/15 8:31 AM:
---

Furthermore, I used the breeze library to train the model by the local 
normal-equation method.
{code}
import sqlCtx.implicits._
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.functions.col
import breeze.linalg.DenseMatrix
import breeze.linalg._

// Load the example LIBSVM dataset as a DataFrame.
val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext, 
"/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()

// Flatten all feature vectors into one array (column-major for breeze).
val features = df.select(col("features")).map { r =>
  r.getAs[Vector](0)
}.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r =>
  r.getDouble(0)
}.collect()

// 692 features x 100 samples; transpose to get the 100 x 692 design matrix X.
val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t

val y = new DenseMatrix[Double](100, 1, labelArray)

// Normal equations: coefs = (X^T X)^-1 (X^T y)
val XtXi = inv(Xt * X)
val XtY = Xt * y

val coefs = XtXi * XtY

println(coefs.toString)
{code}
It also throws an exception like:
{code}
breeze.linalg.MatrixSingularException: 
at breeze.linalg.inv$$anon$1.apply(inv.scala:36)
at breeze.linalg.inv$$anon$1.apply(inv.scala:19)
at breeze.generic.UFunc$class.apply(UFunc.scala:48)
at breeze.linalg.inv$.apply(inv.scala:17)
{code}
breeze.linalg.inv also calls the netlib LAPACK package, which is the same 
library Spark uses. Tracing the breeze code, we can see that this exception is 
thrown here 
(https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/inv.scala#L33),
 which is also caused by the underlying lapack error. 
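One remedy suggested earlier in this thread is to drop zero-variance (constant) columns before forming the normal equations, so the Gramian of the remaining columns has a chance of being invertible. A minimal pure-Python sketch of that preprocessing step (illustrative, with made-up data; not Spark or breeze code):

```python
# Drop zero-variance columns before forming the normal equations.
X = [[1.0, 0.0, 5.0],
     [2.0, 0.0, 5.0],
     [3.0, 0.0, 5.0]]   # column 1 is all-zero, column 2 is constant

n = len(X)
p = len(X[0])
means = [sum(row[j] for row in X) / n for j in range(p)]
variances = [sum((row[j] - means[j]) ** 2 for row in X) / n for j in range(p)]

# Keep only columns whose variance is meaningfully non-zero.
keep = [j for j, v in enumerate(variances) if v > 1e-12]
X_reduced = [[row[j] for j in keep] for row in X]
print(keep, X_reduced)  # keep == [0]; only the informative column survives
```

Note this handles constant columns but not exact duplicates or other linear dependencies, which also make X^T X singular.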


was (Author: yanboliang):
Furthermore, I used the breeze library to train the model by the local 
normal-equation method.
{code}
import sqlCtx.implicits._
import org.apache.spark.mllib.linalg.Vector
import breeze.linalg.DenseMatrix
import breeze.linalg._

val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext, 
"/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()


val features = df.select(col("features")).map { r =>
  r.getAs[Vector](0)
}.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r =>
  r.getDouble(0)
}.collect()

val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t

val y = new DenseMatrix[Double](100, 1, labelArray)

val XtXi = inv(Xt * X)
val XtY = Xt * y

val coefs = XtXi * XtY

println(coefs.toString)
{code}
It also throws an exception




[jira] [Comment Edited] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021729#comment-15021729
 ] 

Yanbo Liang edited comment on SPARK-11918 at 11/23/15 8:44 AM:
---

Furthermore, I used the breeze library to train the model by the local 
normal-equation method.
{code}
import sqlCtx.implicits._
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.functions.col
import breeze.linalg.DenseMatrix
import breeze.linalg._

// Load the example LIBSVM dataset as a DataFrame.
val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext, 
"/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()

// Flatten all feature vectors into one array (column-major for breeze).
val features = df.select(col("features")).map { r =>
  r.getAs[Vector](0)
}.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r =>
  r.getDouble(0)
}.collect()

// 692 features x 100 samples; transpose to get the 100 x 692 design matrix X.
val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t

val y = new DenseMatrix[Double](100, 1, labelArray)

// Normal equations: coefs = (X^T X)^-1 (X^T y)
val XtXi = inv(Xt * X)
val XtY = Xt * y

val coefs = XtXi * XtY

println(coefs.toString)
{code}
It also throws an exception like:
{code}
breeze.linalg.MatrixSingularException: 
at breeze.linalg.inv$$anon$1.apply(inv.scala:36)
at breeze.linalg.inv$$anon$1.apply(inv.scala:19)
at breeze.generic.UFunc$class.apply(UFunc.scala:48)
at breeze.linalg.inv$.apply(inv.scala:17)
{code}
breeze.linalg.inv also calls the netlib lapack library, which is the same one 
Spark uses. Tracing the breeze code, we can see that this exception is thrown 
here 
(https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/inv.scala#L33),
 also caused by the underlying lapack error. 


was (Author: yanboliang):
Furthermore, I used the breeze library to train the model by the local 
normal-equation method.
{code}
import sqlCtx.implicits._
import org.apache.spark.mllib.linalg.Vector
import breeze.linalg.DenseMatrix
import breeze.linalg._

val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext, 
"/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()


val features = df.select(col("features")).map { r =>
  r.getAs[Vector](0)
}.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r =>
  r.getDouble(0)
}.collect()

val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t

val y = new DenseMatrix[Double](100, 1, labelArray)

val XtXi = inv(Xt * X)
val XtY = Xt * y

val coefs = XtXi * XtY

println(coefs.toString)
{code}
It also throws an exception like:
{code}
breeze.linalg.MatrixSingularException: 
at breeze.linalg.inv$$anon$1.apply(inv.scala:36)
at breeze.linalg.inv$$anon$1.apply(inv.scala:19)
at breeze.generic.UFunc$class.apply(UFunc.scala:48)
at breeze.linalg.inv$.apply(inv.scala:17)
{code}
breeze.linalg.inv also calls the netlib LAPACK package, which is the same 
library Spark uses. Tracing the breeze code, we can see that this exception is 
thrown here 
(https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/functions/inv.scala#L33),
 which is also caused by the underlying lapack error. 
