[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2016-09-21 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508996#comment-15508996
 ] 

Yanbo Liang commented on SPARK-11918:
-

Cholesky decomposition is unstable for near-singular and rank-deficient 
matrices, but it is often used when matrix A is very large and sparse because 
it is faster to compute. QR decomposition is more stable than Cholesky, so I 
think we should switch to it in the future. I will take a look at this issue. 
As a temporary fix, I think throwing a better exception that tells users the 
cause of the failure is OK. Thanks.
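
A minimal sketch of what such a "better exception" could look like, written in 
plain Scala rather than against Spark's actual CholeskyDecomposition (which 
delegates to lapack). The names CholeskySketch, cholesky and tol are 
hypothetical, chosen only for illustration. It factors a small symmetric matrix 
and, instead of failing on an opaque return code, reports which pivot was not 
positive:

```scala
object CholeskySketch {
  /**
   * Hypothetical helper: returns the lower-triangular factor L with A = L * L^T,
   * or a Left with a human-readable reason when A is not positive definite.
   * `tol` is an assumed threshold below which a pivot is treated as zero.
   */
  def cholesky(a: Array[Array[Double]],
               tol: Double = 1e-12): Either[String, Array[Array[Double]]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    var j = 0
    while (j < n) {
      // Pivot for column j: a(j)(j) minus the squares already placed in row j.
      var sum = 0.0
      var k = 0
      while (k < j) { sum += l(j)(k) * l(j)(k); k += 1 }
      val pivot = a(j)(j) - sum
      if (pivot <= tol) {
        return Left(
          s"Matrix is not positive definite (pivot $pivot at index $j). " +
            "The input may be rank deficient, e.g. constant or collinear columns.")
      }
      l(j)(j) = math.sqrt(pivot)
      // Fill the rest of column j below the diagonal.
      var i = j + 1
      while (i < n) {
        var s = 0.0
        k = 0
        while (k < j) { s += l(i)(k) * l(j)(k); k += 1 }
        l(i)(j) = (a(i)(j) - s) / l(j)(j)
        i += 1
      }
      j += 1
    }
    Right(l)
  }
}
```

A caller could then turn the Left into an exception whose message points users 
at the likely cause, or fall back to another solver such as "l-bfgs".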

> WLS can not resolve some kinds of equation
> --
>
> Key: SPARK-11918
> URL: https://issues.apache.org/jira/browse/SPARK-11918
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Priority: Minor
>  Labels: starter
> Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving 
> Linear Regression (when #feature < 4096). But if the dataset is very 
> ill-conditioned (for example, 0-1 based labels used for classification and an 
> underdetermined equation), WLS fails (although "l-bfgs" can train and produce 
> a model). The failure is caused by the underlying lapack library returning an 
> error value during Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the 
> "normal" solver on the example dataset 
> (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>   at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2016-09-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507914#comment-15507914
 ] 

Sean Owen commented on SPARK-11918:
---

Copying my comment from the other JIRA: yes, we should have a better error.
But is Cholesky the right choice here, given that AtA may not be positive 
definite?
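
The concern can be made precise with a standard identity. Here A is the data 
matrix as in the comment above, and x is an arbitrary vector introduced only 
for this illustration:

```latex
x^\top (A^\top A)\, x \;=\; \lVert A x \rVert_2^2 \;\ge\; 0
\quad \text{for all } x,
\qquad
x^\top (A^\top A)\, x = 0 \iff A x = 0 .
```

So AtA is always positive semidefinite, but it is positive definite only when 
Ax = 0 has no nonzero solution, that is, when A has full column rank. With 
more features than samples, or with linearly dependent columns, that condition 
cannot hold.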




[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2016-02-01 Thread Imran Younus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127437#comment-15127437
 ] 

Imran Younus commented on SPARK-11918:
--

Several columns in the given dataset contain only zeros. In this case, the data 
matrix is not full rank. Therefore the Gramian matrix is singular and hence not 
invertible, and the Cholesky decomposition will fail.

This will also happen if the standard deviation of more than one column is zero 
(even if the values are not zero).

I think we should catch this error in the code and exit with a warning message.
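
One way to give users that warning before the solver runs is to look for 
constant columns up front. The following is only a sketch in plain Scala, not 
Spark code; ConstantColumnCheck and constantColumns are hypothetical names, and 
the data is assumed to be available locally as dense rows:

```scala
object ConstantColumnCheck {
  /**
   * Hypothetical helper: indices of columns whose values are identical in
   * every row, i.e. columns with zero standard deviation.
   */
  def constantColumns(rows: Seq[Array[Double]]): Seq[Int] = {
    if (rows.isEmpty) Seq.empty
    else {
      val first = rows.head
      first.indices.filter(j => rows.forall(r => r(j) == first(j)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Column 0 is all zeros and column 2 is a nonzero constant.
    val rows = Seq(
      Array(0.0, 1.0, 5.0, 2.0),
      Array(0.0, 3.0, 5.0, 2.5),
      Array(0.0, 2.0, 5.0, 4.0))
    val constant = constantColumns(rows)
    if (constant.nonEmpty) {
      println(s"Warning: columns ${constant.mkString(", ")} are constant; " +
        "the Gramian matrix may be singular and the normal solver may fail.")
    }
  }
}
```

An all-zero column produces a zero row and column in the Gramian matrix, and 
two constant columns are scalar multiples of each other, so either finding is 
enough to expect a rank-deficient problem.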





[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021787#comment-15021787
 ] 

Sean Owen commented on SPARK-11918:
---

[~yanboliang] Yes, this is true in general of ill-conditioned problems. What are 
you proposing: to propagate the error from lapack in a different way, or to 
check the condition number? Roughly speaking, this is the correct behavior, in 
that there's no real answer here.

> WLS can not resolve some kinds of equation
> --
>
> Key: SPARK-11918
> URL: https://issues.apache.org/jira/browse/SPARK-11918
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Reporter: Yanbo Liang
> Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving 
> Linear Regression (when #feature < 4096). But if the dataset is very 
> ill-conditioned (for example, 0-1 based labels used for classification and an 
> underdetermined equation), WLS fails (although "l-bfgs" can train and produce 
> a model). The failure is caused by the underlying lapack library returning an 
> error value during Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the 
> "normal" solver on the example dataset 
> (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>   at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}






[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021729#comment-15021729
 ] 

Yanbo Liang commented on SPARK-11918:
-

Furthermore, I used the Breeze library to train the model locally with the 
normal equation method.
{code}
import sqlCtx.implicits._
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.functions.col
import breeze.linalg._

val df = MLUtils.loadLibSVMFile(sqlCtx.sparkContext, 
"/Users/yanboliang/data/trunk/spark/data/mllib/sample_libsvm_data.txt").toDF()

// Flatten all feature vectors into one array, one sample after another.
val features = df.select(col("features")).map { r =>
  r.getAs[Vector](0)
}.collect().flatMap { v => v.toArray }
val labelArray = df.select(col("label")).map { r =>
  r.getDouble(0)
}.collect()

// Breeze matrices are column-major, so each column of Xt is one sample
// (692 features x 100 samples).
val Xt = new DenseMatrix[Double](692, 100, features)
val X = Xt.t

val y = new DenseMatrix[Double](100, 1, labelArray)

// Normal equation: coefs = (X^T X)^-1 X^T y.
// X^T X is 692 x 692 but has rank at most 100, so it is singular.
val XtXi = inv(Xt * X)
val XtY = Xt * y

val coefs = XtXi * XtY

println(coefs.toString)
{code}
It also throws an exception.

> WLS can not resolve some kinds of equation
> --
>
> Key: SPARK-11918
> URL: https://issues.apache.org/jira/browse/SPARK-11918
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Reporter: Yanbo Liang
> Attachments: R_GLM_output
>
>
> Weighted Least Squares (WLS) is one of the optimization methods for solving 
> Linear Regression (when #feature < 4096). But if the dataset is very 
> ill-conditioned (for example, 0-1 based labels used for classification and an 
> underdetermined equation), WLS fails. The failure is caused by the underlying 
> Cholesky decomposition.
> This issue is easy to reproduce: train a LinearRegressionModel with the 
> "normal" solver on the example dataset 
> (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt).
> The following is the exception:
> {code}
> assertion failed: lapack.dpotrs returned 1.
> java.lang.AssertionError: assertion failed: lapack.dpotrs returned 1.
>   at scala.Predef$.assert(Predef.scala:179)
>   at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:42)
>   at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:117)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:180)
>   at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> {code}






[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021756#comment-15021756
 ] 

Yanbo Liang commented on SPARK-11918:
-

So far, I suspect this is not a bug in MLlib; rather, a very ill-conditioned 
problem may not be suitable for the "normal" equation method. If this 
assumption is right, I think we should document this issue. Looking forward 
to your comments, [~mengxr].




[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021830#comment-15021830
 ] 

Yanbo Liang commented on SPARK-11918:
-

[~sowen] Thanks for your comments. I think you can see part of my proposal at 
https://github.com/apache/spark/pull/9905. I also wonder whether we can give 
users a better hint when they hit the same condition.




[jira] [Commented] (SPARK-11918) WLS can not resolve some kinds of equation

2015-11-23 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021720#comment-15021720
 ] 

Yanbo Liang commented on SPARK-11918:
-

I used the same dataset 
(https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt)
to train a linear regression model with R:::glm. It did not throw an exception, 
but the result is not trustworthy: the coefficients of the model contain many 
NA and NaN values, which is not reasonable. Please see the attached file for 
the R:::glm output.
