Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.
Nice to hear that your experiment is consistent with my assumption. The current L1/L2 regularization penalizes the intercept as well, which is not ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly the same solution as R, but with scalability in the number of rows and columns. Stay tuned!

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Mon, Sep 29, 2014 at 11:45 AM, Yanbo Liang yanboha...@gmail.com wrote:
> [quoted text trimmed; Yanbo's full message appears below in this thread]
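For illustration, one way to avoid penalizing the intercept with the existing GradientDescent optimizer is to plug in a custom Updater. This is only a sketch, not DB's GLMNET work: it assumes MLlib 1.1's Updater.compute signature and that, with addIntercept enabled, the intercept is stored as the last entry of the weight vector (as appendBias produces); the class name is made up.

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.mllib.optimization.Updater

    // Sketch of an L2 updater that leaves the intercept unpenalized.
    // Assumes the intercept is the LAST entry of the weight vector,
    // which is what addIntercept/appendBias produce in MLlib 1.1.
    class L2UpdaterNoIntercept extends Updater {
      override def compute(
          weightsOld: Vector,
          gradient: Vector,
          stepSize: Double,
          iter: Int,
          regParam: Double): (Vector, Double) = {
        val thisIterStepSize = stepSize / math.sqrt(iter)
        val w = weightsOld.toArray.clone()
        val g = gradient.toArray
        var regVal = 0.0
        var i = 0
        while (i < w.length) {
          w(i) -= thisIterStepSize * g(i)  // plain gradient step
          if (i < w.length - 1) {          // skip the intercept (last entry)
            w(i) -= thisIterStepSize * regParam * w(i)
            regVal += 0.5 * regParam * w(i) * w(i)
          }
          i += 1
        }
        (Vectors.dense(w), regVal)
      }
    }

It could then be plugged in via lr.optimizer.setUpdater(new L2UpdaterNoIntercept) on a LogisticRegressionWithSGD instance.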
Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.
Test accuracy doesn't determine the total loss. Any decision point between -1 and +1 separates the two points -1 and +1 and gives 1.0 accuracy, but the corresponding losses are different.

-Xiangrui

On Sun, Sep 28, 2014 at 2:48 AM, Yanbo Liang yanboha...@gmail.com wrote:
> Hi,
>
> We have used LogisticRegression with two different optimization methods, SGD and LBFGS, in MLlib. With the same dataset and the same training/test split, we get two different weight vectors.
>
> For example, we use spark-1.1.0/data/mllib/sample_binary_classification_data.txt as our training and test dataset, with LogisticRegressionWithSGD and LogisticRegressionWithLBFGS as the training methods and all other parameters the same. The precision of both methods is nearly 100%, and the AUCs are also near 1.0.
>
> As far as I know, a convex optimization problem converges to the global minimum (we use SGD with a mini-batch fraction of 1.0), but I got two different weight vectors. Is this expected behavior, and does it make sense?
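Xiangrui's point can be checked with a few lines of standalone Scala (a toy sketch, no Spark needed; the object name, data, and helper are made up): on a perfectly separable one-dimensional dataset, scaling the weight keeps accuracy at 1.0 while the logistic loss keeps shrinking, so accuracy alone doesn't pin down a solution.

    // Toy illustration: identical accuracy, different logistic loss.
    object AccuracyVsLoss {
      // Average logistic loss log(1 + exp(-y * w * x)) with labels y in {-1, +1}.
      def logisticLoss(w: Double, data: Seq[(Double, Double)]): Double =
        data.map { case (x, y) => math.log1p(math.exp(-y * w * x)) }.sum / data.size

      def main(args: Array[String]): Unit = {
        // Perfectly separable 1-D data: (feature, label).
        val data = Seq((-2.0, -1.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 1.0))
        for (w <- Seq(1.0, 2.0, 10.0)) {
          val acc = data.count { case (x, y) => math.signum(w * x) == y }.toDouble / data.size
          println(f"w = $w%5.1f  accuracy = $acc%.2f  loss = ${logisticLoss(w, data)}%.6f")
        }
        // Every w gives accuracy 1.00, but the loss strictly decreases as w
        // grows: on separable data the unregularized optimum is at infinity.
      }
    }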
Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.
Thank you for all your patient responses. I can conclude that if the data is totally separable or over-fitting occurs, the weights may differ; this is also consistent with my experiments. I evaluated two different datasets, with the results as follows:

Loss function: LogisticGradient
Regularizer: L2
regParam: 1.0
numIterations: 1 (SGD)

Dataset 1: spark-1.1.0/data/mllib/sample_binary_classification_data.txt
# of classes: 2
# of samples: 100
# of features: 692
areaUnderROC of both SGD and LBFGS reaches nearly 1.0.
The loss of both optimization methods converges to nearly 1.7147811767900675E-5 (very, very small).
The weights from the two methods are different but look roughly proportional to each other (not strictly), just as DB Tsai mentioned above. It might be that this dataset is totally separable.

Dataset 2: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#german.numer
# of classes: 2
# of samples: 1000
# of features: 24
areaUnderROC of both SGD and LBFGS is nearly 0.8.
The loss of both optimization methods converges to nearly 0.5367041390107519.
The weights from the two methods are the same.

2014-09-29 16:05 GMT+08:00 DB Tsai dbt...@dbtsai.com:
> Can you check the loss of both the LBFGS and SGD implementations? One reason may be that SGD doesn't converge well; you can see that by comparing the two log-likelihoods. Another potential reason may be that the labels of your training data are totally separable, so you can always increase the log-likelihood by multiplying the weights by a constant.
>
> Sincerely,
>
> DB Tsai
> ---
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
> On Sun, Sep 28, 2014 at 11:48 AM, Yanbo Liang yanboha...@gmail.com wrote:
>> [original question trimmed; see the full quote in Xiangrui's reply above]
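For reference, the comparison described in this thread might be set up along these lines against the MLlib 1.1 API (a sketch: the object name and the SGD iteration count are assumptions, LBFGS-side iteration defaults are left in place, and only regParam = 1.0 and miniBatchFraction = 1.0 come from the messages above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.optimization.SquaredL2Updater
    import org.apache.spark.mllib.util.MLUtils

    object CompareLROptimizers {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CompareLROptimizers"))
        val data = MLUtils.loadLibSVMFile(sc,
          "data/mllib/sample_binary_classification_data.txt").cache()

        // SGD with full-batch gradients (miniBatchFraction = 1.0) and L2, regParam = 1.0.
        val sgd = new LogisticRegressionWithSGD()
        sgd.optimizer
          .setNumIterations(100)
          .setMiniBatchFraction(1.0)
          .setRegParam(1.0)
          .setUpdater(new SquaredL2Updater)
        val sgdModel = sgd.run(data)

        // LBFGS with the same regularizer and regParam.
        val lbfgs = new LogisticRegressionWithLBFGS()
        lbfgs.optimizer
          .setRegParam(1.0)
          .setUpdater(new SquaredL2Updater)
        val lbfgsModel = lbfgs.run(data)

        for ((name, model) <- Seq(("SGD", sgdModel), ("LBFGS", lbfgsModel))) {
          model.clearThreshold() // emit raw scores so areaUnderROC is meaningful
          val scoreAndLabels = data.map(p => (model.predict(p.features), p.label))
          val auc = new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
          println(s"$name: AUC = $auc, weights(0..4) = " +
            model.weights.toArray.take(5).mkString(", "))
        }
        sc.stop()
      }
    }

Comparing the printed losses (or here, AUC plus the leading weights) side by side is what distinguishes the separable case, where the weight vectors differ by roughly a constant factor, from the non-separable case, where they agree.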