Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-24 Thread Xiangrui Meng
Thanks for the reference! Many tests are not designed for big data:
http://magazine.amstat.org/blog/2010/09/01/statrevolution/ . So we
need to understand which tests are proper. Feel free to create a JIRA
and let's move our discussion there. -Xiangrui
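One way to see the big-data concern concretely is the small Python sketch below (standard library only). It is a one-sample z-test for a proportion. It holds a practically negligible effect fixed (an observed rate of 0.501 against a null rate of 0.5) and only increases the sample size n. The helper name `proportion_z_test` is hypothetical and not part of MLlib.

```python
import math


def proportion_z_test(p_hat: float, p0: float, n: int) -> tuple[float, float]:
    """One-sample z-test for a proportion (normal approximation).

    Returns (z, two_sided_p_value). Hypothetical helper for illustration only.
    """
    se = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Same tiny effect (0.501 vs 0.500) at increasing sample sizes.
    for n in (1_000, 100_000, 10_000_000):
        z, p = proportion_z_test(0.501, 0.5, n)
        print(f"n={n:>12,d}  z={z:7.3f}  p={p:.3g}")
```

The effect size never changes, but the p-value moves from "not significant" to essentially zero as n grows. On very large datasets, a small p-value alone therefore says little about whether an effect matters.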

On Fri, Aug 22, 2014 at 8:44 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 Hi Xiangrui,

 You can refer to "An Introduction to Statistical Learning with Applications
 in R". It covers many standard hypothesis tests for linear regression and
 logistic regression, and those should be implemented first. After that, we
 will list some other tests that are also important when using logistic
 regression to build scorecards.
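ISLR runs its logistic regression examples through R's `glm`. The coefficient tables that `summary` prints report a z-statistic and p-value for each coefficient, which is a Wald test. The plain-Python sketch below shows what that involves for one predictor plus an intercept. It fits the model by Newton-Raphson, takes the standard errors from the inverse of the Fisher information X^T W X with W = diag(p_i(1 - p_i)), and then forms z = coef / SE. This is not MLlib's API. The helper names and the toy data are hypothetical, and the sketch assumes the classes are not perfectly separable, since the estimates would otherwise diverge.

```python
import math


def _score_and_info(b0, b1, x, y):
    """Score vector and Fisher information X^T W X for y ~ sigmoid(b0 + b1*x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)  # diagonal entry of W
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_logistic_wald(x, y, max_iter=50, tol=1e-10):
    """Fit an intercept + one-slope logistic model, then run Wald z-tests.

    Returns [(coef, std_err, z, p_value)] for [intercept, slope].
    Hypothetical helper for illustration; assumes the data are not separable.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: (X^T W X)^{-1} times the score, 2x2 inverse written out.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    _, (h00, h01, h11) = _score_and_info(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    results = []
    for coef, var in ((b0, h11 / det), (b1, h00 / det)):
        se = math.sqrt(var)
        z = coef / se
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        results.append((coef, se, z, p_value))
    return results


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    for name, row in zip(("intercept", "slope"), fit_logistic_wald(x, y)):
        print(name, "coef=%.4f se=%.4f z=%.3f p=%.4f" % row)
```

In a distributed setting, the per-observation sums in `_score_and_info` are the parts that would have to be aggregated across partitions. How best to do that is left open here.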

 Xiaobo Gu


 -- Original --
 From: Xiangrui Meng <men...@gmail.com>
 Send time: Wednesday, Aug 20, 2014 2:18 PM
 To: guxiaobo1...@qq.com
 Cc: user@spark.apache.org
 Subject:  Re: What about implementing various hypothesis test for
 LogisticRegression in MLlib

 We implemented chi-squared tests in v1.1:
 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
 and we will add more after v1.1. Feedback on which tests should come
 first would be greatly appreciated. -Xiangrui
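The following is a plain-Python sketch of what a chi-squared goodness-of-fit test computes: Pearson's statistic and its p-value. It is not MLlib's implementation, and the function names are hypothetical. In this sketch, the expected counts default to a uniform distribution and are rescaled to the observed total. The p-value uses the chi-squared survival function, built from `math.erfc` and the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Survival function P(X >= x) for a chi-squared variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) (odd df) or Q(1, y) = exp(-y) (even df).
    """
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_sq_goodness_of_fit(observed, expected=None):
    """Pearson's chi-squared goodness-of-fit test.

    Returns (statistic, degrees_of_freedom, p_value). If `expected` is None,
    a uniform distribution is assumed; expected counts are rescaled so they
    sum to the observed total.
    """
    k = len(observed)
    if expected is None:
        expected = [1.0] * k
    scale = float(sum(observed)) / float(sum(expected))
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * scale
        stat += (o - e_scaled) ** 2 / e_scaled
    df = k - 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # 20 vs 10 observations against a 50/50 split: statistic 10/3 on 1 df.
    print(chi_sq_goodness_of_fit([20, 10]))
```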

 On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 Hi,

 From the documentation, I think only the model-fitting part is implemented.
 What about the various hypothesis tests and performance indexes used to
 evaluate the model fit?
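Two "performance indexes" widely used for scorecards are the area under the ROC curve (AUC) and the Kolmogorov-Smirnov (KS) statistic. KS is the largest gap between the true-positive and false-positive rates over all score thresholds. The plain-Python sketch below computes both from scores and 0/1 labels. The function names and toy data are hypothetical, and the pairwise O(P*N) AUC is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half. Pairwise O(P * N) version, for clarity only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Maximum of |TPR - FPR| over all thresholds of the form 'score >= t'."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    ks = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every example tied at this score past the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        ks = max(ks, abs(tp / n_pos - fp / n_neg))
    return ks


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(auc(scores, labels), ks_statistic(scores, labels))
```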

 Regards,

 Xiaobo Gu

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-22 Thread guxiaobo1982
Hi Xiangrui,

You can refer to "An Introduction to Statistical Learning with Applications
in R". It covers many standard hypothesis tests for linear regression and
logistic regression, and those should be implemented first. After that, we
will list some other tests that are also important when using logistic
regression to build scorecards.

Xiaobo Gu

-- Original --
From: Xiangrui Meng <men...@gmail.com>
Send time: Wednesday, Aug 20, 2014 2:18 PM
To: guxiaobo1...@qq.com
Cc: user@spark.apache.org
Subject: Re: What about implementing various hypothesis test for
LogisticRegression in MLlib

We implemented chi-squared tests in v1.1:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
and we will add more after v1.1. Feedback on which tests should come
first would be greatly appreciated. -Xiangrui

On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
 Hi,

 From the documentation, I think only the model-fitting part is implemented.
 What about the various hypothesis tests and performance indexes used to
 evaluate the model fit?

 Regards,

 Xiaobo Gu
