Liaw, Andy wrote:

>>From: Martin C. Martin
>>
>>Hi,
>>
>>I have a bunch of data points x from two classes A & B, and I'm
>>creating a classifier.  So I have a function f(x) which estimates
>>the probability that x is in class A.  (I have an equal number of
>>examples of each, so p(class) = 0.5.)
>>
>>One way of seeing how well this does is to compute the error rate
>>on the test set, i.e. if f(x) > 0.5 call it A, and see how many
>>times I misclassify an item.  That's what MASS does.  But we should
>
>Surely you mean `99% of dataminers/machine learners' rather than `MASS'?

That was my impression, but I didn't want to presume to speak for most 
dataminers/machine learners.
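
For concreteness, the thresholding scheme described above can be
written in a few lines of R.  This is only a toy sketch: p.hat stands
in for f(x) evaluated on a test set, and truth for the true labels;
neither comes from the thread or from MASS.

  ## Estimated P(class A) on the test set, plus the true labels
  ## (toy placeholder values, just so the snippet runs).
  p.hat <- c(0.9, 0.7, 0.4, 0.6, 0.2, 0.8)
  truth <- factor(c("A", "A", "A", "B", "B", "B"))

  ## Threshold at 0.5 and count how often we are wrong.
  pred <- factor(ifelse(p.hat > 0.5, "A", "B"), levels = levels(truth))
  mean(pred != truth)                      # misclassification rate
  table(predicted = pred, actual = truth)  # confusion matrix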


>>be able to do better: misclassifying should be more of a problem if
>>the regression is confident than if it isn't.
>>
>>How can I show that my f(x) = P(x is in class A) does better than
>>chance?
>
>It depends on what you mean by `better'.  For some problems, people
>are perfectly happy with misclassification rate.  For others, the
>estimated probabilities count a lot more.  One possibility is to look
>at the ROC curve.  Another possibility is to look at the calibration
>curve (see MASS the book).
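
A rough base-R sketch of both suggestions, continuing the toy p.hat
and truth from above (my own illustration, not code from MASS the
book):

  ## ROC curve: sweep the threshold and trace TPR against FPR.
  ths <- sort(unique(c(0, p.hat, 1)), decreasing = TRUE)
  tpr <- sapply(ths, function(t) mean(p.hat[truth == "A"] >= t))
  fpr <- sapply(ths, function(t) mean(p.hat[truth == "B"] >= t))
  plot(fpr, tpr, type = "l", xlab = "False positive rate",
       ylab = "True positive rate", main = "ROC curve")
  abline(0, 1, lty = 2)   # chance performance

  ## Crude calibration check: within bins of predicted probability,
  ## compare the mean prediction to the observed fraction of class A.
  ## (A well-calibrated f puts the two columns close together.)
  bins <- cut(p.hat, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
  cbind(predicted = tapply(p.hat, bins, mean),
        observed  = tapply(truth == "A", bins, mean))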

Thanks, those are getting closer to what I want.  I think the bottom 
line is that I can't really assign a p-value the way I want to, since 
the problem I'm thinking of is ill-posed.

Thanks,
Martin

