[R] SVM probability output variation

2009-10-21 Thread Anders Carlsson

Dear R:ers,

I'm using the svm from the e1071 package to train a model with the 
option probabilities = TRUE. I then use predict with probabilities 
= TRUE and get the probabilities for the data point belonging to either 
class. So far all is well.
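
[A minimal sketch of that workflow, assuming a data frame dat with a
two-level factor column Grp and numeric feature columns - the object
names are placeholders, not from the original post:]

library(e1071)

## train with probability estimates enabled
model <- svm(Grp ~ ., data = dat, probability = TRUE)

## predict and extract the class-membership probabilities
pred  <- predict(model, newdata = dat, probability = TRUE)
probs <- attr(pred, "probabilities")
head(probs)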


My question is why I get different results each time I train the model, 
although I use exactly the same data. The prediction seems to be 
reproducible, but if I re-train the model, the probabilities vary somewhat.


Here, I have trained a model on exactly the same data five times. When 
predicting using the different models, this is how the probabilities vary:


probabilities
     Grp.0     Grp.1
 0.7077155 0.2922845
 0.7938782 0.2061218
 0.8178833 0.1821167
 0.7122203 0.2877797

How can the predictions using the same training and test data vary so much?

Thanks,
Anders

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM probability output variation

2009-10-21 Thread Steve Lianoglou

Hi Anders,

On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:


> Dear R:ers,
>
> I'm using the svm from the e1071 package to train a model with the
> option probabilities = TRUE. I then use predict with
> probabilities = TRUE and get the probabilities for the data point
> belonging to either class. So far all is well.
>
> My question is why I get different results each time I train the
> model, although I use exactly the same data. The prediction seems to
> be reproducible, but if I re-train the model, the probabilities vary
> somewhat.
>
> Here, I have trained a model on exactly the same data five times.
> When predicting using the different models, this is how the
> probabilities vary:


I'm not sure I'm following the example you're giving and the scenario
you are describing.



> probabilities
>      Grp.0     Grp.1
>  0.7077155 0.2922845
>  0.7938782 0.2061218
>  0.8178833 0.1821167
>  0.7122203 0.2877797


This seems fine to me: it looks like the probabilities of class  
membership for 4 examples (Note that Grp.0 + Grp.1 = 1).
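
[A quick sanity check of that, assuming pred holds the result of a
probability-enabled predict() call:]

## each row of the probability matrix should sum to 1
rowSums(attr(pred, "probabilities"))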


> How can the predictions using the same training and test data vary
> so much?


I've tried the code below several times (taken from the example), and
the probabilities calculated from the call to predict don't change
much at all:


R> library(e1071)
R> data(iris)
R> attach(iris)

R> # x and y as defined in the example on the ?svm help page
R> x <- subset(iris, select = -Species)
R> y <- Species

R> model <- svm(x, y, probability=TRUE)
R> predict(model, x, probability=TRUE)

To be fair, the probabilities aren't exactly the same, but the  
difference between two runs is really small:


R> model <- svm(x, y, probability=TRUE)
R> a <- predict(model, x, probability=TRUE)

R> model <- svm(x, y, probability=TRUE)
R> b <- predict(model, x, probability=TRUE)

R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
[1] 0.003215959
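
[A plausible source of that small wobble, worth testing: libsvm fits
its probability model on decision values obtained from an internal
cross-validation, and the random fold assignment can differ between
calls. If that randomness is driven by R's RNG, seeding before each
fit should make the runs match exactly - a sketch, reusing the x and
y above:]

set.seed(42)
a <- predict(svm(x, y, probability=TRUE), x, probability=TRUE)

set.seed(42)
b <- predict(svm(x, y, probability=TRUE), x, probability=TRUE)

## should be 0 if the fold assignment follows R's RNG
mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))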

Is this what you were talking about, or ... ?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] SVM probability output variation

2009-10-21 Thread Anders Carlsson
Hi again, and thank you Steve for your reply!


> Hi Anders,
> 
> On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:
> 
> > Dear R:ers,
> >
> > I'm using the svm from the e1071 package to train a model with the
> > option probabilities = TRUE. I then use predict with
> > probabilities = TRUE and get the probabilities for the data point
> > belonging to either class. So far all is well.
> >
> > My question is why I get different results each time I train the
> > model, although I use exactly the same data. The prediction seems to
> > be reproducible, but if I re-train the model, the probabilities vary
> > somewhat.
> >
> > Here, I have trained a model on exactly the same data five times.
> > When predicting using the different models, this is how the
> > probabilities vary:
> 
> I'm not sure I'm following the example you're giving and the scenario
> you are describing.

I think you got it!

 
> > probabilities
> >      Grp.0     Grp.1
> >  0.7077155 0.2922845
> >  0.7938782 0.2061218
> >  0.8178833 0.1821167
> >  0.7122203 0.2877797
> 
> This seems fine to me: it looks like the probabilities of class
> membership for 4 examples (Note that Grp.0 + Grp.1 = 1).
 

Yes, within each run all was OK, but I was surprised that it varied to such a 
high degree.

 
> > How can the predictions using the same training and test data vary
> > so much?
> 
> I've tried the code below several times (taken from the example), and
> the probabilities calculated from the call to predict don't change
> much at all:
> 
> R> library(e1071)
> R> data(iris)
> R> attach(iris)
> 
> R> # x and y as defined in the example on the ?svm help page
> R> x <- subset(iris, select = -Species)
> R> y <- Species
> 
> R> model <- svm(x, y, probability=TRUE)
> R> predict(model, x, probability=TRUE)
> 
> To be fair, the probabilities aren't exactly the same, but the
> difference between two runs is really small:
> 
> R> model <- svm(x, y, probability=TRUE)
> R> a <- predict(model, x, probability=TRUE)
> 
> R> model <- svm(x, y, probability=TRUE)
> R> b <- predict(model, x, probability=TRUE)
> 
> R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
> [1] 0.003215959
> 
> Is this what you were talking about, or ... ?


Yes, exactly that. In your example, though, the variation seems to be a lot 
smaller. I'm guessing that has to do with the data. 

If I instead output the decision values, the whole procedure is fully 
reproducible, i.e. the exact same values are returned when I retrain the model. 

I have no idea how the probabilities are calculated, but it seems to be in this 
step that the differences arise. In my case, I feel a bit hesitant to use them 
when they differ that much between runs (15% or so)... 

If it matters: I use a linear kernel and don't tune the model in any way.


Thanks again!

/Anders
 

 



Re: [R] SVM probability output variation

2009-10-21 Thread Steve Lianoglou

Howdy,

On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
[snip]

> Yes, exactly that. In your example, though, the variation seems to
> be a lot smaller. I'm guessing that has to do with the data.
>
> If I instead output the decision values, the whole procedure is
> fully reproducible, i.e. the exact same values are returned when I
> retrain the model.


By the decision values, you mean the predicted labels, right?

> I have no idea how the probabilities are calculated, but it seems to
> be in this step that the differences arise. In my case, I feel a bit
> hesitant to use them when they differ that much between runs (15% or
> so)...


I'd find that a bit disconcerting, too. Can you give a sample of your
data + code you're using that can reproduce this example?


Warning: Brainstorming Below

If I were to calculate probabilities for my class labels, I'd make the  
probability some function of the example's distance from the decision  
boundary.


Now, if your decision boundary isn't changing from run to run (and I  
guess it really shouldn't be, since the SVM returns the maximum margin  
classifier (which is, by definition, unique, right?)), it's hard to  
imagine why these probabilities would change, either ...


... unless you're holding out different subsets of your data during  
training, or perhaps have a different value for your penalty (cost)  
parameter when building the model. I believe you said that you're  
actually training the same exact model each time, though, right?


Anyway, I see the help page for ?svm says this, if it helps:

"The probability model for classification fits a logistic distribution
using maximum likelihood to the decision values of all binary
classifiers, and computes the a-posteriori class probabilities for the
multi-class problem using quadratic optimization."
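
[To unpack that sentence: for a two-class problem this is essentially
Platt scaling - a sigmoid fitted to the decision values. A rough
illustration of the idea, not e1071's exact internal routine; dv is
assumed to hold decision values and y01 the 0/1 class labels:]

## logistic fit mapping decision values to P(class = 1)
platt <- glm(y01 ~ dv, family = binomial)

## estimated class probabilities for the same points
prob1 <- predict(platt, type = "response")
head(prob1)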


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] SVM probability output variation

2009-10-21 Thread Anders Carlsson
Hi,

> [snip]
> > If I instead output the decision values, the whole procedure is
> > fully reproducible, i.e. the exact same values are returned when I
> > retrain the model.
> 
> By the decision values, you mean the predicted labels, right?

Output of the decision values can be turned on in predict.svm; as I understand 
it, they are the distances from the data points to the hyperplane. (I should 
say that my knowledge here is limited to the concepts - I know nothing about 
the details of how this works...). I use these to create ROC curves etc.
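
[For reference, the predict.svm call pattern for getting those values,
with model and newdata standing in for whatever you trained and want
to score:]

pred <- predict(model, newdata, decision.values = TRUE)

## one column per pairwise binary classifier; for a two-class
## problem this is a single column of signed decision values
dv <- attr(pred, "decision.values")
head(dv)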
 
> > I have no idea how the probabilities are calculated, but it seems to
> > be in this step that the differences arise. In my case, I feel a bit
> > hesitant to use them when they differ that much between runs (15% or
> > so)...
> 
> I'd find that a bit disconcerting, too. Can you give a sample of your
> data + code you're using that can reproduce this example?
 

I have the data at the office, so I can't do that now (at home).

> Warning: Brainstorming Below
> 
> If I were to calculate probabilities for my class labels, I'd make the
> probability some function of the example's distance from the decision
> boundary.
> 
> Now, if your decision boundary isn't changing from run to run (and I
> guess it really shouldn't be, since the SVM returns the maximum margin
> classifier (which is, by definition, unique, right?)), it's hard to
> imagine why these probabilities would change, either ...
> 
> ... unless you're holding out different subsets of your data during
> training, or perhaps have a different value for your penalty (cost)
> parameter when building the model. I believe you said that you're
> actually training the same exact model each time, though, right?

Yes, I'm using the exact same data to train each time. I thought this would 
generate identical models, but that doesn't appear to be the case.
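
[A quick way to see both behaviours side by side - retrain twice on
identical data and compare; this sketch reuses the x and y from the
iris example earlier in the thread, with a linear kernel as Anders
describes:]

m1 <- svm(x, y, kernel = "linear", probability = TRUE)
m2 <- svm(x, y, kernel = "linear", probability = TRUE)

p1 <- predict(m1, x, decision.values = TRUE, probability = TRUE)
p2 <- predict(m2, x, decision.values = TRUE, probability = TRUE)

## the fitted hyperplane is deterministic for identical data,
## so the decision values should agree exactly
max(abs(attr(p1, "decision.values") - attr(p2, "decision.values")))

## the probability map involves internal randomness, so these
## typically differ a little from run to run
max(abs(attr(p1, "probabilities") - attr(p2, "probabilities")))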

 
> Anyway, I see the help page for ?svm says this, if it helps:
> 
> "The probability model for classification fits a logistic distribution
> using maximum likelihood to the decision values of all binary
> classifiers, and computes the a-posteriori class probabilities for the
> multi-class problem using quadratic optimization."

This is where I realise I'm in a bit over my head on the theory side - this 
means nothing to me...
 
> -steve

Thanks again,
Anders
