[R] SVM probability output variation
Dear R:ers,

I'm using svm() from the e1071 package to train a model with the option probability = TRUE. I then use predict() with probability = TRUE and get the probabilities for a data point belonging to either class. So far all is well.

My question is why I get different results each time I train the model, although I use exactly the same data. The prediction itself seems to be reproducible, but if I re-train the model, the probabilities vary somewhat. Here, I have trained a model on exactly the same data five times. When predicting using the different models, this is how the probabilities vary:

    probabilities
        Grp.0     Grp.1
    0.7077155 0.2922845
    0.7938782 0.2061218
    0.8178833 0.1821167
    0.7122203 0.2877797

How can the predictions using the same training and test data vary so much?

Thanks,
Anders

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
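[Editor's note: Anders' own data and code are not shown in the post. The following is a minimal, self-contained sketch of the workflow he describes, using simulated two-class data; the object names and the simulated data are placeholders, not part of the original message.]

    library(e1071)

    # Simulated two-class data standing in for Anders' data set
    set.seed(42)
    x <- matrix(rnorm(40 * 2), ncol = 2)
    y <- factor(rep(c("Grp.0", "Grp.1"), each = 20))
    x[y == "Grp.1", ] <- x[y == "Grp.1", ] + 1.5

    # Train with probability = TRUE, then ask predict() for probabilities
    model <- svm(x, y, probability = TRUE)
    pred  <- predict(model, x[1:4, ], probability = TRUE)
    attr(pred, "probabilities")  # a 4 x 2 matrix like the one quoted above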
Re: [R] SVM probability output variation
Hi Anders,

On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:
> Dear R:ers,
> I'm using svm() from the e1071 package to train a model with the option
> probability = TRUE. I then use predict() with probability = TRUE and get
> the probabilities for a data point belonging to either class. So far all
> is well. My question is why I get different results each time I train
> the model, although I use exactly the same data. [...]

I'm not sure I'm following the example you're giving and the scenario you are describing.

>     probabilities
>         Grp.0     Grp.1
>     0.7077155 0.2922845
>     0.7938782 0.2061218
>     0.8178833 0.1821167
>     0.7122203 0.2877797

This seems fine to me: it looks like the probabilities of class membership for 4 examples (note that Grp.0 + Grp.1 = 1).

> How can the predictions using the same training and test data vary so much?

I'm trying the code below several times (taken from the example on the ?svm help page), and the probabilities calculated from the call to predict() don't change much at all:

R> library(e1071)
R> data(iris)
R> attach(iris)
R> x <- subset(iris, select = -Species)
R> y <- Species
R> model <- svm(x, y, probability = TRUE)
R> predict(model, x, probability = TRUE)

To be fair, the probabilities aren't exactly the same, but the difference between two runs is really small:

R> model <- svm(x, y, probability = TRUE)
R> a <- predict(model, x, probability = TRUE)
R> model <- svm(x, y, probability = TRUE)
R> b <- predict(model, x, probability = TRUE)
R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
[1] 0.003215959

Is this what you were talking about, or ... ?
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] SVM probability output variation
Hi again, and thank you Steve for your reply!

On Oct 21, 2009, Steve Lianoglou wrote:
> I'm not sure I'm following the example you're giving and the scenario
> you are describing.

I think you got it!

> This seems fine to me: it looks like the probabilities of class
> membership for 4 examples (note that Grp.0 + Grp.1 = 1).

Yes, within each run all was OK, but I was surprised that it varied to such a high degree.

> I'm trying the code below several times (taken from the example), and
> the probabilities calculated from the call to predict() don't change
> much at all: [...] To be fair, the probabilities aren't exactly the
> same, but the difference between two runs is really small:
>
> R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
> [1] 0.003215959
>
> Is this what you were talking about, or ... ?

Yes, exactly that. In your example, though, the variation seems to be a lot smaller. I'm guessing that has to do with the data.
If I instead output the decision values, the whole procedure is fully reproducible, i.e. the exact same values are returned when I re-train the model. I have no idea how the probabilities are calculated, but it seems to be in this step that the differences arise. In my case, I feel a bit hesitant to use them when they differ that much between runs (15% or so)...

If it is important: I use a linear kernel and don't tune the model in any way.

Thanks again!
/Anders
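[Editor's note: the contrast Anders describes (decision values reproduce exactly across re-trainings, probabilities do not) can be sketched as follows. This is not code from the thread; it uses a two-class subset of iris as a stand-in for his data, and the linear kernel he mentions.]

    library(e1071)

    data(iris)
    # Reduce iris to a two-class problem, mirroring the Grp.0/Grp.1 setup
    d <- subset(iris, Species != "setosa")
    d$Species <- factor(d$Species)
    x <- subset(d, select = -Species)
    y <- d$Species

    fit_once <- function() {
      m <- svm(x, y, kernel = "linear", probability = TRUE)
      p <- predict(m, x, decision.values = TRUE, probability = TRUE)
      list(dec  = attr(p, "decision.values"),
           prob = attr(p, "probabilities"))
    }

    r1 <- fit_once()
    r2 <- fit_once()

    max(abs(r1$dec  - r2$dec))   # typically 0: the SVM fit is deterministic
    max(abs(r1$prob - r2$prob))  # small but nonzero: only the probability
                                 # calibration step varies between runs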
Re: [R] SVM probability output variation
Howdy,

On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
<snip>
> Yes, exactly that. In your example, though, the variation seems to be a
> lot smaller. I'm guessing that has to do with the data.
>
> If I instead output the decision values, the whole procedure is fully
> reproducible, i.e. the exact same values are returned when I re-train
> the model.

By the decision values, you mean the predicted labels, right?

> I have no idea how the probabilities are calculated, but it seems to be
> in this step that the differences arise. In my case, I feel a bit
> hesitant to use them when they differ that much between runs (15% or
> so)...

I'd find that a bit disconcerting, too. Can you give a sample of your data + the code you're using that can reproduce this example?

Warning: Brainstorming Below

If I were to calculate probabilities for my class labels, I'd make the probability some function of the example's distance from the decision boundary. Now, if your decision boundary isn't changing from run to run (and I guess it really shouldn't be, since the SVM returns the maximum-margin classifier (which is, by definition, unique, right?)), it's hard to imagine why these probabilities would change, either ...

... unless you're holding out different subsets of your data during training, or perhaps have a different value for your penalty (cost) parameter when building the model. I believe you said that you're actually training the exact same model each time, though, right?
Anyway, I see the help page for ?svm says this, if it helps:

    The probability model for classification fits a logistic distribution
    using maximum likelihood to the decision values of all binary
    classifiers, and computes the a-posteriori class probabilities for the
    multi-class problem using quadratic optimization.

-steve
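[Editor's note: the calibration the help page describes is fit on decision values obtained from an internal cross-validation, whose random fold assignment is the likely source of the run-to-run variation. Assuming the fold assignment draws on R's random number generator (as in recent versions of e1071), fixing the seed before each training call should make the probabilities reproducible; a sketch, not from the thread:]

    library(e1071)

    data(iris)
    x <- subset(iris, select = -Species)
    y <- iris$Species

    # Train twice with the same RNG seed: the internal cross-validation
    # used to fit the probability model should then see the same folds.
    set.seed(1)
    m1 <- svm(x, y, probability = TRUE)
    set.seed(1)
    m2 <- svm(x, y, probability = TRUE)

    p1 <- attr(predict(m1, x, probability = TRUE), "probabilities")
    p2 <- attr(predict(m2, x, probability = TRUE), "probabilities")
    all.equal(p1, p2)  # probabilities should now match across re-trainings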
Re: [R] SVM probability output variation
Hi,

<snip>
> > If I instead output the decision values, the whole procedure is fully
> > reproducible, i.e. the exact same values are returned when I re-train
> > the model.
>
> By the decision values, you mean the predicted labels, right?

The output of decision values can be turned on in predict.svm, and is, as I have understood it, the distance from the data point to the hyperplane. (I should say that my knowledge here is limited to concepts; I know nothing about the details of how this works...) I use these to create ROC curves etc.

> I'd find that a bit disconcerting, too. Can you give a sample of your
> data + the code you're using that can reproduce this example?

I have the data at the office, so I can't do that now (I'm at home).

> Warning: Brainstorming Below
>
> If I were to calculate probabilities for my class labels, I'd make the
> probability some function of the example's distance from the decision
> boundary. Now, if your decision boundary isn't changing from run to run
> (and I guess it really shouldn't be, since the SVM returns the
> maximum-margin classifier (which is, by definition, unique, right?)),
> it's hard to imagine why these probabilities would change, either ...
>
> ... unless you're holding out different subsets of your data during
> training, or perhaps have a different value for your penalty (cost)
> parameter when building the model. I believe you said that you're
> actually training the exact same model each time, though, right?

Yes, I'm using the exact same data to train each time. I thought this would generate identical models, but that doesn't appear to be the case.
> Anyway, I see the help page for ?svm says this, if it helps:
>
>     The probability model for classification fits a logistic distribution
>     using maximum likelihood to the decision values of all binary
>     classifiers, and computes the a-posteriori class probabilities for
>     the multi-class problem using quadratic optimization.

This is where I realise I'm in a bit over my head on the theory side - this means nothing to me...

Thanks again,
Anders