[R] Suggestion about tuning of SVM

2012-06-15 Thread Guido Leoni
Dear list,
I have a general question about how to tune an SVM.
I'm using the caret package to classify population data from a
case-control study. Each column of my matrix holds the SNP
genotypes and each row is an individual.
I split my total dataset into training (132 individuals) and test
(50 individuals) sets, preserving the overall observed genotypic
frequencies and the percentage of cases and controls.
After training (with a radial basis function kernel), the best
model has an accuracy of 76%, but when I apply the model to my test
dataset the accuracy drops to 52%.
I expected some decrease, but this seems quite large to me.
I manually checked the predictions for my test dataset, and some cases that
have no risk allele are misclassified. Similar cases in my training
dataset are classified correctly.
Could you please suggest which parameters to modify in order to improve
the classification of the test dataset? Or, better, what could
cause such a large discrepancy?
I know that my question is very general, but I'm new to this kind of
analysis, so any suggestion is welcome.
Thank you very much,
Guido

-- 
Guido Leoni
National Research Institute on Food and Nutrition
(I.N.R.A.N.)
via Ardeatina 546
00178 Rome
Italy

tel + 39 06 51 49 41 (operator)
+ 39 06 51 49 4498 (direct)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Suggestion about tuning of SVM

2012-06-15 Thread Mark Leeds
Hi: I don't know anything about genotypes, but it sounds like you overfitted
the training set, so you should try regularization. In standard
SVM classification algorithms, that can be done by decreasing the parameter
C, which lowers the penalty in the objective function for misclassifying
training points (this lets the margin widen by allowing the
algorithm to misclassify more often). But you're using caret rather than
one of the SVM packages directly, so the parameter might be called something
other than C.
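
A minimal sketch of what this could look like with caret, assuming the
radial SVM was fitted with method = "svmRadial" (which wraps kernlab's ksvm
and tunes two parameters, sigma and C). The data below are simulated
stand-ins, not the SNP data from the question, and the grid values and
resampling settings are arbitrary choices for illustration only.

```r
library(caret)   # train(), trainControl(), createDataPartition()
# method = "svmRadial" also requires the kernlab package to be installed

set.seed(1)

# Hypothetical stand-in data: 182 individuals x 20 SNPs coded 0/1/2
n <- 182; p <- 20
snps <- as.data.frame(matrix(sample(0:2, n * p, replace = TRUE), nrow = n,
                             dimnames = list(NULL, paste0("snp", seq_len(p)))))
# Made-up rule so that the outcome depends weakly on the first two SNPs
risk <- snps$snp1 + snps$snp2 + rnorm(n)
status <- factor(ifelse(risk > median(risk), "case", "control"))

# Split stratified on case/control status (roughly 132 training / 50 test)
in_train <- createDataPartition(status, p = 132 / 182, list = FALSE)
x_train <- snps[in_train, ];  y_train <- status[in_train]
x_test  <- snps[-in_train, ]; y_test  <- status[-in_train]

# Repeated 10-fold cross-validation on the training set only
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Grid that reaches down to small C (stronger regularization);
# the sigma values are arbitrary and would need adapting to real data
grid <- expand.grid(sigma = c(0.005, 0.01, 0.05),
                    C = 2^(-5:3))

fit <- train(x = x_train, y = y_train, method = "svmRadial",
             trControl = ctrl, tuneGrid = grid)

print(fit$bestTune)                    # sigma and C picked by cross-validation
cv_acc   <- max(fit$results$Accuracy)  # resampled accuracy of the best model
test_acc <- mean(predict(fit, x_test) == y_test)
cat("CV accuracy:", cv_acc, " test accuracy:", test_acc, "\n")
```

If the cross-validated accuracy is already much lower than the accuracy
measured on the training set itself, that would point to overfitting, and
the smaller values of C in the grid are the ones to look at.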

There are many books on support vector machines, but a nice intro from an
R perspective is "Support Vector Machines in R" in the Journal of
Statistical Software (it's free at www.jstatsoft.com).
