Shuji,

My answer is "it depends". Using regression on zero/one dummy variables can work, but there are more direct approaches to classification that might work better.
In the case that you showed us, the data were linearly separable by one predictor. If the data that you posted are representative of your typical data sets, you might want to use a very simple approach: identify the important predictor and use an ROC curve to find a good cutoff for it. You don't need the other predictors.

However, if these data represent an "easy" case, you might want to try other approaches to classification. I usually start with either a random forest or a bagged tree (see the randomForest and ipred packages). If I can get good performance with those models, I'll try simpler models that are more interpretable, like single trees (the rpart package), FDA with method = mars (using the mda package) and other approaches.

Max

-----Original Message-----
From: ?? ?? [mailto:[EMAIL PROTECTED]
Sent: Monday, April 09, 2007 1:49 PM
To: Kuhn, Max
Cc: [email protected]
Subject: Re: [R] Could not fit correct values in discriminant analysis by bruto.

Dear Max,

Thank you very much! Your sample code is very helpful.
For a linearly separable problem, should I use fda with linear regression instead of bruto, unless I apply some dimension reduction first?

Cheers.
Shuji

On 2007/04/09, at 22:25, Kuhn, Max wrote:
> Shuji,
>
> I suspect that bruto blows up because your data are linearly separable.
> To see this (if you didn't already know), try
>
>    library(lattice)
>    splom(~x, groups = y)
>
> and look at the first row. If you are trying to do classification, there
> are a few methods that would choke on this (logistic regression) and a
> few that won't (trees, svms etc). I would guess that bruto is in the
> latter group.
>
> However, if you are trying to do classification, try using bruto via fda:
>
> > tmp <- data.frame(x, y2 = factor(y))
> >
> > fdaFit <- fda(y2 ~ ., tmp)
> > fdaFit
> Call:
> fda(formula = y2 ~ ., data = tmp)
>
> Dimension: 1
>
> Percent Between-Group Variance Explained:
>  v1
> 100
>
> Degrees of Freedom (per dimension): 5
>
> Training Misclassification Error: 0 ( N = 20 )
> >
> > predict(fdaFit, type = "posterior")[1:3,]
>   0 1
> 2 0 1
> 2 0 1
> 2 0 1
>
> Max
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of ?? ??
> Sent: Sunday, April 08, 2007 10:47 PM
> To: [email protected]
> Subject: [R] Could not fit correct values in discriminant analysis by bruto.
>
> Dear R-users,
>
> I would like to use the "bruto" function in the mda package for flexible
> discriminant analysis.
> I tried, for example, the following approach.
>
> > x
>        band1     band2     band3
> 1  -1.206780 -1.448007 -1.084431
> 2  -0.294938 -0.113222 -0.888895
> 3  -0.267303 -0.241567 -1.040979
> 4  -1.206780 -1.448007 -1.040979
> 5  -1.151518 -0.806286 -0.671630
> 6  -1.179146 -1.396670 -1.453775
> 7  -0.294938 -0.241567 -1.453775
> 8  -0.350200 -0.267239 -1.084431
> 9  -1.151518 -0.857623 -0.649901
> 10  1.362954 -1.396670 -2.235926
> 11 -0.239675  1.118883  1.457551
> 12 -0.294938 -1.268325 -0.497817
> 13 -0.294938 -0.729278 -0.106745
> 14 -1.123883 -0.703612 -0.150196
> 15  0.616905  1.144548 -0.150196
> 16 -0.267303  1.657930  1.044750
> 17  1.611637  1.041874  0.610225
> 18 -1.123883 -0.677941  0.262605
> 19 -0.239675 -0.626604 -0.128473
> 20  2.274797  1.118883  1.805171
>
> > y
>  [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
>
> > fit <- bruto(x,y)
>
> But the obtained fit$fitted.values are enormously high (or low).
> Running bruto(x[,2:3], y) works well (values are nearly 1 or 0).
> Are the values in column 1 wrong, or is some option needed?
> I contacted the package maintainer, but the problem could not be solved.
>
> Thanks
>
> Shuji Kawaguchi
>
> > R.version
> platform i386-apple-darwin8.8.1
> arch i386
> os darwin8.8.1
> system i386, darwin8.8.1
> version.string R version 2.4.0 (2006-10-03)
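
A minimal base-R sketch of the single-predictor cutoff idea from the reply at the top of this thread, using the band3 column and y exactly as posted. It is not an ROC package call: it simply scans candidate thresholds, which is roughly what reading a cutoff off an ROC curve amounts to on training data. The choice of band3, the direction of the rule (predict 1 when band3 is below the cutoff) and the names cuts, err, best_cut and pred are assumptions made for illustration.

```r
# band3 and y copied from the original post (rows 1-20)
band3 <- c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
           -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
            1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
            1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
y <- rep(c(1, 0), each = 10)

# Candidate cutoffs: midpoints between adjacent sorted unique values
vals <- sort(unique(band3))
cuts <- (head(vals, -1) + tail(vals, -1)) / 2

# Training error of the (assumed) rule "predict 1 when band3 < cutoff"
err <- sapply(cuts, function(cut) mean((band3 < cut) != (y == 1)))

# Keep the cutoff with the smallest training error
best_cut <- cuts[which.min(err)]
pred <- as.integer(band3 < best_cut)

print(best_cut)
print(table(observed = y, predicted = pred))
```

On these 20 rows one cutoff on band3 separates the two classes with no training errors, which is the linear separability mentioned in the thread. With less tidy data the same scan would trade sensitivity against specificity, and the cutoff should be checked on held-out data rather than on the training set.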
