Dan:
1) There is no guarantee that PCA will show separate groups, of course: that is not its purpose, although separation is frequently a side effect.

2) If you were to use a classification method of some sort (discriminant analysis, neural nets, SVMs, model-based classification, ...), my understanding is that yes, severely unbalanced group membership would indeed affect the results. My guess is that Bayesian or other methods that can explicitly model the prior membership probabilities would do better.

To make it clear why, suppose that 99.9% of respondents preferred "dog" and 0.05% each preferred the other two. Then your dataset would contain almost no information on how the covariates distinguish the classes, and the best classifier would be to call everything a "dog" no matter what values the covariates had.

I presume experts will have more and better to say about this.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bolser
> Sent: Thursday, November 04, 2004 9:41 AM
> To: R mailing list
> Subject: [R] highly biased PCA data?
>
> Hello, supposing that I have two or three clear categories for my data,
> let's say pet preference across fish, cat, dog. Let's say most people rate
> their preference as being mostly one of the categories.
>
> I want to do PCA on the data to see three 'groups' of people, one group
> for fish, one for cat and one for dog. I would like to see the odd person
> who likes both or all three in the (appropriate) middle of the other main
> groups.
>
> Will my data be affected by the fact that I have interviewed 1000 dog
> owners, 100 cat owners and 10 fish owners? (assuming that each scale of
> preference has an equal range).
>
> Cheers,
> dan.
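As a rough illustration of point 2 in the reply above, here is a minimal sketch in plain Python (not R, and not anything either poster ran). It uses the group counts from the original question and a hypothetical sample of 10,000 people matching the 99.9% / 0.05% / 0.05% split. The helper names `majority_baseline` and `balanced_accuracy` are made up for this example; the second one, mean per-class recall, is an added way of scoring that weights each group equally and is not something the reply itself proposes.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def balanced_accuracy(labels, predictions):
    """Mean per-class recall: each class counts equally, whatever its size."""
    recalls = []
    for cls in sorted(set(labels)):
        hits = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        total = sum(1 for y in labels if y == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Counts from the original question: 1000 dog, 100 cat, 10 fish owners.
survey = ["dog"] * 1000 + ["cat"] * 100 + ["fish"] * 10
survey_label, survey_acc = majority_baseline(survey)

# Hypothetical sample of 10,000 matching the 99.9% / 0.05% / 0.05% split.
extreme = ["dog"] * 9990 + ["cat"] * 5 + ["fish"] * 5
extreme_label, extreme_acc = majority_baseline(extreme)

# "Call everything a dog", scored by per-class recall instead of raw accuracy.
always_dog = ["dog"] * len(extreme)
extreme_balanced = balanced_accuracy(extreme, always_dog)

print(f"survey:  always '{survey_label}' is right {survey_acc:.1%} of the time")
print(f"extreme: always '{extreme_label}' is right {extreme_acc:.1%} of the time")
print(f"extreme: balanced accuracy of always-dog = {extreme_balanced:.3f}")
```

Ignoring the covariates entirely already gives roughly 90% raw accuracy on the survey counts and 99.9% on the extreme split, which is why a classifier fitted to such data has little incentive to learn anything about cat or fish owners. Scored per class, the same rule gets only one group out of three right.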
______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
