On Thu, 4 Nov 2004, Berton Gunter wrote:

>Dan:
>
>1) There is no guarantee that PCA will show separate groups, of course, as
>that is not its purpose, although it is frequently a side effect.
>
>2) If you were to use a classification method of some sort (discriminant
>analysis, neural nets, SVMs, model-based classification, ...), my
>understanding is that yes, severely unbalanced group membership
>would indeed affect results. A guess is that Bayesian or other methods
>that could explicitly model the prior membership probabilities would do
>better. To make it clear why, suppose that there was a 99.9% preference for
>"dog" and .05% each for the others. Then your dataset would have almost no
>information on how covariates could distinguish the classes, and the best
>classifier would be to call everything a "dog" no matter what values the
>covariates had.
>
>I presume experts will have more and better to say about this.
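To put numbers on the "call everything a dog" argument, here is a minimal sketch in plain Python. It is an illustration added for clarity, not code from the thread. The function name `majority_baseline` and the 100,000 respondent total are assumptions made for the example; the class shares and the 1000/100/10 survey sizes come from the messages themselves.

```python
from collections import Counter

def majority_baseline(counts):
    """Return the most common class and the accuracy of always predicting it."""
    label, n_major = Counter(counts).most_common(1)[0]
    return label, n_major / sum(counts.values())

# Shares from the example above, scaled to 100,000 hypothetical respondents:
# 99.9% "dog" and 0.05% each for "cat" and "fish".
extreme = {"dog": 99_900, "cat": 50, "fish": 50}

# Survey sizes from the original question.
survey = {"dog": 1000, "cat": 100, "fish": 10}

for name, counts in [("extreme", extreme), ("survey", survey)]:
    label, accuracy = majority_baseline(counts)
    # Recall is 1 for the majority class and 0 for every other class,
    # so overall accuracy hides the failure on the rare classes.
    recall = {c: float(c == label) for c in counts}
    print(name, label, round(accuracy, 4), recall)
```

A classifier that ignores the covariates entirely already scores very high overall accuracy in both cases while never identifying a single cat or fish owner, which is why accuracy alone is a poor yardstick for unbalanced groups.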

Sounds interesting. Thanks very much for the input.

Just out of curiosity, given that I can make my data more uniform (less
biased), how could I best generate a 2d plot to encapsulate the clusters
(and inter-cluster relationships)? Actually, I am thinking of a 2d density.

>-- Bert Gunter
>Genentech Non-Clinical Statistics
>South San Francisco, CA
>
>"The business of the statistician is to catalyze the scientific learning
>process." - George E. P. Box
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bolser
>> Sent: Thursday, November 04, 2004 9:41 AM
>> To: R mailing list
>> Subject: [R] highly biased PCA data?
>>
>> Hello, supposing that I have two or three clear categories for my data,
>> let's say pet preference across fish, cat, dog. Let's say most people
>> rate their preference as being mostly one of the categories.
>>
>> I want to do PCA on the data to see three 'groups' of people, one group
>> for fish, one for cat and one for dog. I would like to see the odd person
>> who likes both or all three in the (appropriate) middle of the other main
>> groups.
>>
>> Will my data be affected by the fact that I have interviewed 1000 dog
>> owners, 100 cat owners and 10 fish owners? (assuming that each scale of
>> preference has an equal range).
>>
>> Cheers,
>> dan.
>>
>> ______________________________________________
>> [EMAIL PROTECTED] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
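
One possible reading of the "2d density" idea, sketched in plain Python: bin two score columns (for instance the first two principal component scores) on a grid and normalize the counts. Nobody in the thread proposed this; it is only an illustration. The function `density_2d`, the cluster centers, the spread and the bin count are all assumptions, and the points are simulated placeholders rather than real survey data.

```python
import random

def density_2d(xs, ys, bins=10):
    """Bin (x, y) points on a bins-by-bins grid and return cell proportions."""
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range if all values on an axis are identical.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Map each coordinate to a bin index; the maximum goes in the last bin.
        i = min(int((x - x_min) / x_span * bins), bins - 1)
        j = min(int((y - y_min) / y_span * bins), bins - 1)
        grid[j][i] += 1
    n = len(xs)
    return [[count / n for count in row] for row in grid]

# Simulated placeholder scores: three hypothetical clusters whose sizes
# mirror the 1000 / 100 / 10 split from the original question.
random.seed(0)
clusters = [((0.0, 0.0), 1000), ((4.0, 0.0), 100), ((2.0, 3.0), 10)]
xs, ys = [], []
for (cx, cy), size in clusters:
    for _ in range(size):
        xs.append(random.gauss(cx, 0.7))
        ys.append(random.gauss(cy, 0.7))

grid = density_2d(xs, ys)

# Crude text rendering, top row = largest y; denser cells get darker marks.
shades = " .:*#"
peak = max(max(row) for row in grid)
for row in reversed(grid):
    print("".join(shades[0 if c == 0 else 1 + int(c / peak * 3)] for c in row))
```

With group sizes this unequal, the largest cluster dominates the normalized grid and the smallest one barely registers, so a density plot of unbalanced data may need per-group normalization or a log scale to show the rare groups at all.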
