Hi,

I read in my dataset using

dt <- read.table("filename")

Calling unique(levels(dt$genome1)) yields the following:
"aero" "aful" "aquae" "atum_D" "bbur" "bhal" "bmel" "bsub" [9] "buch" "cace" "ccre" "cglu" "cjej" "cper" "cpneuA" "cpneuC" [17] "cpneuJ" "ctraM" "ecoliO157" "hbsp" "hinf" "hpyl" "linn" "llact" [25] "lmon" "mgen" "mjan" "mlep" "mlot" "mpneu" "mpul" "mthe" [33] "mtub" "mtub_cdc" "nost" "pabyssi" "paer" "paero" "pmul" "pyro" [41] "rcon" "rpxx" "saur_mu50" "saur_n315" "sent" "smel" "spneu" "spyo" [49] "ssol" "stok" "styp" "synecho" "tacid" "tmar" "tpal" "tvol" [57] "uure" "vcho" "xfas" "ypes" It shows 60 genomes, which is correct. I extracted a subset as follows possible_relatives_subset <- subset(dt, Y < -5) I am pasting the results below genome1 genome2 parameterX Y 21 sent ecoliO157 0.00590 -200.633493 22 sent paer 0.18603 -100.200570 27 styp ecoliO157 0.00484 -240.708645 28 styp paer 0.18497 -30.250127 41 paer sent 0.18603 -60.200570 44 paer styp 0.18497 -80.250127 49 paer hinf 0.18913 -90.056333 53 paer vcho 0.18703 -10.153929 55 paer pmul 0.18587 -100.208042 67 paer buch 0.21485 -80.898667 70 paer ypes 0.18460 -107.267454 82 paer xfas 0.26268 -61.920552 95 hinf ecoliO157 0.07654 -163.018417 96 hinf paer 0.18913 -10.056333 103 vcho ecoliO157 0.09518 -140.921153 104 vcho paer 0.18703 -10.153929 107 pmul ecoliO157 0.07328 -165.215225 108 pmul paer 0.18587 -10.208042 131 buch ecoliO157 0.15412 -11.746939 132 buch paer 0.21485 -8.898667 137 ypes ecoliO157 0.02705 -19.171851 138 ypes paer 0.18460 -10.267454 171 ecoliO157 sent 0.00590 -20.633493 174 ecoliO157 styp 0.00484 -20.708645 179 ecoliO157 hinf 0.07654 -6.018417 183 ecoliO157 vcho 0.09518 -14.921153 185 ecoliO157 pmul 0.07328 -6.215225 197 ecoliO157 buch 0.15412 -11.746939 200 ecoliO157 ypes 0.02705 -9.171851 211 ecoliO157 xfas 0.25833 -71.091552 217 xfas ecoliO157 0.25833 -75.091552 218 xfas paer 0.26268 -64.920552 I think even a cursory look will tell us that there are not as many unique genomes in the subset results. (around 8/10). 

However, when I do unique(levels(possible_relatives_subset$genome1)), I get

 [1] "aero" "aful" "aquae" "atum_D" "bbur" "bhal" "bmel" "bsub"
 [9] "buch" "cace" "ccre" "cglu" "cjej" "cper" "cpneuA" "cpneuC"
[17] "cpneuJ" "ctraM" "ecoliO157" "hbsp" "hinf" "hpyl" "linn" "llact"
[25] "lmon" "mgen" "mjan" "mlep" "mlot" "mpneu" "mpul" "mthe"
[33] "mtub" "mtub_cdc" "nost" "pabyssi" "paer" "paero" "pmul" "pyro"
[41] "rcon" "rpxx" "saur_mu50" "saur_n315" "sent" "smel" "spneu" "spyo"
[49] "ssol" "stok" "styp" "synecho" "tacid" "tmar" "tpal" "tvol"
[57] "uure" "vcho" "xfas" "ypes"

Where am I going wrong?

I tried calling unique without the levels too, which gives me the following response:

 [1] sent styp paer hinf vcho pmul buch ypes ecoliO157 xfas
60 Levels: aero aful aquae atum_D bbur bhal bmel bsub buch cace ccre cglu cjej cper cpneuA ... ypes

--- Weiwei Shi <[EMAIL PROTECTED]> wrote:

> Then you need to provide more details about the
> calls you made and your dataset.
> For example, you can tell us by
> str(prunedrelatives, 1)
>
> How did you call unique on prunedrelatives, and so on?
> I made a test data set and it gave me what you wanted (omitted here).
>
> On 1/26/07, lalitha viswanath <[EMAIL PROTECTED]> wrote:
> > Hi
> > The pruned dataset has 8 unique genomes in it while
> > the dataset before pruning has 65 unique genomes in it.
> > However calling unique on the pruned dataset seems to
> > return 65 no matter what.
> >
> > Any assistance in this matter would be appreciated.
> >
> > Thanks
> > Lalitha
> >
> > --- Weiwei Shi <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > Even if you removed "many" genome1 rows by setting score < -5,
> > > that does not necessarily mean you changed the uniqueness.
> > >
> > > To check this, you can do something like
> > > p0 <- unique(dataset[dataset$score < -5, "genome1"])   # same as subset
> > > p1 <- unique(dataset[dataset$score >= -5, "genome1"])
> > >
> > > setdiff(p1, p0)
> > >
> > > If the output above is empty, then it means that even though you
> > > removed many genome1 rows, it did not change the uniqueness.
> > >
> > > HTH,
> > >
> > > weiwei
> > >
> > > On 1/25/07, lalitha viswanath <[EMAIL PROTECTED]> wrote:
> > > > Hi
> > > > I am new to R programming and am using subset to
> > > > extract part of a data set as follows
> > > >
> > > > names(dataset) = c("genome1","genome2","dist","score");
> > > > prunedrelatives <- subset(dataset, score < -5);
> > > >
> > > > However when I use unique to find the number of unique
> > > > genomes now present in prunedrelatives I get results
> > > > identical to calling unique(dataset$genome1) although
> > > > subset has eliminated many genomes and records.
> > > >
> > > > I would greatly appreciate your input about using
> > > > "unique" correctly in this regard.
> > > >
> > > > Thanks
> > > > Lalitha
> > > >
> > > > ______________________________________________
> > > > R-help@stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III
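
A note on the behaviour described above: genome1 is a factor, and subset() keeps all of a factor's levels even when the remaining rows no longer use some of them. levels() therefore still lists all 60 labels, while unique() on the column itself returns only the values present (the "60 Levels:" line in that output is just the retained label set). The sketch below illustrates this on a small made-up data frame. The values are hypothetical; only the column names genome1 and Y and the Y < -5 filter come from the post. It shows two possible ways to get the genomes actually present, not necessarily the only ones.

```r
# Hypothetical toy data standing in for the poster's file.
# genome1 is created explicitly as a factor so the example behaves the
# same regardless of the stringsAsFactors default in the R version used.
dt <- data.frame(
  genome1 = factor(c("aero", "sent", "styp", "paer", "sent")),
  Y       = c(-1, -200.6, -240.7, -60.2, -100.2)
)

possible_relatives_subset <- subset(dt, Y < -5)

# levels() reports the factor's full label set, which subset() leaves as is,
# so "aero" is still listed although no remaining row uses it.
all_levels <- levels(possible_relatives_subset$genome1)

# Option 1: look at the values themselves, as plain character strings.
present <- unique(as.character(possible_relatives_subset$genome1))

# Option 2: drop the unused levels, then ask for the levels.
trimmed <- droplevels(possible_relatives_subset)
trimmed_levels <- levels(trimmed$genome1)

print(all_levels)
print(present)
print(trimmed_levels)
```

In this toy data all_levels has four entries, while present and trimmed_levels each have three. Re-encoding a single column with factor(possible_relatives_subset$genome1) should have the same effect as droplevels() for that column. Note also that unique(levels(x)) is the same as levels(x), since levels are already distinct.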