Re: [R] unique/subset problem
Hi,

The pruned dataset has 8 unique genomes in it, while the dataset before pruning has 65 unique genomes in it. However, calling unique on the pruned dataset seems to return 65 no matter what.

Any assistance in this matter would be appreciated.

Thanks
Lalitha

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique/subset problem
Without knowing more about your data, it is hard to say for certain, but might you be confusing unique _values_ with _factor levels_?

    > mydata <- as.factor(sort(rep(1:5, 2)))
    > # mydata has 10 values, 5 unique values, and 5 factor levels
    > mydata
    [1] 1 1 2 2 3 3 4 4 5 5
    Levels: 1 2 3 4 5
    > unique(mydata)
    [1] 1 2 3 4 5
    Levels: 1 2 3 4 5
    > mydata.subset <- mydata[1:4]
    > # the subset now has only 2 unique values, but the output
    > # still lists all five factor levels
    > unique(mydata.subset)
    [1] 1 2
    Levels: 1 2 3 4 5
    > # try drop=TRUE as an option to subset
    > mydata.subset <- mydata[1:4, drop=TRUE]
    > unique(mydata.subset)
    [1] 1 2
    Levels: 1 2

Alternatively, if this is the problem and you don't need those data to be factors, you could always convert them to a more appropriate form.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
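The same effect can be reproduced on a data frame. The following is only a sketch with made-up data (the names dt, genome1 and Y are borrowed from the thread, the values are invented). It assumes a version of R that provides droplevels(), which as far as I know was added after this thread was written; re-calling factor() on the column is the older equivalent.

```r
# Hypothetical toy data standing in for the poster's table
dt <- data.frame(
  genome1 = factor(c("aero", "paer", "sent", "paer", "xfas")),
  Y       = c(-1, -100, -200, -60, -2)
)

pruned <- subset(dt, Y < -5)

# The subset keeps every level of the original factor
levels_before <- nlevels(pruned$genome1)

# The values that actually occur in the subset
values_present <- unique(as.character(pruned$genome1))

# Re-creating the factor discards the unused levels
refactored <- factor(pruned$genome1)

# droplevels() does the same for every factor column of the data frame
pruned_clean <- droplevels(pruned)

print(levels_before)
print(values_present)
print(levels(pruned_clean$genome1))
```

Counting length(unique(x)) rather than inspecting levels(x) avoids the confusion whichever of these is used.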
Re: [R] unique/subset problem
Then you need to provide more details about the calls you made and about your dataset. For example, you can show us the output of

    str(prunedrelatives, 1)

and tell us how you called unique on prunedrelatives, and so on. I made a test dataset and it gave me what you wanted (omitted here).

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
Re: [R] unique/subset problem
Hi,

I read in my dataset using

    dt <- read.table(filename)

Calling unique(levels(dt$genome1)) yields the following:

     [1] aero aful aquae atum_D bbur bhal bmel bsub
     [9] buch cace ccre cglu cjej cper cpneuA cpneuC
    [17] cpneuJ ctraM ecoliO157 hbsp hinf hpyl linn llact
    [25] lmon mgen mjan mlep mlot mpneu mpul mthe
    [33] mtub mtub_cdc nost pabyssi paer paero pmul pyro
    [41] rcon rpxx saur_mu50 saur_n315 sent smel spneu spyo
    [49] ssol stok styp synecho tacid tmar tpal tvol
    [57] uure vcho xfas ypes

It shows 60 genomes, which is correct. I extracted a subset as follows:

    possible_relatives_subset <- subset(dt, Y < -5)

I am pasting the results below:

        genome1   genome2   parameterX           Y
     21 sent      ecoliO157    0.00590 -200.633493
     22 sent      paer         0.18603 -100.200570
     27 styp      ecoliO157    0.00484 -240.708645
     28 styp      paer         0.18497  -30.250127
     41 paer      sent         0.18603  -60.200570
     44 paer      styp         0.18497  -80.250127
     49 paer      hinf         0.18913  -90.056333
     53 paer      vcho         0.18703  -10.153929
     55 paer      pmul         0.18587 -100.208042
     67 paer      buch         0.21485  -80.898667
     70 paer      ypes         0.18460 -107.267454
     82 paer      xfas         0.26268  -61.920552
     95 hinf      ecoliO157    0.07654 -163.018417
     96 hinf      paer         0.18913  -10.056333
    103 vcho      ecoliO157    0.09518 -140.921153
    104 vcho      paer         0.18703  -10.153929
    107 pmul      ecoliO157    0.07328 -165.215225
    108 pmul      paer         0.18587  -10.208042
    131 buch      ecoliO157    0.15412  -11.746939
    132 buch      paer         0.21485   -8.898667
    137 ypes      ecoliO157    0.02705  -19.171851
    138 ypes      paer         0.18460  -10.267454
    171 ecoliO157 sent         0.00590  -20.633493
    174 ecoliO157 styp         0.00484  -20.708645
    179 ecoliO157 hinf         0.07654   -6.018417
    183 ecoliO157 vcho         0.09518  -14.921153
    185 ecoliO157 pmul         0.07328   -6.215225
    197 ecoliO157 buch         0.15412  -11.746939
    200 ecoliO157 ypes         0.02705   -9.171851
    211 ecoliO157 xfas         0.25833  -71.091552
    217 xfas      ecoliO157    0.25833  -75.091552
    218 xfas      paer         0.26268  -64.920552

I think even a cursory look will tell us that there are not as many unique genomes in the subset results (around 8/10). However, when I do unique(levels(possible_relatives_subset$genome1)), I get

     [1] aero aful aquae atum_D bbur bhal bmel bsub
     [9] buch cace ccre cglu cjej cper cpneuA cpneuC
    [17] cpneuJ ctraM ecoliO157 hbsp hinf hpyl linn llact
    [25] lmon mgen mjan mlep mlot mpneu mpul mthe
    [33] mtub mtub_cdc nost pabyssi paer paero pmul pyro
    [41] rcon rpxx saur_mu50 saur_n315 sent smel spneu spyo
    [49] ssol stok styp synecho tacid tmar tpal tvol
    [57] uure vcho xfas ypes

Where am I going wrong? I tried calling unique without the levels too, which gives me the following response:

    [1] sent styp paer hinf vcho pmul buch ypes ecoliO157 xfas
    60 Levels: aero aful aquae atum_D bbur bhal bmel bsub buch cace ccre cglu cjej cper cpneuA ... ypes
Re: [R] unique/subset problem
Check ?read.table and add as.is=T to the options. Strings are then read as character rather than as factor, which avoids the factor issue. Then repeat your work. For example:

    > x0 <- read.table("~/Documents/tox/noodles/four_sheets_orig/reg_r2.txt", sep="\t", nrows=10)
    > str(x0, 1)
    `data.frame': 10 obs. of 7 variables:
     $ V1: Factor w/ 10 levels -4086733916,..: 10 9 8 7 6 5 4 3 2 1
     $ V2: Factor w/ 10 levels -1963744741,..: 10 8 7 4 5 6 3 9 1 2
     $ V3: Factor w/ 7 levels -1687428658,..: 7 4 4 2 5 1 6 6 3 4
     $ V4: Factor w/ 2 levels 5,MECHANISM: 2 1 1 1 1 1 1 1 1 1
     $ V5: Factor w/ 2 levels 0,TYPE: 2 1 1 1 1 1 1 1 1 1
     $ V6: Factor w/ 2 levels USER_,alexey: 1 2 2 2 2 2 2 2 2 2
     $ V7: Factor w/ 2 levels 3,TRUST: 2 1 1 1 1 1 1 1 1 1

    > x0 <- read.table("~/Documents/tox/noodles/four_sheets_orig/reg_r2.txt", sep="\t", nrows=10, as.is=T)
    > str(x0, 1)
    `data.frame': 10 obs. of 7 variables:
     $ V1: chr LINK_ID -4293537751 -4247422653 -4223137153 ...
     $ V2: chr ID1 65259 1020286 -518245428 ...
     $ V3: chr ID2 6436 6436 -2099509019 ...
     $ V4: chr MECHANISM 5 5 5 ...
     $ V5: chr TYPE 0 0 0 ...
     $ V6: chr USER_ alexey alexey alexey ...
     $ V7: chr TRUST 3 3 3 ...

HTH,
weiwei
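The effect of as.is can be tried without the original file. This is a sketch only: the inline table and its values are invented, and stringsAsFactors = TRUE is set explicitly to mimic the default behaviour of the R version used in this thread.

```r
# Hypothetical inline data; the real file from the thread is not available
raw <- "genome1 genome2 Y
sent ecoliO157 -200.6
paer sent -60.2
aero paer -1.0"

# Strings become factors (the default at the time of this thread)
as_factor <- read.table(text = raw, header = TRUE, stringsAsFactors = TRUE)

# Strings stay character
as_char <- read.table(text = raw, header = TRUE, as.is = TRUE)

print(class(as_factor$genome1))
print(class(as_char$genome1))

# With character columns, unique() reports only what is left in the subset
kept <- unique(subset(as_char, Y < -5)$genome1)
print(kept)
```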
Re: [R] unique/subset problem
Oh, I forgot: you can also convert a factor into a string, like

    dataset$genome1 <- as.character(dataset$genome1)

Also, you don't have to use as.numeric(dataset$score) if you use as.is=T when you call read.table.

HTH,
weiwei
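A sketch of converting columns after the data have already been read, using invented values. The score column is deliberately built as a factor here to illustrate a related pitfall: as.numeric() on a factor returns the internal level codes, so the usual idiom is to go through as.character() first.

```r
# Hypothetical data in which both columns ended up as factors
dataset <- data.frame(
  genome1 = factor(c("aero", "paer", "sent", "paer")),
  score   = factor(c("-1", "-100", "-200", "-60"))
)

# Factor to character: unused levels are no longer an issue
dataset$genome1 <- as.character(dataset$genome1)

# as.numeric() on a factor gives level codes, not the original numbers
codes <- as.numeric(dataset$score)

# Converting via character recovers the numbers
dataset$score <- as.numeric(as.character(dataset$score))

pruned    <- subset(dataset, score < -5)
remaining <- unique(pruned$genome1)
print(remaining)
```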
[R] unique/subset problem
Hi,

I am new to R programming and am using subset to extract part of a dataset as follows:

    names(dataset) = c("genome1", "genome2", "dist", "score");
    prunedrelatives <- subset(dataset, score < -5);

However, when I use unique to find the number of unique genomes now present in prunedrelatives, I get results identical to calling unique(dataset$genome1), although subset has eliminated many genomes and records.

I would greatly appreciate your input about using unique correctly in this regard.

Thanks
Lalitha
Re: [R] unique/subset problem
Hi,

Even though you removed many genome1 entries by requiring score < -5, that does not necessarily mean you changed the set of unique values. To check this, you can do something like

    p0 <- unique(dataset[dataset$score < -5, "genome1"])  # same as subset
    p1 <- unique(dataset[dataset$score >= -5, "genome1"])
    setdiff(p1, p0)

If the output above is empty, it means that even though you removed many genome1 entries, doing so did not change the set of unique values.

HTH,
weiwei

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
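The check can be run on a small invented example. This is a sketch: the data are hypothetical and the comparison operators are my reading of the code above. Note that for character data an empty result from setdiff() prints as character(0) rather than NULL.

```r
# Hypothetical data: every genome has at least one row with score < -5
dataset <- data.frame(
  genome1 = c("aero", "paer", "sent", "paer", "aero"),
  score   = c(-1, -100, -200, -3, -9),
  stringsAsFactors = FALSE
)

p0 <- unique(dataset[dataset$score <  -5, "genome1"])  # genomes kept by the subset
p1 <- unique(dataset[dataset$score >= -5, "genome1"])  # genomes in the dropped rows

# Genomes that occur only in the dropped rows
lost <- setdiff(p1, p0)
print(lost)

# Nothing was lost, so the count of unique genomes is unchanged
print(length(p0) == length(unique(dataset$genome1)))
```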
[R] unique columns of a matrix
Dear all,

I have a matrix of repeating columns in R. For example, a matrix X is

         [,1] [,2] [,3] [,4]
    [1,]    1    1    1    1
    [2,]    1    1    2    2

I want to store the unique columns of the matrix X in a new matrix Y. Therefore, Y will be

         [,1] [,2]
    [1,]    1    1
    [2,]    1    2

It will be really appreciated if you can provide me with some function for this job.

Thanks for your time and effort in advance,
Roman

--
Roman Akhter Ahmed (Ph.D. Candidate)
Department of Econometrics and Business Statistics
Room 659, Building 11 (East Wing), Clayton Campus
Monash University, Victoria 3800, Australia
Ph.: +61 3 9905 8346 (W), +61 3 9543 1958 (R)
Web: http://www.buseco.monash.edu.au/staff/profile.php?uid=rahmed
Re: [R] unique columns of a matrix
Dear Roman,

You can use unique(X, MARGIN=2). See ?unique for details.

I hope this helps,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
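Applied to the matrix from the question, the suggestion can be sketched as follows. The duplicated() variant is an addition of mine, shown because it also reveals which columns were kept.

```r
# The 2 x 4 matrix X from the question (matrix() fills column by column)
X <- matrix(c(1, 1,
              1, 1,
              1, 2,
              1, 2), nrow = 2)

# Unique columns
Y <- unique(X, MARGIN = 2)
print(Y)

# Equivalent selection via duplicated(), which exposes the kept column indices
keep <- !duplicated(X, MARGIN = 2)
Y2 <- X[, keep, drop = FALSE]
print(which(keep))
```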
[R] unique sets of factors
All:

I have a matrix, X, with a LARGE number of rows. Consider the following three rows of that matrix:

    1 1 1 1 2 2 3 3
    1 1 1 1 3 3 2 2
    3 3 2 2 1 1 1 1

I wish to fit many one-way ANOVAs to some response variable using each row as a set of factors. For example, for each row above I will do something like anova(lm(Y~as.factor(X[1,]))).

My problem is that in the above example, I do not want to fit models for both rows 1 and 2, as they are essentially duplicates in terms of the ANOVA model. Clearly row 3, although it has the same number of 1's, 2's, and 3's, is a different model.

Is there some computationally efficient way to remove such factor duplicates from my large matrix? I have been banging my head against the wall all morning.

Thanks!!
Tony

--
Tony Long
Ecology and Evolutionary Biology
Steinhaus Hall
University of California at Irvine
Irvine, CA 92697-2525
Tel: (949) 824-2562 (office)
Tel: (949) 824-5994 (lab)
Fax: (949) 824-2181
email: [EMAIL PROTECTED]
http://hjmuller.bio.uci.edu/~labhome/
Re: [R] unique sets of factors
If DF is a data frame containing the rows then:

    unique(t(apply(DF, 1, function(x) as.numeric(factor(x, levels = unique(x))))))
Re: [R] unique sets of factors
Or since that messes up the values:

    u <- unique(t(apply(DF, 1, function(x) as.numeric(factor(x, levels = unique(x))))))
    DF[rownames(u), ]
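A sketch of the relabelling idea on the three example rows. Each row is recoded by order of first appearance, so rows that differ only in their labels get the same code. Using duplicated() to index the original rows is a variation of mine; it avoids relying on row names, which a plain matrix may not have.

```r
# The three example rows from the question
X <- rbind(c(1, 1, 1, 1, 2, 2, 3, 3),
           c(1, 1, 1, 1, 3, 3, 2, 2),
           c(3, 3, 2, 2, 1, 1, 1, 1))

# Recode each row: first distinct value becomes 1, the next becomes 2, and so on
canon <- t(apply(X, 1, function(x) as.numeric(factor(x, levels = unique(x)))))
print(canon)

# Keep the first row of each distinct grouping, with its original values
keep <- !duplicated(canon)
X_unique <- X[keep, , drop = FALSE]
print(X_unique)
```

Rows 1 and 2 both recode to 1 1 1 1 2 2 3 3, so only row 1 is kept; row 3 recodes to 1 1 2 2 3 3 3 3 and is retained as a distinct model.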
[R] Unique rows
Hello all,

I have a dataset where the subjects are duplicated. How do I subset it so that I get only 1 row per subject?

    aa <- c(1,1,2,2,3,3,4,4,5,5,6,6)
    bb <- c(56,56,33,33,53,53,20,20,63,63,9,9)
    cc <- data.frame(aa,bb)

I would like to subset the data frame cc such that I get

    aa bb
     1 56
     2 33
     3 53
     4 20
     5 63
     6  9

I know this should be fairly easy, but I can't figure out how to do it in a data frame and keep all my columns.

Thanks
Re: [R] Unique rows
If you want the first row for the unique 'aa' entries, try the following:

    cc[!duplicated(cc$aa), ]

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
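Run on the data from the question, the suggestion can be sketched like this. duplicated() flags every repeat of a value already seen, so negating it keeps the first row for each subject together with all of its columns.

```r
aa <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
bb <- c(56, 56, 33, 33, 53, 53, 20, 20, 63, 63, 9, 9)
cc <- data.frame(aa, bb)

# First row per value of aa; the original row names are preserved
first_rows <- cc[!duplicated(cc$aa), ]
print(first_rows)
```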
Re: [R] Unique rows
    > cc[!duplicated(cc$bb),]
       aa bb
    1   1 56
    3   2 33
    5   3 53
    7   4 20
    9   5 63
    11  6  9

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] Unique rows
Thanks. I tried that; however, for some reason it still left some duplicates.

On 8/9/06, Gary Collins [EMAIL PROTECTED] wrote:

    try

    > unique(cc)
       aa bb
    1   1 56
    3   2 33
    5   3 53
    7   4 20
    9   5 63
    11  6  9

    HTH
    Gary
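One possible explanation, offered only as an assumption since the real data were not posted: unique() on a data frame compares whole rows, so two rows for the same subject survive if they differ in any other column. A sketch with invented values:

```r
# Hypothetical data: subject 1 appears twice with different bb values
cc2 <- data.frame(aa = c(1, 1, 2, 2),
                  bb = c(56, 57, 33, 33))

# Whole-row comparison keeps both rows for subject 1
whole_row <- unique(cc2)
print(whole_row)

# Comparing on the subject column alone keeps one row per subject
by_subject <- cc2[!duplicated(cc2$aa), ]
print(by_subject)
```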
Re: [R] Unique rows
Thanks! That worked well.

On 8/9/06, Dimitrios Rizopoulos [EMAIL PROTECTED] wrote:
> if you want the first row for the unique 'aa' entries, try the following:
>
>     cc[!duplicated(cc$aa), ]
>
> I hope it helps.
>
> Best,
> Dimitris
>
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
> Quoting Lanre Okusanya [EMAIL PROTECTED]:
>> hello all,
>> I have a dataset where the subjects are duplicated. How do I subset such that I can get only 1 row/subject.
>>
>>     aa <- c(1,1,2,2,3,3,4,4,5,5,6,6)
>>     bb <- c(56,56,33,33,53,53,20,20,63,63,9,9)
>>     cc <- data.frame(aa,bb)
>>
>> I would like to subset df(cc) such that I can get
>>
>>     aa bb
>>      1 56
>>      2 33
>>      3 53
>>      4 20
>>      5 63
>>      6  9
>>
>> I know this should be fairly easy but I can't figure how to do it in a dataframe and keep all my columns
>> Thanks
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
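A note on why the two suggestions in this thread can disagree. The following is only a sketch: the thread never shows the real data, so the `cc2` variant below is a hypothetical illustration of one plausible cause, namely that `unique()` on a data frame compares whole rows while `duplicated()` on one column looks at the key only.

```r
# Toy data from the thread: each subject (aa) appears twice.
aa <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
bb <- c(56, 56, 33, 33, 53, 53, 20, 20, 63, 63, 9, 9)
cc <- data.frame(aa, bb)

# unique() drops only rows that are identical in every column.
by_row <- unique(cc)

# duplicated() on the key column keeps the first row per subject,
# even when the other columns differ between the repeats.
by_key <- cc[!duplicated(cc$aa), ]

# Hypothetical variant (not from the thread): the second record
# for subject 1 carries a different bb value.
cc2 <- cc
cc2$bb[2] <- 57
n_row_unique <- nrow(unique(cc2))                # subject 1 survives twice
n_key_unique <- nrow(cc2[!duplicated(cc2$aa), ]) # one row per subject
```

On the toy data both approaches give the same six rows; they differ only once some non-key column varies within a subject.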
[R] unique, but keep LAST occurence
?unique says

    Value: An object of the same type of 'x'. but if an element is equal to one with a smaller index, it is removed.

However, I need to keep the one with the LARGEST index. Can someone please show me the light? I thought about reversing the row order twice, but I couldn't get it to work right. (My data frame has 125000 rows and 7 columns, and I'm 'uniqueing' on column #1 (chron) only, although the class of the column may not matter.)

Say, e.g.,

    DF <- data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8))

I would like the result to be (sorted as well)

    t x
    1 6
    2 7
    3 8
    4 4
    5 5

If I got the original rownames, that would be a bonus (for debugging.)

    R.version
    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status
    major          2
    minor          3.1
    year           2006
    month          06
    day            01
    svn rev        38247
    language       R
    version.string Version 2.3.1 (2006-06-01)

Thanks for any hints!
David

David L. Reiner
Rho Trading Securities, LLC
Chicago IL 60605
312-362-4963
Re: [R] unique, but keep LAST occurence
On Mon, 2006-07-24 at 12:00 -0500, [EMAIL PROTECTED] wrote:
> ?unique says
>
>     Value: An object of the same type of 'x'. but if an element is equal to one with a smaller index, it is removed.
>
> However, I need to keep the one with the LARGEST index. Can someone please show me the light? I thought about reversing the row order twice, but I couldn't get it to work right. (My data frame has 125000 rows and 7 columns, and I'm 'uniqueing' on column #1 (chron) only, although the class of the column may not matter.)
>
> Say, e.g.,
>
>     DF <- data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8))
>
> I would like the result to be (sorted as well)
>
>     t x
>     1 6
>     2 7
>     3 8
>     4 4
>     5 5
>
> If I got the original rownames, that would be a bonus (for debugging.)

Does this get it?

    DF[sapply(unique(DF$t), function(x) max(which(DF$t == x))), ]
      t x
    7 1 6
    8 2 7
    9 3 8
    5 4 4
    6 5 5

HTH,
Marc Schwartz
Re: [R] unique, but keep LAST occurence
Try:

    largestDF <- DF[nrow(DF) - which(!duplicated(rev(DF$t))) + 1, ]

You can then sort this however you like in the usual way. Row names will be preserved.

--
Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

The business of the statistician is to catalyze the scientific learning process. - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, July 24, 2006 10:00 AM
To: r-help@stat.math.ethz.ch
Subject: [R] unique, but keep LAST occurence

> ?unique says
>
>     Value: An object of the same type of 'x'. but if an element is equal to one with a smaller index, it is removed.
>
> However, I need to keep the one with the LARGEST index. Can someone please show me the light? I thought about reversing the row order twice, but I couldn't get it to work right. (My data frame has 125000 rows and 7 columns, and I'm 'uniqueing' on column #1 (chron) only, although the class of the column may not matter.)
>
> Say, e.g.,
>
>     DF <- data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8))
>
> I would like the result to be (sorted as well)
>
>     t x
>     1 6
>     2 7
>     3 8
>     4 4
>     5 5
>
> If I got the original rownames, that would be a bonus (for debugging.)
>
>     R.version
>     platform       i386-pc-mingw32
>     arch           i386
>     os             mingw32
>     system         i386, mingw32
>     status
>     major          2
>     minor          3.1
>     year           2006
>     month          06
>     day            01
>     svn rev        38247
>     language       R
>     version.string Version 2.3.1 (2006-06-01)
>
> Thanks for any hints!
> David
>
> David L. Reiner
> Rho Trading Securities, LLC
> Chicago IL 60605
> 312-362-4963
Re: [R] unique, but keep LAST occurence
Thank you, Bert and Mark. I believe Mark's solution works, but it was taking a very long time. Bert's is very fast. My day is saved! David L. Reiner Rho Trading Securities, LLC Chicago IL 60605 312-362-4963 -Original Message- From: Berton Gunter [mailto:[EMAIL PROTECTED] Sent: Monday, July 24, 2006 12:51 PM To: David Reiner [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Subject: RE: [R] unique, but keep LAST occurence Try: largestDF - DF[nrow(DF)- which(!duplicated(rev(DF$t)))+1,] You can then sort this however you like in the usual way. Row names will be preserved. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, July 24, 2006 10:00 AM To: r-help@stat.math.ethz.ch Subject: [R] unique, but keep LAST occurence ?unique says Value: An object of the same type of 'x'. but if an element is equal to one with a smaller index, it is removed. However, I need to keep the one with the LARGEST index. Can someone please show me the light? I thought about reversing the row order twice, but I couldn't get it to work right (My data frame has 125000 rows and 7 columns, and I'm 'uniqueing' on column #1 (chron) only, although the class of the column may not matter.) Say, e.g., DF - data.frame(t = c(1,2,3,1,4,5,1,2,3), x = c(0,1,2,3,4,5,6,7,8)) I would like the result to be (sorted as well) t x 1 6 2 7 3 8 4 4 5 5 If I got the original rownames, that would be a bonus (for debugging.) R.version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 3.1 year 2006 month 06 day01 svn rev38247 language R version.string Version 2.3.1 (2006-06-01) Thanks for any hints! David David L. 
Reiner Rho Trading Securities, LLC Chicago IL 60605 312-362-4963 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
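For readers following this thread, here is a small sketch of the "keep the last occurrence" idea on the example data. The first half is Bert Gunter's index arithmetic from the message above; the second half assumes an R newer than the 2.3.1 used in the thread, where `duplicated()` accepts a `fromLast` argument. The object names `lastDF` and `lastDF2` are made up for illustration.

```r
DF <- data.frame(t = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
                 x = c(0, 1, 2, 3, 4, 5, 6, 7, 8))

# Index arithmetic from the thread: find first occurrences in the
# reversed vector, then map those positions back to original rows.
last_idx <- nrow(DF) - which(!duplicated(rev(DF$t))) + 1
lastDF <- DF[last_idx, ]

# Sort by t; the original row names survive the reordering.
lastDF <- lastDF[order(lastDF$t), ]

# Assumption: a later R version where duplicated() has fromLast.
# With fromLast = TRUE, only the last occurrence is "not duplicated".
lastDF2 <- DF[!duplicated(DF$t, fromLast = TRUE), ]
lastDF2 <- lastDF2[order(lastDF2$t), ]
```

Both versions are vectorised, which is consistent with the report above that the `rev()` approach was fast on 125000 rows, whereas the `sapply()` solution scans the whole column once per unique value.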
Re: [R] unique, but keep LAST occurence
I have a question about the deparse function in R. What is the reason that deparse uses an argument like width.cutoff? Why is the maximum cutoff 500? I was manipulating an R formula and used deparse. Since the length of the user's formula was greater than 500, my code didn't work.

Thanks,
Johan
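A common workaround for the situation described, offered here as a sketch rather than as an answer from the list: `deparse()` returns a character vector with one element per output line, so a long formula comes back in several pieces, and pasting the pieces together gives a single string. The variable names `x1` to `x200` are hypothetical.

```r
# A deliberately long formula built from hypothetical variable names.
rhs <- paste(paste0("x", 1:200), collapse = " + ")
f <- as.formula(paste("y ~", rhs))

# deparse() splits long expressions into several strings, even at
# the largest permitted width.cutoff of 500.
pieces <- deparse(f, width.cutoff = 500L)

# Collapse the pieces into one string before any string handling.
f_string <- paste(pieces, collapse = " ")

# Round trip: the single string parses back to a formula with the
# same variables.
f2 <- as.formula(f_string)
```

Code that assumes `deparse()` returns exactly one string tends to fail silently on long input, so collapsing the result is a cheap safeguard.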
Re: [R] unique deletes names - intended?
On Tue, 4 Jul 2006, Heinz Tuechler wrote:
> Dear All,
> as shown in the example, unique() deletes names of vector elements. Is this intended?

Yes. Think of the vector as a set: it is supposed to be immaterial which of the duplicated elements is retained. The help page says

    An object of the same type of 'x'. but if an element is equal to one with a smaller index, it is removed.

so it is starting with a new object, not 'x'. However, the array method works differently, so the documentation needs clarification.

> Of course, one can use indexing by !duplicated() instead.

Be careful, as you might get a method for [ and that might not do what you intended (e.g. for a time series).

> Greetings,
> Heinz
>
>     ## unique deletes names
>     v1 <- c(a=1, b=2, c=3, e=2, a=4)
>     unique(v1)          # names deleted
>     v1[!duplicated(v1)] # names preserved
>
>     platform       i386-pc-mingw32
>     arch           i386
>     os             mingw32
>     system         i386, mingw32
>     status         Patched
>     major          2
>     minor          3.1
>     year           2006
>     month          07
>     day            01
>     svn rev        38471
>     language       R
>     version.string Version 2.3.1 Patched (2006-07-01 r38471)

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] unique deletes names - intended?
Dear All,

as shown in the example, unique() deletes names of vector elements. Is this intended? Of course, one can use indexing by !duplicated() instead.

Greetings,
Heinz

    ## unique deletes names
    v1 <- c(a=1, b=2, c=3, e=2, a=4)
    unique(v1)          # names deleted
    v1[!duplicated(v1)] # names preserved

    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status         Patched
    major          2
    minor          3.1
    year           2006
    month          07
    day            01
    svn rev        38471
    language       R
    version.string Version 2.3.1 Patched (2006-07-01 r38471)
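The difference reported in this thread can be checked directly. This is just the example from the message above with the results captured in variables (`u` and `k` are names chosen here for illustration):

```r
v1 <- c(a = 1, b = 2, c = 3, e = 2, a = 4)

u <- unique(v1)            # values only; names are dropped
k <- v1[!duplicated(v1)]   # first occurrence of each value, names kept

names(u)  # NULL
names(k)  # "a" "b" "c" "a"
```

Note that the surviving names need not be unique themselves: the values 1 and 4 are distinct, so both elements named "a" are kept.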
Re: [R] Unique?
Hi Cameron You need to be more specific when you ask a question so you can get a better answer. Anyhow, when you say that you want to retain all the other variables do you mean that you want to create a new column in the dataset that contains the calculated sum? If that is the case you can use a construction like: set.seed(1) step4-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40)) result-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum) step4[,SUM]=result[match(step4[,TRIPID],names(result))] step4 TRIPID CONVUNIT Sum 1111 36 122 2222 48 121 3333 48 129 4111 42 122 5222 30 121 6333 43 129 7111 44 122 8222 43 121 9333 38 129 Cheers Francisco From: Guenther, Cameron [EMAIL PROTECTED] To: Francisco J. Zagmutt [EMAIL PROTECTED] Subject: RE: [R] Unique? Date: Thu, 11 May 2006 12:08:31 -0400 It is close but not quite what I want. I need to retain all of the other variables as well. Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 4305 [EMAIL PROTECTED] -Original Message- From: Francisco J. Zagmutt [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 10, 2006 6:06 PM To: Guenther, Cameron; r-help@stat.math.ethz.ch Subject: RE: [R] Unique? If you only care about the sum of CONVUNIT by each TRIPID then you can use tapply i.e.: step4-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40)) result-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum) result 111 222 333 115 107 123 Is this what you wanted to do? I can't think of anything faster than tapply for your problem. I hope this helps Francisco From: Guenther, Cameron [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject: [R] Unique? 
Date: Wed, 10 May 2006 17:02:33 -0400 Hello, I have sample data set that looks like: YEAR MONTH DAY CONTINUESPL TIMEFISH TIMEUNIT AREACOUNTY DEPTH DEPUNIT GEARTRIPID CONVUNIT 1992 1 26 1 SP0073928 8 H7 25 4 NA 100 02163399054 161 1992 1 26 1 SP0073928 8 H7 25 4 NA 100 02163399054 8 1992 1 26 2 SP0004228 8 H7 25 4 NA 100 02163399054 161 1992 1 26 2 SP0004228 8 H7 25 4 NA 100 02163399054 8 1992 1 25 NA SP0052652 8 H7 25 4 NA 100 02163399057 85 1992 1 26 NA SP0037940 8 H7 25 4 NA 100 02163399058 70 1992 1 27 NA SP0072357 8 H7 25 4 NA 100 02163399059 15 1992 1 27 NA SP0072357 8 H7 25 4 NA 100 02163399059 20 1992 1 27 NA SP0026324 8 H7 25 4 NA 100 02163399060 8 1992 1 28 1 SP0072357 8 H7 25 4 NA 100 02163399062 200 How can I use unique to extract the rows that have repeated tripid's only, not a unique value for each variable but only for TRIPID. I then want to condense the unique values by summing the CONVUNIT for each unique value of TRIPID. I posted a similar question last week and received a sufficient answer of how to do this without using uniqe. The solution below worked just fine on this sample data set but the full data set has 446,000 rows of data and my computer and R simply cannot handle this follwing code on data this large. conds-by(Step4,Step4$TRIPID,function(x) replace(x[1,],CONVUNIT,sum(x$CONVUNIT))) Step5-do.call(rbind,conds) Thank you, Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 4305 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
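As a possible alternative to the `tapply()` plus `match()` construction quoted above, base R's `ave()` returns the group sums already aligned with the rows. The sketch below reuses the simulated `step4` from the thread; `SUM` and `step5` follow the thread's naming, and collapsing to one row per trip by keeping the first record is an assumption about what "retain all of the other variables" means.

```r
set.seed(1)
step4 <- data.frame(TRIPID   = rep(c(111, 222, 333), 3),
                    CONVUNIT = rpois(9, 40))

# ave() gives a vector as long as the input in which every element
# is replaced by its group's summary, so no match() step is needed.
step4$SUM <- ave(step4$CONVUNIT, step4$TRIPID, FUN = sum)

# One row per trip: keep the first record of each TRIPID (with all
# of its other columns) and overwrite CONVUNIT with the group total.
step5 <- step4[!duplicated(step4$TRIPID), ]
step5$CONVUNIT <- step5$SUM
step5$SUM <- NULL
```

This avoids building one small data frame per trip, which is the costly part of the `by()` plus `do.call(rbind, ...)` approach on several hundred thousand rows.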
[R] Unique?
Hello, I have sample data set that looks like: YEARMONTH DAY CONTINUESPL TIMEFISH TIMEUNITAREACOUNTY DEPTH DEPUNIT GEARTRIPID CONVUNIT 19921 26 1 SP0073928 8 H 7 25 4 NA 100 02163399054 161 19921 26 1 SP0073928 8 H 7 25 4 NA 100 02163399054 8 19921 26 2 SP0004228 8 H 7 25 4 NA 100 02163399054 161 19921 26 2 SP0004228 8 H 7 25 4 NA 100 02163399054 8 19921 25 NA SP0052652 8 H 7 25 4 NA 100 02163399057 85 19921 26 NA SP0037940 8 H 7 25 4 NA 100 02163399058 70 19921 27 NA SP0072357 8 H 7 25 4 NA 100 02163399059 15 19921 27 NA SP0072357 8 H 7 25 4 NA 100 02163399059 20 19921 27 NA SP0026324 8 H 7 25 4 NA 100 02163399060 8 19921 28 1 SP0072357 8 H 7 25 4 NA 100 02163399062 200 How can I use unique to extract the rows that have repeated tripid's only, not a unique value for each variable but only for TRIPID. I then want to condense the unique values by summing the CONVUNIT for each unique value of TRIPID. I posted a similar question last week and received a sufficient answer of how to do this without using uniqe. The solution below worked just fine on this sample data set but the full data set has 446,000 rows of data and my computer and R simply cannot handle this follwing code on data this large. conds-by(Step4,Step4$TRIPID,function(x) replace(x[1,],CONVUNIT,sum(x$CONVUNIT))) Step5-do.call(rbind,conds) Thank you, Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 4305 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Unique?
On May 10, 2006, at 4:02 PM, Guenther, Cameron wrote:
> How can I use unique to extract the rows that have repeated tripid's only, not a unique value for each variable but only for TRIPID. I then want to condense the unique values by summing the CONVUNIT for each unique value of TRIPID.

Thanks, Cameron, for this question. This type of manipulation would be relatively simple to do in a RDBMS (e.g. MySQL, PostgreSQL, Oracle, etc.) But I'm curious to see how one would do the same in R. So, if folks send you solutions off-list, please do post them back to the list.

Regards,
- Robert

http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent
Re: [R] Unique?
If you only care about the sum of CONVUNIT by each TRIPID then you can use tapply i.e.: step4-data.frame(TRIPID=rep(c(111,222,333),3),CONVUNIT=rpois(9,40)) result-tapply(step4$CONVUNIT,INDEX=step4$TRIPID,FUN=sum) result 111 222 333 115 107 123 Is this what you wanted to do? I can't think of anything faster than tapply for your problem. I hope this helps Francisco From: Guenther, Cameron [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject: [R] Unique? Date: Wed, 10 May 2006 17:02:33 -0400 Hello, I have sample data set that looks like: YEAR MONTH DAY CONTINUESPL TIMEFISH TIMEUNIT AREACOUNTY DEPTH DEPUNIT GEARTRIPID CONVUNIT 1992 1 26 1 SP0073928 8 H 7 25 4 NA 100 02163399054161 1992 1 26 1 SP0073928 8 H 7 25 4 NA 100 021633990548 1992 1 26 2 SP0004228 8 H 7 25 4 NA 100 02163399054161 1992 1 26 2 SP0004228 8 H 7 25 4 NA 100 021633990548 1992 1 25 NA SP0052652 8 H 7 25 4 NA 100 0216339905785 1992 1 26 NA SP0037940 8 H 7 25 4 NA 100 0216339905870 1992 1 27 NA SP0072357 8 H 7 25 4 NA 100 0216339905915 1992 1 27 NA SP0072357 8 H 7 25 4 NA 100 0216339905920 1992 1 27 NA SP0026324 8 H 7 25 4 NA 100 021633990608 1992 1 28 1 SP0072357 8 H 7 25 4 NA 100 02163399062200 How can I use unique to extract the rows that have repeated tripid's only, not a unique value for each variable but only for TRIPID. I then want to condense the unique values by summing the CONVUNIT for each unique value of TRIPID. I posted a similar question last week and received a sufficient answer of how to do this without using uniqe. The solution below worked just fine on this sample data set but the full data set has 446,000 rows of data and my computer and R simply cannot handle this follwing code on data this large. conds-by(Step4,Step4$TRIPID,function(x) replace(x[1,],CONVUNIT,sum(x$CONVUNIT))) Step5-do.call(rbind,conds) Thank you, Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 
4305 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Unique?
Dear Cameron, This is not with unique, but it gets the job done. Just create a new variable that is the three variables concatenated together. Then, you can just sum by this variable, like the following: mymat - matrix(letters, ncol=3, nrow=260) mymat - as.data.frame(mymat) mymat$dat - rnorm(260) mymat$id - paste(mymat[,1], mymat[,2], mymat[,3]) aggregate(mymat$dat, list(mymat$id), sum) HTH, Dave. On 5/10/06, Robert Citek [EMAIL PROTECTED] wrote: On May 10, 2006, at 4:02 PM, Guenther, Cameron wrote: How can I use unique to extract the rows that have repeated tripid's only, not a unique value for each variable but only for TRIPID. I then want to condense the unique values by summing the CONVUNIT for each unique value of TRIPID. Thanks, Cameron, for this question. This type of manipulation would be relatively simple to do in a RDBMS (e.g. MySQL, PostgreSQL, Oracle, etc.) But I'm curious to see how one would do the same in R. So, if folks send you solutions off-list, please do post them back to the list. Regards, - Robert http://www.cwelug.org/downloads Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dave Armstrong University of Maryland Dept of Government and Politics 3140 Tydings Hall College Park, MD 20742 Office: 2103L Cole Field House Phone: 301-405-9735 e-mail: [EMAIL PROTECTED] web: www.davearmstrong-ps.com Facts are meaningless. You can use facts to prove anything that's even remotely true. - Homer Simpson To this day, philosophers suffer from Plato's disease: the assumption that reality fundamentally consists of abstract essences best described by words or geometry. 
(In truth, reality is largely a probabilistic affair best described by statistics) - Steve Sailer The Unexpected Uselessness of Philosophy [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
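The pasted-key idea in the message above can be compared with passing the grouping columns to `aggregate()` directly. This is a sketch on made-up data; the column names `g1`, `g2` and `dat` are hypothetical stand-ins for the real grouping variables.

```r
set.seed(42)
# Hypothetical data: two grouping columns and one numeric column.
d <- data.frame(g1  = rep(c("a", "b"), each = 6),
                g2  = rep(c("x", "y", "z"), times = 4),
                dat = rnorm(12))

# Approach from the message: one key made by pasting the columns.
d$id <- paste(d$g1, d$g2)
by_key <- aggregate(d$dat, list(id = d$id), sum)

# Same sums without the pasted key: give aggregate() several
# grouping vectors, which keeps them as separate columns.
by_cols <- aggregate(d$dat, list(g1 = d$g1, g2 = d$g2), sum)
```

Keeping the grouping columns separate means the result can be merged back or filtered on either variable without splitting the key string again.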
Re: [R] Unique arrangements of a vector
Tarmo Remmel wrote:
> Dear List,
> Running on a PC (Windows 2000) with 256 MB RAM, Version R1.9.1

This one is quite outdated...

> I have a relatively simple problem, which I can solve for relatively small datasets, but run into difficulties with larger ones. I believe that my approach is a hack rather than something elegant and I was hoping that somebody on this list might help me improve my code. Basically, given a vector of values (e.g., 0,0,1,1), I want to generate all of the unique arrangements of these values, of which there are 4!/(2!2!) = 6.
>
>     0 0 1 1
>     0 1 0 1
>     0 1 1 0
>     1 0 0 1
>     1 0 1 0
>     1 1 0 0
>
> Using unique() in conjunction with expand.grid(), and later filtering impossible results, I can obtain the answer. However, this is slow, and does not work for large initial vectors and is difficult to filter when using values beyond 0,1. Is there some mathematically elegant method for doing this? I'd hope to have initial vectors significantly longer than the demonstrated 4 values (e.g., thousands).

Nice for length 4, but you will get problems far sooner than for length 1000... please calculate the size before! For things as short as 4, you might want to try out permutations() in package gtools (formerly in bundle gregmisc, since yesterday a single package).

Uwe Ligges

> Any help is appreciated and I will gladly SUM afterwards.
> Thank you,
> Tarmo
>
> Tarmo Remmel Ph.D.
> GUESS Lab, Department of Geography
> University of Toronto at Mississauga
> Mississauga, Ontario, L5L 1C6
> Tel: 905-828-3868
> Fax: 905-828-5273
> Skype: tarmoremmel
> http://eratos.erin.utoronto.ca/remmelt
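To make the "calculate the size before" advice concrete, here is a base R sketch. Neither function comes from the thread: `n_arrangements()` and `unique_perms()` are hypothetical helpers, the first evaluating n!/(k1! k2! ...) on the log scale, the second enumerating the distinct arrangements recursively without generating and then discarding repeats.

```r
# Number of distinct arrangements of a vector with repeated values:
# n! / (k1! * k2! * ...), computed via lgamma() to avoid overflow in
# the intermediate factorials.
n_arrangements <- function(x) {
  counts <- as.vector(table(x))
  exp(lgamma(length(x) + 1) - sum(lgamma(counts + 1)))
}

# Recursive generator: for each distinct value still available, put
# it first and arrange the remaining elements behind it.
unique_perms <- function(x) {
  if (length(x) <= 1) return(matrix(x, nrow = 1))
  out <- list()
  for (v in unique(x)) {
    rest <- x[-match(v, x)]   # remove exactly one copy of v
    out[[length(out) + 1]] <- cbind(v, unique_perms(rest))
  }
  res <- do.call(rbind, out)
  dimnames(res) <- NULL
  res
}

arr <- unique_perms(c(0, 0, 1, 1))          # one arrangement per row
n_small <- n_arrangements(c(0, 0, 1, 1))    # 4!/(2!2!)
n_large <- n_arrangements(rep(0:1, 500))    # far too many to enumerate
```

The generator is only practical for short vectors; for vectors with thousands of elements the count alone shows that full enumeration is out of reach, so sampling arrangements (for example with `sample(x)`) may be the only feasible route.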
[R] Unique arrangements of a vector
Dear List,

Running on a PC (Windows 2000) with 256 MB RAM, Version R1.9.1

I have a relatively simple problem, which I can solve for relatively small datasets, but run into difficulties with larger ones. I believe that my approach is a hack rather than something elegant and I was hoping that somebody on this list might help me improve my code. Basically, given a vector of values (e.g., 0,0,1,1), I want to generate all of the unique arrangements of these values, of which there are 4!/(2!2!) = 6.

    0 0 1 1
    0 1 0 1
    0 1 1 0
    1 0 0 1
    1 0 1 0
    1 1 0 0

Using unique() in conjunction with expand.grid(), and later filtering impossible results, I can obtain the answer. However, this is slow, and does not work for large initial vectors and is difficult to filter when using values beyond 0,1. Is there some mathematically elegant method for doing this? I'd hope to have initial vectors significantly longer than the demonstrated 4 values (e.g., thousands).

Any help is appreciated and I will gladly SUM afterwards.

Thank you,
Tarmo

Tarmo Remmel Ph.D.
GUESS Lab, Department of Geography
University of Toronto at Mississauga
Mississauga, Ontario, L5L 1C6
Tel: 905-828-3868
Fax: 905-828-5273
Skype: tarmoremmel
http://eratos.erin.utoronto.ca/remmelt
[R] unique rows
Dear list,

I would like to extract from a matrix all those rows that are unique. By unique, I don't mean the unique that is accomplished by the function unique(), though... Consider the following example:

    h
         [,1] [,2]
    [1,]    4    4
    [2,]    1    4
    [3,]    4    1

Now unique(h) returns exactly the same, because 1 4 and 4 1 is not the same for that function. What I would like to see, though, are only the first two rows (or the first and the third, it does not matter). Does anybody know how to do that?

Cheers,
Dax.
RE: [R] unique rows
Dear Dax,

I'll bet that someone comes up with a better approach, but the following does appear to work:

    u <- unique(t(sapply(as.data.frame(t(h)), sort)))

I hope this helps,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of dax42
Sent: Saturday, January 29, 2005 7:54 AM
To: r-help@stat.math.ethz.ch
Subject: [R] unique rows

> Dear list,
> I would like to extract from a matrix all those rows that are unique. By unique, I don't mean the unique that is accomplished by the function unique(), though... Consider the following example:
>
>     h
>          [,1] [,2]
>     [1,]    4    4
>     [2,]    1    4
>     [3,]    4    1
>
> Now unique(h) returns exactly the same, because 1 4 and 4 1 is not the same for that function. What I would like to see, though, are only the first two rows (or the first and the third, it does not matter). Does anybody know how to do that?
>
> Cheers,
> Dax.
Re: [R] unique rows
There may be more efficient ways, but

    unique(t(apply(h, 1, sort)))

does what I think you want.

Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

dax42 wrote:
> Dear list,
> I would like to extract from a matrix all those rows that are unique. By unique, I don't mean the unique that is accomplished by the function unique(), though... Consider the following example:
>
>     h
>          [,1] [,2]
>     [1,]    4    4
>     [2,]    1    4
>     [3,]    4    1
>
> Now unique(h) returns exactly the same, because 1 4 and 4 1 is not the same for that function. What I would like to see, though, are only the first two rows (or the first and the third, it does not matter). Does anybody know how to do that?
>
> Cheers,
> Dax.
Re: [R] unique rows
John Fox [EMAIL PROTECTED] writes:
> Dear Dax,
> I'll bet that someone comes up with a better approach, but the following does appear to work:
>
>     u <- unique(t(sapply(as.data.frame(t(h)), sort)))

Or maybe just

    unique(t(apply(h,1,sort)))

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918
FAX: (+45) 35327907
([EMAIL PROTECTED])
RE: [R] unique rows
On 29-Jan-05 dax42 wrote:
> Dear list,
> I would like to extract from a matrix all those rows that are unique. By unique, I don't mean the unique that is accomplished by the function unique(), though... Consider the following example:
>
>     h
>          [,1] [,2]
>     [1,]    4    4
>     [2,]    1    4
>     [3,]    4    1
>
> Now unique(h) returns exactly the same, because 1 4 and 4 1 is not the same for that function. What I would like to see, though, are only the first two rows (or the first and the third, it does not matter). Does anybody know how to do that?
>
> Cheers,
> Dax.

How about:

    h[!duplicated(t(apply(h,1,sort))),]
         [,1] [,2]
    [1,]    4    4
    [2,]    1    4

Better than

    unique(t(apply(h,1,sort)))
         [,1] [,2]
    [1,]    4    4
    [2,]    1    4

in general (though it comes to the same for your example) since it preserves the order of elements in each row.

Cheers,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 29-Jan-05 Time: 14:26:31
-- XFMail --
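The point about preserving the order of elements within each row only shows up when the first copy of a pair is not already sorted. A short sketch: `h` is the matrix from the thread, while `h2` is a hypothetical reordering of its rows added here to expose the difference.

```r
h <- matrix(c(4, 1, 4,
              4, 4, 1), ncol = 2)   # rows: (4,4), (1,4), (4,1)

# Sort within each row so (1,4) and (4,1) share a key. apply()
# returns the sorted rows as columns, hence the t().
key <- t(apply(h, 1, sort))
res <- h[!duplicated(key), , drop = FALSE]

# Hypothetical variant with (4,1) appearing before (1,4).
h2 <- h[c(1, 3, 2), ]
key2 <- t(apply(h2, 1, sort))
sorted_rows   <- unique(key2)                        # second row is (1,4)
original_rows <- h2[!duplicated(key2), , drop = FALSE]  # second row is (4,1)
```

`drop = FALSE` is included so the result stays a matrix even if only one row survives.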
[R] Unique lists from a list
Hi

I have a list. Two of the elements of this list are Name and Address, both of which are character vectors. Name and Address are linked, so that the same Name always associates with the same Address.

What I want to do is pull out the unique values, as a new list of the same format (ie two elements of character vectors).

Now I've worked out that unique(list$Name) will give me a list of the unique names, but how do I then go and link those to the correct (unique) addresses so I end up with a new list which is the same format as the rest, but now unique?

Cheers
Mick
Re: [R] Unique lists from a list
On Wed, 1 Sep 2004, michael watson (IAH-C) wrote:
> I have a list. Two of the elements of this list are Name and Address, both of which are character vectors. Name and Address are linked, so that the same Name always associates with the same Address.
>
> What I want to do is pull out the unique values, as a new list of the same format (ie two elements of character vectors).
>
> Now I've worked out that unique(list$Name) will give me a list of the unique names, but how do I then go and link those to the correct (unique) addresses so I end up with a new list which is the same format as the rest, but now unique?

match, as in

    match(unique(list$Name), list$Name)

OR indexing as in

    Address <- list$Address
    names(Address) <- list$Name
    Name <- unique(list$Name)
    list(Name, as.vector(Address[Name]))

OR choose a better data structure as in

    unique(as.data.frame(list))

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Unique lists from a list
Try this:

    l.1 <- list(list(name='a', addr='123'), list(name='b', addr='234'),
                list(name='b', addr='234'), list(name='a', addr='123'))  # create a list
    l.names <- unlist(lapply(l.1, '[[', 'name'))  # get the 'name'
    l.u <- unique(l.names)                        # make unique
    new.list <- l.1[match(l.u, l.names)]          # create new list with just one 'name'

__
James Holtman  "What is the problem you are trying to solve?"
Executive Technical Consultant -- Office of Technology, Convergys
[EMAIL PROTECTED]
+1 (513) 723-2929

michael watson (IAH-C), 09/01/2004 10:31
To: [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Subject: [R] Unique lists from a list

> Hi
>
> I have a list. Two of the elements of this list are Name and Address,
> both of which are character vectors. Name and Address are linked, so
> that the same Name always associates with the same Address.
>
> What I want to do is pull out the unique values, as a new list of the
> same format (ie two elements of character vectors).
>
> Now I've worked out that unique(list$Name) will give me a list of the
> unique names, but how do I then go and link those to the correct
> (unique) addresses so I end up with a new list which is the same
> format as the rest, but now unique?
>
> Cheers
> Mick
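The reply above treats the data as a list of records rather than two parallel vectors. A rough sketch of the same idea, plus a step that collapses the result back into the two-vector layout the question asked for, might look like this. The names records, rec_names, unique_records and flat are illustrative only, and vapply() and !duplicated() are swapped in here as alternatives to the unlist(lapply()) and match(unique()) calls in the post.

```r
# Hypothetical records: each element is a list with 'name' and 'addr'.
records <- list(list(name = "a", addr = "123"),
                list(name = "b", addr = "234"),
                list(name = "b", addr = "234"),
                list(name = "a", addr = "123"))

# vapply() is a type-checked alternative to unlist(lapply(...)).
rec_names <- vapply(records, function(r) r$name, character(1))

# Keep the first record seen for each distinct name.
unique_records <- records[!duplicated(rec_names)]

# Collapse back to the "two character vectors" layout from the question.
flat <- list(
  Name    = vapply(unique_records, function(r) r$name, character(1)),
  Address = vapply(unique_records, function(r) r$addr, character(1))
)
str(flat)
```

Both !duplicated(x) and match(unique(x), x) pick the first occurrence of each value, so they select the same records.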
Re: [R] Unique lists from a list
    > name <- c("a", "b", "a", "c", "d", "a", "b")
    > addr <- c(10, 20, 10, 30, 40, 10, 20)
    > duplicated(name)
    [1] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
    > which(duplicated(name))
    [1] 3 6 7
    > addr[ -which(duplicated(name)) ]
    [1] 10 20 30 40
    > cbind( name, addr) [ -which(duplicated(name)), ]
         name addr
    [1,] "a"  "10"
    [2,] "b"  "20"
    [3,] "c"  "30"
    [4,] "d"  "40"

Make sure that the person named "a" always lives at address 10 (i.e. a one-to-one mapping). If it is possible for person "a" to have two addresses (e.g. house and office), say 10 and 11, then it might be better to collect both addresses. In this case, you can try:

    > addr2 <- c(10, 20, 11, 30, 40, 12, 21)
    > tapply(addr2, as.factor(name), function(x) paste(x, collapse=", ") )
               a            b            c            d
    "10, 11, 12"     "20, 21"         "30"         "40"

To convert this into a list, use sapply(a, strsplit, split=", "), where a holds the result of the tapply() call.

On Wed, 2004-09-01 at 15:31, michael watson (IAH-C) wrote:

> Hi
>
> I have a list. Two of the elements of this list are Name and Address,
> both of which are character vectors. Name and Address are linked, so
> that the same Name always associates with the same Address.
>
> What I want to do is pull out the unique values, as a new list of the
> same format (ie two elements of character vectors).
>
> Now I've worked out that unique(list$Name) will give me a list of the
> unique names, but how do I then go and link those to the correct
> (unique) addresses so I end up with a new list which is the same
> format as the rest, but now unique?
>
> Cheers
> Mick
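One caveat on the -which(duplicated(...)) idiom used above: if there happen to be no duplicates, which() returns integer(0) and the negative index then selects nothing at all. A small sketch of a logical-index alternative follows, along with split() as one possible way to collect several addresses per name directly as a list. The names keep, uniq, no_dups and by_name are illustrative and not from the original post.

```r
name <- c("a", "b", "a", "c", "d", "a", "b")
addr <- c(10, 20, 10, 30, 40, 10, 20)

# Logical indexing: TRUE for the first occurrence of each name.
keep <- !duplicated(name)
uniq <- list(Name = name[keep], Address = addr[keep])

# Edge case: a vector with no duplicates at all.
no_dups <- c("x", "y")
length(no_dups[-which(duplicated(no_dups))])  # 0: everything is dropped
length(no_dups[!duplicated(no_dups)])         # 2: everything is kept

# Several addresses per name: split() returns a named list of vectors,
# so no paste()/strsplit() round trip is needed.
addr2 <- c(10, 20, 11, 30, 40, 12, 21)
by_name <- split(addr2, name)
str(by_name)
```

split() keeps the addresses numeric, whereas the paste() route turns them into strings.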