Re: [R] subsetting by groups, with conditions
Hi,

I think you can also use plyr for this:

    dft <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
    1 p p 1 1
    1 p p 1.5 2
    2 p p 2 3
    2 p h 3.5 4"), header=TRUE)

    library(plyr)

    ddply(dft, .(P1id), function(.df) {
      # keep only the rows where the two vegetation codes agree
      .ddf <- subset(.df, as.character(Veg1) == as.character(Veg2))
      # then take the row with the largest area within this group
      .ddf[which.max(.ddf$AreaPoly2), ]
    })

HTH,

baptiste

2009/12/29 Seth W Bigelow sbige...@fs.fed.us:
> I have a data set similar to this:
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1         1
> 1    p    p    1.5       2
> 2    p    p    2         3
> 2    p    h    3.5       4
>
> For each group of Poly1id records, I wish to output (subset) the record
> which has largest AreaPoly2 value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1.5       2
> 2    p    p    2         3
>
> Can anyone point me in the right direction on this?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
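A side note on the last line of that function: it relies on how `which.max()` behaves when a group has no row with Veg1 equal to Veg2. The sketch below is only an illustration in base R; group 3 is a hypothetical addition that is not part of the original data, and the names `grp3`, `idx` and `empty_pick` are made up for this example.

```r
# Example data from the thread, plus a HYPOTHETICAL group 3
# in which no row has Veg1 == Veg2 (not in the original post).
dft <- data.frame(
  P1id      = c(1, 1, 2, 2, 3),
  Veg1      = c("p", "p", "p", "p", "h"),
  Veg2      = c("p", "p", "p", "h", "p"),
  AreaPoly2 = c(1, 1.5, 2, 3.5, 5),
  P2ID      = 1:5,
  stringsAsFactors = FALSE
)

# Matching rows in group 3: there are none, so this has zero rows.
grp3 <- subset(dft, P1id == 3 & Veg1 == Veg2)

# which.max() on an empty vector returns integer(0) rather than failing.
idx <- which.max(grp3$AreaPoly2)

# Indexing with integer(0) gives a zero-row data frame with the same columns.
empty_pick <- grp3[idx, ]
nrow(empty_pick)
```

So a group without any matching row should simply contribute nothing to the combined result, which seems to be what the original question asks for.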
[R] subsetting by groups, with conditions
I have a data set similar to this:

P1id Veg1 Veg2 AreaPoly2 P2ID
1    p    p    1         1
1    p    p    1.5       2
2    p    p    2         3
2    p    h    3.5       4

For each group of Poly1id records, I wish to output (subset) the record that has the largest AreaPoly2 value, but only if Veg1=Veg2. For this example, the desired dataset would be

P1id Veg1 Veg2 AreaPoly2 P2ID
1    p    p    1.5       2
2    p    p    2         3

Can anyone point me in the right direction on this?

Dr. Seth W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
Re: [R] subsetting by groups, with conditions
try this:

    x <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
    1 p p 1 1
    1 p p 1.5 2
    2 p p 2 3
    2 p h 3.5 4"), header=TRUE, as.is=TRUE)

    # split the dataframe by P1id
    x.s <- split(x, x$P1id)

    # now go through the sets to see which is the largest
    result <- lapply(x.s, function(.sub){
        .match <- subset(.sub, Veg1 == Veg2)
        # nrow() counts rows; length() of a data frame counts columns
        if (nrow(.match) > 0){
            return(.match[which.max(.match$AreaPoly2),])
        }
        else {
            return(NULL)
        }
    })

    do.call(rbind, result)

which gives:

      P1id Veg1 Veg2 AreaPoly2 P2ID
    1    1    p    p       1.5    2
    2    2    p    p       2.0    3

On Mon, Dec 28, 2009 at 8:03 PM, Seth W Bigelow sbige...@fs.fed.us wrote:
> I have a data set similar to this:
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1         1
> 1    p    p    1.5       2
> 2    p    p    2         3
> 2    p    h    3.5       4
>
> For each group of Poly1id records, I wish to output (subset) the record
> which has largest AreaPoly2 value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1.5       2
> 2    p    p    2         3
>
> Can anyone point me in the right direction on this?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
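For comparison, the same split-and-pick idea can be written without an explicit loop over groups. This is only a sketch of one alternative in base R (sort, then keep the first row per group), not something proposed in the thread; the names `m` and `best` are made up here.

```r
# Example data from the thread
x <- data.frame(
  P1id      = c(1, 1, 2, 2),
  Veg1      = c("p", "p", "p", "p"),
  Veg2      = c("p", "p", "p", "h"),
  AreaPoly2 = c(1, 1.5, 2, 3.5),
  P2ID      = 1:4,
  stringsAsFactors = FALSE
)

# 1. Keep only the rows where Veg1 equals Veg2.
m <- x[x$Veg1 == x$Veg2, ]

# 2. Sort by group, with the largest AreaPoly2 first inside each group.
m <- m[order(m$P1id, -m$AreaPoly2), ]

# 3. Keep the first row of each group, i.e. the one with the largest area.
best <- m[!duplicated(m$P1id), ]
best
```

Groups with no matching row drop out in step 1, so no special case is needed for them. If two rows in a group tie for the largest area, only one of them is kept.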
Re: [R] subsetting by groups, with conditions
Assuming your data frame is called DF we can use sqldf like this. The inner select calculates the maximum AreaPoly2 for each group such that Veg1 = Veg2 and the outer select returns the corresponding row.

    library(sqldf)

    sqldf("select * from DF a where AreaPoly2 =
      (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")

Running it gives:

      P1id Veg1 Veg2 AreaPoly2 P2ID
    1    1    p    p       1.5    2
    2    2    p    p       2.0    3

On Mon, Dec 28, 2009 at 8:03 PM, Seth W Bigelow sbige...@fs.fed.us wrote:
> I have a data set similar to this:
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1         1
> 1    p    p    1.5       2
> 2    p    p    2         3
> 2    p    h    3.5       4
>
> For each group of Poly1id records, I wish to output (subset) the record
> which has largest AreaPoly2 value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1.5       2
> 2    p    p    2         3
>
> Can anyone point me in the right direction on this?
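For anyone without sqldf installed, the logic of that query can be mirrored in base R. The following is a rough sketch under my own assumptions, not the poster's code: `ok`, `grp_max`, `target` and `res` are names invented here. It also requires the returned row itself to have Veg1 equal to Veg2, a condition that the outer select above, as far as I can tell, does not check.

```r
# Example data from the thread, named DF as in the sqldf post
DF <- data.frame(
  P1id      = c(1, 1, 2, 2),
  Veg1      = c("p", "p", "p", "p"),
  Veg2      = c("p", "p", "p", "h"),
  AreaPoly2 = c(1, 1.5, 2, 3.5),
  P2ID      = 1:4,
  stringsAsFactors = FALSE
)

# Rows that satisfy the inner select's condition
ok <- DF$Veg1 == DF$Veg2

# Inner select: maximum AreaPoly2 per P1id among the matching rows
grp_max <- tapply(DF$AreaPoly2[ok], DF$P1id[ok], max)

# Look up each row's group maximum (NA if the group has no matching row)
target <- as.vector(grp_max[as.character(DF$P1id)])

# Outer select: rows whose area equals their group's maximum,
# restricted here to rows that themselves have Veg1 == Veg2
res <- DF[ok & !is.na(target) & DF$AreaPoly2 == target, ]
res
```

Unlike a first-row-per-group approach, this keeps every row that ties for the group maximum, which matches what the SQL equality test would do.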
Re: [R] subsetting by groups, with conditions
On Dec 28, 2009, at 7:03 PM, Seth W Bigelow wrote:
> I have a data set similar to this:
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1         1
> 1    p    p    1.5       2
> 2    p    p    2         3
> 2    p    h    3.5       4
>
> For each group of Poly1id records, I wish to output (subset) the record
> which has largest AreaPoly2 value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1    p    p    1.5       2
> 2    p    p    2         3

Can you be more expansive (or perhaps more accurate?) about the conditions you want satisfied? Looking at that dataset, I only see one row that has the largest value for AreaPoly2 within the three records where Veg1==Veg2. Otherwise I would think the answer might be along these lines:

    # stringsAsFactors=TRUE was the default before R 4.0.0;
    # it is spelled out here because the next line relies on factor levels
    dft <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
    1 p p 1 1
    1 p p 1.5 2
    2 p p 2 3
    2 p h 3.5 4"), header=TRUE, stringsAsFactors=TRUE)

    # give Veg1 the same levels as Veg2 so the two factors can be compared
    dft$Veg1 <- factor(dft$Veg1, levels=levels(dft$Veg2))
    s.dft <- subset(dft, Veg1==Veg2)
    s.dft[which.max(s.dft$AreaPoly2),]

which gives:

      P1id Veg1 Veg2 AreaPoly2 P2ID
    3    2    p    p         2    3

--
David

> Can anyone point me in the right direction on this?
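The two readings of the question discussed in this reply (one overall maximum versus one maximum per P1id group) can be put side by side. This is a minimal sketch in base R using `aggregate()` and `merge()`, which is my choice for illustration and not the method from the post; `overall`, `grp_max` and `per_group` are invented names.

```r
# Example data from the thread
dft <- data.frame(
  P1id      = c(1, 1, 2, 2),
  Veg1      = c("p", "p", "p", "p"),
  Veg2      = c("p", "p", "p", "h"),
  AreaPoly2 = c(1, 1.5, 2, 3.5),
  P2ID      = 1:4,
  stringsAsFactors = FALSE
)

s.dft <- subset(dft, Veg1 == Veg2)

# Reading 1: a single row, the largest area among all matching records
overall <- s.dft[which.max(s.dft$AreaPoly2), ]

# Reading 2: the largest area within each P1id group
grp_max <- aggregate(AreaPoly2 ~ P1id, data = s.dft, FUN = max)

# merge() joins on the shared columns (P1id and AreaPoly2),
# which recovers the remaining columns of the winning rows
per_group <- merge(s.dft, grp_max)

overall
per_group
```

The first reading returns only the P2ID 3 record, while the second returns the two-row table given as the desired output in the original question.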