On 04-May-05 Roger Bivand wrote:
> On Wed, 4 May 2005, Sean Davis wrote:
>
>> see ?aggregate.
>
> Or maybe tapply, or its close relative, by:
>
> > by(df, list(df$station, df$date), function(x)
> +   x$row[which.max(x$chlorophyll)])
> : Ancona
> : 21/06/01
> [1] NA
> ------------------------------------------------------------
> : Castagneto
> : 21/06/01
> [1] 3
> ------------------------------------------------------------
> : Ancona
> : 23/06/01
> [1] 6
> ------------------------------------------------------------
> : Castagneto
> : 23/06/01
> [1] NA
>
> since happily a row ID column was included in the data frame. Note that
> which.max only reports the row of the first maximum if there are ties.
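
As a small aside on the tie behaviour Roger mentions, here is a minimal sketch. The vector x and the names first_max and all_max are made up for illustration and are not part of Alessandro's data.

```r
# Toy vector with a tie for the maximum (illustrative values only).
x <- c(2.0, 2.5, 2.5, 2.1)

# which.max() returns the index of the first maximum only.
first_max <- which.max(x)

# Comparing against max(x) returns the indices of every tied maximum.
all_max <- which(x == max(x))
```

If ties matter for the chlorophyll data, the second form could replace which.max inside the function passed to by, at the cost of possibly returning more than one row per group.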

I've tried to work out a method which gives a cleaner result
(for instance, the NAs are ugly and unnecessary).

I've called Alessandro's data (below) "chl" (for chlorophyll),
and using Roger's command above assign the result to "tmp":

  tmp<-by(chl, list(chl$station, chl$date), function(x)
            x$row[which.max(x$chlorophyll)] )

Then, using either tmp[1:2,] or tmp[,1:2] we get

  tmp[,1:2]
  ##            21/06/01 23/06/01
  ## Ancona           NA        6
  ## Castagneto        3       NA

which is a better layout but still has the NAs. It would be
better to be able to get something like

  ## Ancona      23/06/01   6
  ## Castagneto  21/06/01   3

but I don't see how to do it even for just these 2 stations.

Now, however, suppose we want not just the rows but the values
as well. Try a modified function

  tmp<-by(chl, list(chl$station, chl$date), function(x)
            list(Row=x$row[which.max(x$chlorophyll)],
                 Val=max(x$chlorophyll)) )

Now

  str(tmp)
  ## List of 4
  ##  $ : NULL
  ##  $ :List of 2
  ##   ..$ Row: int 3
  ##   ..$ Val: num 2.4
  ##  $ :List of 2
  ##   ..$ Row: int 6
  ##   ..$ Val: num 2.5
  ##  $ : NULL
  ##  - attr(*, "dim")= int [1:2] 2 2
  ##  - attr(*, "dimnames")=List of 2
  ##   ..$ : chr [1:2] "Ancona" "Castagneto"
  ##   ..$ : chr [1:2] "21/06/01" "23/06/01"
  ##  - attr(*, "call")= language by.data.frame(data = chl, INDICES =
  ##      list(chl$station, chl$date), FUN = function(x) list(Row =
  ##      x$row[which.max(x$chlorophyll)], ...
  ##  - attr(*, "class")= chr "by"

I've not succeeded (though experience tells me that others could)
in extracting from this something like the following:

  ##       Ancona    Castagneto
  ## Row   6         3
  ## Val   2.5       2.4
  ## Date  23/06/01  21/06/01

Questions:
(a) What's the trick?
(b) How to generalise it?

Ted.

>
>>
>> Sean
>>
>> On May 4, 2005, at 11:43 AM, alessandro carletti wrote:
>>
>> > Sorry for disturbing you with another newbie question!
>> > I have a data frame about coastal waters quality
>> > parameters: for some parameters (e.g. NH3) I have only
>> > 1 observation for each sampling station and each
>> > sampling date, while in other cases (chlorophyll) I
>> > have 1 obs for each meter-depth for each station and
>> > date. How can I select only the max chlorophyll value
>> > for each station/date?
>> >
>> > example
>> >
>> > row  station     date      depth  chlorophyll
>> > 1    Castagneto  21/06/01  -0.5   2.0
>> > 2    Castagneto  21/06/01  -1.5   2.2
>> > 3    Castagneto  21/06/01  -2.5   2.4
>> > 4    Castagneto  21/06/01  -3.5   2.1
>> > 5    Ancona      23/06/01  -0.5   2.4
>> > 6    Ancona      23/06/01  -1.5   2.5
>> > 7    Ancona      23/06/01  -2.5   2.2
>> > 8    Ancona      23/06/01  -3.5   2.1
>> > 9    Ancona      23/06/01  -4.5   1.9
>> > ...
>> >
>> > I'd like to select only row 3 and 6, the ones with max
>> > chlorophyll values, or have the mean for the rows 1:4
>> > and 5:9
>> >
>> > Thanks

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-May-05                                       Time: 14:13:13
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
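
One possible answer to questions (a) and (b) above, offered as a sketch rather than as the method anyone in the thread proposed. It swaps by for split plus lapply, so that only the station/date combinations that actually occur are visited. The data frame is retyped from Alessandro's nine example rows. The names groups, best, wide and means are invented here for illustration.

```r
# Alessandro's example rows, retyped; "chl" is the name Ted uses in his post.
chl <- data.frame(
  row         = 1:9,
  station     = c(rep("Castagneto", 4), rep("Ancona", 5)),
  date        = c(rep("21/06/01", 4), rep("23/06/01", 5)),
  depth       = c(-0.5, -1.5, -2.5, -3.5, -0.5, -1.5, -2.5, -3.5, -4.5),
  chlorophyll = c(2.0, 2.2, 2.4, 2.1, 2.4, 2.5, 2.2, 2.1, 1.9),
  stringsAsFactors = FALSE
)

# drop = TRUE discards station/date combinations with no data,
# so no NA or NULL cells are produced.
groups <- split(chl, list(chl$station, chl$date), drop = TRUE)

# Keep the whole row holding the (first) maximum within each group.
best <- do.call(rbind, lapply(groups, function(g) g[which.max(g$chlorophyll), ]))
best <- best[order(best$station), c("station", "date", "row", "chlorophyll")]
rownames(best) <- NULL

# Wide layout with one column per station; everything becomes character
# because a matrix can hold only one type.
wide <- rbind(Row  = as.character(best$row),
              Val  = as.character(best$chlorophyll),
              Date = best$date)
colnames(wide) <- best$station

# The group means Alessandro also asked about, via the formula
# interface of aggregate().
means <- aggregate(chlorophyll ~ station + date, data = chl, FUN = mean)
```

Printing best gives the one-line-per-station layout, and printing wide gives the Row/Val/Date layout. The same code should carry over unchanged to more stations and dates, since nothing in it depends on there being exactly two groups. Note that wide assumes each station appears on only one date, as in the example; with several dates per station the column names would repeat.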