Another way to do this is by using the SQLite select statements making use of the sqldf package.
First let us assume that the alphabetic order of the Quality levels reflects their desirability as well. In the example data this is true but if its not true in the whole data set where there may be additional levels of Quality then it would need to be recoded first. The first select statement sorts the data and the second one just takes the last row in each group. (It seems that when using the group by clause that it takes the last row of each group but I could not actually find this documented so beware.) Note that Time and Date names are changed by the underlying software which is why we must refer to them using Time__1 and Date__1. Lines <- "DeployID Date.Time LocationQuality Latitude Longitude STM05-1 28/02/2005 17:35 Good -35.562 177.158 STM05-1 28/02/2005 19:44 Good -35.487 177.129 STM05-1 28/02/2005 23:01 Unknown -35.399 177.064 STM05-1 01/03/2005 07:28 Unknown -34.978 177.268 STM05-1 01/03/2005 18:06 Poor -34.799 177.027 STM05-1 01/03/2005 18:47 Poor -34.85 177.059 STM05-2 28/02/2005 12:49 Good -35.928 177.328 STM05-2 28/02/2005 21:23 Poor -35.926 177.314 " # in next line replace textConnection(Lines) with "myfile.dat" library(sqldf) DF <- read.table(textConnection(Lines), skip = 1, as.is = TRUE, col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long")) DFo <- sqldf("select * from DF order by substr(Date__1, 7, 4) || substr(Date__1, 4, 2) || substr(Date__1, 1, 2) DESC, Quality DESC, abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12) DESC") sqldf("select * from DFo group by Date__1") #### # Here is a second different way to do it. # Another way to do it also using sqldf is via nested selects like this using # the same DF as above sqldf("select * from DF u where abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12) = (select min(abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12)) from DF x where Quality = (select min(Quality) from DF y where x.Date__1 = y.Date__1) and x.Date__1 = u.Date__1)") On 8/7/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote: > Read the data in and transform the Date, Time and Quality columns > so that Date and Time are the respective chron classes (see RNews 4/1 Help > Desk article) and Quality is a factor such that lower levels are better. > > Then sort the data frame by Date, Quality and time from noon. That would > actually be the default ordering if the shown factor levels are the only ones > since they naturally fall into that order but we have explicitly ordered them > in case your extension to other levels no longer obeys that property. > > Finally take the first row from each Date group. > > Lines <- "DeployID Date.Time LocationQuality Latitude Longitude > STM05-1 28/02/2005 17:35 Good -35.562 177.158 > STM05-1 28/02/2005 19:44 Good -35.487 177.129 > STM05-1 28/02/2005 23:01 Unknown -35.399 177.064 > STM05-1 01/03/2005 07:28 Unknown -34.978 177.268 > STM05-1 01/03/2005 18:06 Poor -34.799 177.027 > STM05-1 01/03/2005 18:47 Poor -34.85 177.059 > STM05-2 28/02/2005 12:49 Good -35.928 177.328 > STM05-2 28/02/2005 21:23 Poor -35.926 177.314 > " > > library(chron) > # in next line replace textConnection(Lines) with "myfile.dat" > DF <- read.table(textConnection(Lines), skip = 1, as.is = TRUE, > col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long")) > > DF <- transform(DF, > Date = chron(Date, format = "d/m/y"), > Time = times(paste(Time, "00", sep = ":")), > Quality = factor(Quality, levels = c("Good", "Poor", "Unknown"))) > > o <- order(DF$Date, as.numeric(DF$Quality), abs(DF$Time - times("12:00:00"))) > DF <- DF[o,] > > DF[tapply(row.names(DF), DF$Date, head, 1), ] > > # The last line above could alternately be written like this: > do.call("rbind", by(DF, DF$Date, head, 1)) > > > > On 8/7/07, Tim Sippel <[EMAIL PROTECTED]> wrote: > > Thanks Ross. Yes, upon your remark it seems that maybe I need to be > > selecting first for the best available location, and then by the time stamp > > nearest to 12pm. It seems like tapply(), or aggregate() or aggregate.ts() > > might be applicable, but I'm working through trying different functions and > > have yet to find what works. Suggestions on what function to start with > > would be appreciated. > > > > > > > > Tim Sippel (MSc) > > > > School of Biological Sciences > > > > Auckland University > > > > Private Bag 92019 > > > > Auckland 1020 > > > > New Zealand > > > > +64-9-373-7599 ext. 84589 (work) > > > > +64-9-373-7668 (Fax) > > > > +64-21-593-001 (mobile) > > > > > > > > _____ > > > > From: Ross Darnell [mailto:[EMAIL PROTECTED] > > Sent: 08 August 2007 12:43 > > To: Tim Sippel; r-help@stat.math.ethz.ch > > Subject: RE: [R] Conditional subsetting > > > > > > > > > > > > Assuming *A* animal can only be in one location at any one time, I don't > > understand how once you have selected the location nearest 12pm how you can > > select based on the type of location? > > Do you mean to select on location before timei.e. from the best location > > visited that day which was closest to 12pm? > > > > Perhaps for condition 2 for a animal which.min(time-1200) (replace 1200 > > with properly defined timestamp representing 12:00pm) > > > > Ross Darnell > > > > > > > > -----Original Message----- > > From: [EMAIL PROTECTED] on behalf of Tim Sippel > > Sent: Wed 08-Aug-07 9:37 AM > > To: r-help@stat.math.ethz.ch > > Subject: [R] Conditional subsetting > > > > Hello- > > > > Upon searching the email archives and reading documentation I haven't found > > what I'm looking for. I have a dataset with a time/date stamp on a series > > of geographic locations (lat/lon) associated with the movements of animals. > > On some days there are numerous locations and some days there is only one > > location. Associated with each location is an indication that the quality > > of the location is either "good", "poor", or "unknown". For each animal, I > > want to extract only one location for each day. And I want the location > > extracted to be nearest to 12pm on any given day, and highest quality > > possible. > > > > So the order of priority for my extraction will be: 1. only one location per > > animal each day; 2. the location selected be as close to 12pm as possible; > > 3. the selected location comes from the best locations available (ie. take > > from pool of "good" locations first, or select from "poor" locations if a > > "good" location isn't available, or "unknown" if nothing else is available). > > > > > > > > > > I think aspect of this task that has me particularly stumped is how to > > select only one location for each day, and for that location to be a close > > to 12pm as possible. > > > > > > > > An example of my dataset follows: > > > > > > > > > > > > DeployID > > > > > > > > Date.Time > > > > > > > > LocationQuality > > > > Latitude > > > > Longitude > > > > > > STM05-1 > > > > > > > > 28/02/2005 17:35 > > > > > > > > Good > > > > -35.562 > > > > 177.158 > > > > > > STM05-1 > > > > > > > > 28/02/2005 19:44 > > > > > > > > Good > > > > -35.487 > > > > 177.129 > > > > > > STM05-1 > > > > > > > > 28/02/2005 23:01 > > > > > > > > Unknown > > > > -35.399 > > > > 177.064 > > > > > > STM05-1 > > > > > > > > 01/03/2005 07:28 > > > > > > > > Unknown > > > > -34.978 > > > > 177.268 > > > > > > STM05-1 > > > > > > > > 01/03/2005 18:06 > > > > > > > > Poor > > > > -34.799 > > > > 177.027 > > > > > > STM05-1 > > > > > > > > 01/03/2005 18:47 > > > > > > > > Poor > > > > -34.85 > > > > 177.059 > > > > > > STM05-2 > > > > > > > > 28/02/2005 12:49 > > > > > > > > Good > > > > -35.928 > > > > 177.328 > > > > > > STM05-2 > > > > > > > > 28/02/2005 21:23 > > > > > > > > Poor > > > > -35.926 > > > > 177.314 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Many thanks for your input. I'm using R 2.5.1 on Windows XP. > > > > > > > > Cheers, > > > > > > > > Tim Sippel (MSc) > > > > School of Biological Sciences > > > > Auckland University > > > > Private Bag 92019 > > > > Auckland 1020 > > > > New Zealand > > > > +64-9-373-7599 ext. 84589 (work) > > > > +64-9-373-7668 (Fax) > > > > +64-21-593-001 (mobile) > > > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.