> On Dec 4, 2015, at 1:10 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> With a data.frame sorted by id, with ties broken by date, as in
> your example, you can select rows that are either the start
> of a new id group or the start of a run of consecutive dates with:
>
> > w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
> > which(w)
> [1] 1 4 5 7
> > uci[w,]
>   id       date value
> 1  1 2005-10-28     1
> 4  1 2005-11-07     3
> 5  1 2007-03-19     1
> 7  2 2004-06-02     2
>
> I'll leave it to you to translate that R syntax into data.table syntax -
> it just involves comparing the current row with the previous row.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_...@hotmail.com> wrote:
>> Dear R users,
>>
>> I usually work with the data.table package, but I'm sure that my question
>> can also be answered working with an R data frame.
>> Working with grouped data (by "id"), I wonder if it is possible to keep in
>> an R data.frame (or R data.table):
>> a) Only the first row of each group of rows (from the same "id") that
>> have consecutive dates.
>> b) All the rows which do not belong to the above groups.
>>
>> As an example, I have the "uci" data.frame:
>>
>> uci <- data.table(id=c(rep(1,6),2),
>>   date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07",
>>                    "2007-03-19","2007-03-20","2004-06-02")),
>>   value = c(1, 2, 1, 3, 1, 2, 2))
>>
>> id       date value
>>  1 2005-10-28     1
>>  1 2005-10-29     2
>>  1 2005-10-30     1
>>  1 2005-11-07     3
>>  1 2007-03-19     1
>>  1 2007-03-20     2
>>  2 2004-06-02     2
>>
>> And the desired output would be:
>>
>> id       date value
>>  1 2005-10-28     1
>>  1 2005-11-07     3
>>  1 2007-03-19     1
>>  2 2004-06-02     2

The syntax of `[.data.table` is a bit odd: you can refer to columns by name.
I never trust my intuition, though. Selection is usually done with a logical
vector in the "i" position. The diff operator does work in the "i" position,
with the obvious need to prepend a starting value:

> uci[ c(0,diff(date))!=1, ]
   id       date value
1:  1 2005-10-28     1
2:  1 2005-11-07     3
3:  1 2007-03-19     1
4:  2 2004-06-02     2

The other cases are handled with the converse expression:

> uci[c(0,diff(date)) == 1, ]
   id       date value
1:  1 2005-10-29     2
2:  1 2005-10-30     1
3:  1 2007-03-20     2

>>
>> # From the following link, I have tried:
>> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>>
>> setDT(uci)[ ,list(date=date[1L], value = value[1L]), by =
>> .(ind=rleid(date), id)][, ind:=NULL][]
>>
>> But I get the same data frame, and I do not know the reason.
>>
>> Thank you very much for any help!!
>>
>> Frank S.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
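
The ungrouped expression uci[c(0,diff(date))!=1, ] gives the desired rows for this example, but it compares every row with the previous one regardless of id, so it would presumably drop the first row of an id whose first date happens to fall one day after the last date of the preceding id. Separately, rleid(date) in the original attempt numbers runs of identical values, not runs of consecutive dates, so every distinct date forms its own group, which would explain why that attempt returned the same data. The following is only a sketch of one way to do the row-to-row comparison per id in data.table; the names keep, start and result are illustrative and not from the thread, and it assumes the data can be sorted by id and then date.

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# Sort by id, then by date, so that each row follows its predecessor
# within the same id and each id forms one contiguous block of rows.
setorder(uci, id, date)

# Within each id: the first row always starts a run (TRUE); every later
# row starts a new run only if it is not exactly one day after the
# previous row.
keep <- uci[, .(start = c(TRUE, as.numeric(diff(date)) != 1)), by = id]$start

# Because the table is sorted by id, the grouped result lines up
# row for row with uci, so it can be used directly in the "i" position.
result <- uci[keep]
print(result)
```

On the example data this should select rows 1, 4, 5 and 7, the same rows as in the replies above. The rows that continue a run of consecutive dates would be uci[!keep].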