RE: [R] subsetting like in SAS
I want to thank Petr Pikal, Robert Balshaw and Na Li for suggesting the use of "unique" or "!duplicated" on a subset of my data where unwanted variables have been removed. This worked perfectly.

Denis Chabot

On 13 Jan 2005 at 11:52, Denis Chabot wrote:

> Hi,
>
> Being in the process of translating some of my SAS programs to R, I
> encountered one difficulty. I have a solution, but it is not elegant
> (and not pleasant to implement).
>
> I have a large dataset with many variables needed to identify the
> origin of a sample, many to describe sample characteristics, others to
> describe site characteristics.
>
> I want only a (shorter) list of sites and their characteristics.
>
> If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
> identify a site, in SAS you'd sort on those variables, then read the
> data with:
>
> data sites;
>   set alldata;
>   by origin ship_cat ship_nb trip set;
>   if first.set;
>   keep list-of-variables-detailing-sites;
> run;
>
> In R I did this with the Lag function of Hmisc, and the original data
> set also needs to be sorted first:
>
> oL <- Lag(origin)
> scL <- Lag(ship_cat)
> snL <- Lag(ship_nb)
> tL <- Lag(trip)
> sL <- Lag(set)
> same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
> sites <- subset(alldata, !same,
>                 select=c(list-of-variables-detailing-sites))
>
> Could I do better than this?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
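For readers of the archive, the approach reported above as working (dropping the unwanted variables, then applying unique() or !duplicated()) might look roughly like the sketch below. The toy data and the column names "depth" and "fish_len" are assumptions made up for illustration; only the five identifying variables come from the original post.

```r
# Toy stand-in for "alldata" (hypothetical values).
alldata <- data.frame(
  origin   = c("A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 2, 2),
  ship_nb  = c(10, 10, 10, 20, 20),
  trip     = c(1, 1, 1, 1, 1),
  set      = c(1, 1, 2, 1, 1),
  depth    = c(100, 100, 150, 80, 80),  # assumed site characteristic
  fish_len = c(30, 32, 28, 45, 41)      # assumed sample characteristic
)

# Variables that identify or describe a site (sample variables left out).
site_vars <- c("origin", "ship_cat", "ship_nb", "trip", "set", "depth")

# Option 1: drop the unwanted columns, then keep the distinct rows.
sites <- unique(alldata[, site_vars])

# Option 2: same result, written with !duplicated().
sites2 <- alldata[!duplicated(alldata[, site_vars]), site_vars]
```

Both options keep the first occurrence of each distinct combination, so no prior sorting or lagging is required. Note that this only collapses rows whose site variables are all equal; if a site characteristic varied within a site, that site would appear more than once.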
Re: [R] subsetting like in SAS
Hi Denis,

Maybe unique() can pick the unique entries from your data set without the need for sorting.

Cheers
Petr

On 13 Jan 2005 at 11:52, Denis Chabot wrote:

> Hi,
>
> Being in the process of translating some of my SAS programs to R, I
> encountered one difficulty. I have a solution, but it is not elegant
> (and not pleasant to implement).
>
> I have a large dataset with many variables needed to identify the
> origin of a sample, many to describe sample characteristics, others to
> describe site characteristics.
>
> I want only a (shorter) list of sites and their characteristics.
>
> If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
> identify a site, in SAS you'd sort on those variables, then read the
> data with:
>
> data sites;
>   set alldata;
>   by origin ship_cat ship_nb trip set;
>   if first.set;
>   keep list-of-variables-detailing-sites;
> run;
>
> In R I did this with the Lag function of Hmisc, and the original data
> set also needs to be sorted first:
>
> oL <- Lag(origin)
> scL <- Lag(ship_cat)
> snL <- Lag(ship_nb)
> tL <- Lag(trip)
> sL <- Lag(set)
> same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
> sites <- subset(alldata, !same,
>                 select=c(list-of-variables-detailing-sites))
>
> Could I do better than this?
>
> Thanks in advance,
>
> Denis Chabot

Petr Pikal
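As an illustration of the "without need for sorting" point, here is a minimal sketch that flags the first row of each key combination with duplicated(), which is close in spirit to the SAS "if first.set" step. The data values and the "depth" column are hypothetical; the rows are deliberately unsorted.

```r
# Unsorted toy data (hypothetical values).
alldata <- data.frame(
  origin   = c("B", "A", "B", "A", "A"),
  ship_cat = c(2, 1, 2, 1, 1),
  ship_nb  = c(20, 10, 20, 10, 10),
  trip     = c(1, 1, 1, 1, 1),
  set      = c(1, 2, 1, 1, 2),
  depth    = c(80, 150, 80, 100, 150)   # assumed site characteristic
)

# The five variables that identify a site, as named in the question.
key_vars <- c("origin", "ship_cat", "ship_nb", "trip", "set")

# TRUE for the first row seen with each key combination; order is irrelevant.
first_in_group <- !duplicated(alldata[, key_vars])

# Keep those rows and only the site-level columns.
sites <- alldata[first_in_group, c(key_vars, "depth")]

# Optional: order the result by the keys, as SAS output would be.
sites <- sites[do.call(order, unname(as.list(sites[key_vars]))), ]
```

Testing duplication on the key variables only means exactly one row per site is kept even if another column differs between rows of the same site; the sort at the end is purely cosmetic.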