Hi,
While translating some of my SAS programs to R, I ran into a difficulty. I have a solution, but it is neither elegant nor pleasant to implement.
I have a large dataset with many variables: some identify the origin of a sample, many describe sample characteristics, and others describe site characteristics.
From it, I want only a shorter list of sites and their characteristics, with one row per site.
If "origin", "ship_cat", "ship_nb", "trip" and "set" together identify a site, in SAS you would sort on those variables and then read the data with:
data sites;
set alldata;
by origin ship_cat ship_nb trip set;
if first.set;
keep list-of-variables-detailing-sites;
run;

In R I did this with the Lag() function from the Hmisc package; as in SAS, the data set must be sorted first:
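library(Hmisc)   # provides Lag()
# alldata must already be sorted by origin, ship_cat, ship_nb, trip, set
oL  <- Lag(alldata$origin)
scL <- Lag(alldata$ship_cat)
snL <- Lag(alldata$ship_nb)
tL  <- Lag(alldata$trip)
sL  <- Lag(alldata$set)
same <- with(alldata, origin == oL & ship_cat == scL & ship_nb == snL & trip == tL & set == sL)
same[is.na(same)] <- FALSE   # NA comparisons (e.g. row 1, which has no predecessor) count as a new site
sites <- subset(alldata, !same, select = c(list-of-variables-detailing-sites))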
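For comparison, the same "first row of each group" flag can be computed in base R without Hmisc, by comparing each row's key values with those of the previous row. This is a minimal sketch, assuming the data are already sorted by the key variables; the helper name first_in_group(), the toy data and the site variable depth are hypothetical, made up for illustration.

```r
# Hypothetical helper: TRUE for the first row of each run of identical key
# values, i.e. an analogue of SAS first.<last BY variable>.
# Assumes `df` is already sorted by `keys`.
first_in_group <- function(df, keys) {
  n <- nrow(df)
  if (n == 0) return(logical(0))
  changed <- rep(FALSE, n - 1)
  for (v in keys) {
    cur  <- df[[v]][-1]   # rows 2..n
    prev <- df[[v]][-n]   # rows 1..n-1
    d <- cur != prev
    # Treat NA vs NA as equal and NA vs a value as different.
    na <- is.na(d)
    d[na] <- xor(is.na(cur), is.na(prev))[na]
    changed <- changed | d
  }
  c(TRUE, changed)        # row 1 always starts a new group
}

# Hypothetical toy data, already sorted by the key variables.
alldata <- data.frame(
  origin   = c("A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 1, 1),
  ship_nb  = c(10, 10, 10, 20, 20),
  trip     = c(1, 1, 2, 1, 1),
  set      = c(1, 1, 1, 1, 2),
  depth    = c(50, 50, 80, 30, 35),       # hypothetical site variable
  weight   = c(1.2, 0.8, 2.1, 0.5, 0.9)   # hypothetical sample variable
)
keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")

sites <- alldata[first_in_group(alldata, keys), c(keys, "depth")]
print(sites)
```

The idea is the same as the Lag() version; it simply avoids the extra package and the separate lagged variables, and handles the first row explicitly.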
Could I do better than this?
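One possibly simpler route, offered as a sketch rather than a tested answer: base R's duplicated() works on the rows of a data frame, so keeping the rows whose key combination has not appeared before gives one row per site. Unlike the Lag() approach, it does not depend on the rows being sorted (it keeps the first occurrence in the current order), although sorting afterwards reproduces the SAS output order. The toy data and the site variable depth are again hypothetical.

```r
# Hypothetical toy data (unsorted): rows 2 and 4 are two samples from the same site.
alldata <- data.frame(
  origin   = c("B", "A", "A", "A", "B"),
  ship_cat = c(1, 1, 1, 1, 1),
  ship_nb  = c(20, 10, 10, 10, 20),
  trip     = c(1, 1, 2, 1, 1),
  set      = c(2, 1, 1, 1, 1),
  depth    = c(35, 50, 80, 50, 30),       # hypothetical site variable
  weight   = c(0.9, 1.2, 2.1, 0.8, 0.5)   # hypothetical sample variable
)

keys      <- c("origin", "ship_cat", "ship_nb", "trip", "set")
site_vars <- c(keys, "depth")

# duplicated() on a data frame flags rows whose key combination already
# appeared higher up; negating it keeps the first row of each site.
sites <- alldata[!duplicated(alldata[keys]), site_vars]

# Optional: sort to get the same order as the SAS BY-group output.
sites <- sites[do.call(order, sites[keys]), ]
print(sites)
```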
Thanks in advance,
Denis Chabot
______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
