I want to thank Petr Pikal, Robert Balshaw and Na Li for suggesting the use of "unique" or "!duplicated" on a subset of my data where unwanted variables have been removed. This worked perfectly.

Denis Chabot
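For readers of the archive, here is a minimal sketch of the two suggestions that worked. It uses a small hypothetical stand-in for "alldata". The columns "depth" (a site characteristic) and "fish_len" (a sample characteristic), and the vector "site_vars" standing in for list-of-variables-detailing-sites, are illustrative assumptions, not part of the original data.

```r
# Hypothetical toy version of 'alldata': two sample rows per site.
alldata <- data.frame(
  origin   = c("A", "A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 1, 2, 2),
  ship_nb  = c(10, 10, 10, 10, 20, 20),
  trip     = c(1, 1, 1, 1, 5, 5),
  set      = c(1, 1, 2, 2, 1, 1),
  depth    = c(50, 50, 80, 80, 120, 120),  # site characteristic (assumed)
  fish_len = c(31, 28, 40, 35, 22, 25)     # sample characteristic (assumed)
)

keys      <- c("origin", "ship_cat", "ship_nb", "trip", "set")
site_vars <- c(keys, "depth")  # hypothetical list of site variables

# 1. unique() on the subset: drop the sample columns, then drop repeated rows.
#    This gives one row per site only if every site variable is constant
#    within a site.
sites_u <- unique(alldata[, site_vars])

# 2. !duplicated() on the key columns: keep the first row of each site.
#    No prior sorting is required.
sites_d <- alldata[!duplicated(alldata[, keys]), site_vars]

print(sites_d)
```

The second form is the closer match to SAS's "if first.set". It keeps exactly one row per key even if a site variable happens to vary within a site, whereas unique() would then return several rows for that site.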
On 13 Jan 2005 at 11:52, Denis Chabot wrote:

Hi,

While translating some of my SAS programs to R, I encountered one
difficulty. I have a solution, but it is neither elegant nor pleasant
to implement.

I have a large dataset with many variables: some are needed to identify
the origin of a sample, many describe sample characteristics, and others
describe site characteristics.

I want only a shorter list of sites, one row per site, with their characteristics.

If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
identify a site, in SAS you'd sort on those variables, then read the
data with:

data sites;
 set alldata;
 by origin ship_cat ship_nb trip set;
 if first.set;
 keep list-of-variables-detailing-sites;
run;
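The SAS "by ... if first.set" step can be sketched in base R without Hmisc. Sort with order(), then flag a row as the first of its group when any key column differs from the row above. The toy data, the "depth" column, and the assumption that the key columns contain no NA values are all illustrative.

```r
# Hypothetical toy data in the layout described above, deliberately unsorted.
alldata <- data.frame(
  origin   = c("B", "A", "A", "A", "B", "A"),
  ship_cat = c(2, 1, 1, 1, 2, 1),
  ship_nb  = c(20, 10, 10, 10, 20, 10),
  trip     = c(5, 1, 1, 1, 5, 1),
  set      = c(1, 2, 1, 2, 1, 1),
  depth    = c(120, 80, 50, 80, 120, 50)   # site characteristic (assumed)
)
keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")

# Equivalent of sorting BY origin ship_cat ship_nb trip set;
# unname() keeps column names from being matched to order()'s own arguments.
sorted <- alldata[do.call(order, unname(alldata[keys])), ]

# Compare each row with the one above it, key column by key column,
# and OR the results: TRUE where any key changed (assumes no NA keys).
k <- sorted[keys]
n <- nrow(k)
changed <- Reduce(`|`, lapply(k, function(col) col[-1] != col[-n]))

# Row 1 always starts a new group, like first.set on the first observation.
first_set <- c(TRUE, changed)
sites <- sorted[first_set, c(keys, "depth")]
print(sites)
```

This is the same comparison as the Lag() code, but it loops over the key columns, so adding or removing a key only means editing "keys".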

In R I did this with the Lag() function from the Hmisc package; here
too, the original data set must be sorted first:

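library(Hmisc)  # provides Lag()
# Assumes alldata is sorted by the five key variables and its columns are
# visible by name (e.g. after attach(alldata)).
oL <- Lag(origin)
scL <- Lag(ship_cat)
snL <- Lag(ship_nb)
tL <- Lag(trip)
sL <- Lag(set)
# TRUE when a row has the same site key as the row above it
same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
# Row 1 has no predecessor: its comparison can be NA, and subset() drops NA rows
same[1] <- FALSE
sites <- subset(alldata, !same,
                select=c(list-of-variables-detailing-sites))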

Could I do better than this?

______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
