RE: [R] subsetting like in SAS

2005-01-17 Thread Denis Chabot
I want to thank Petr Pikal, Robert Balshaw and Na Li for suggesting the 
use of "unique" or "!duplicated" on a subset of my data where unwanted 
variables have been removed. This worked perfectly.
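
A minimal sketch of the "!duplicated" variant, for readers translating the same SAS idiom. The toy data, the "depth" site variable and the "weight" sample variable are assumptions made up for illustration; only the five key names come from the original post.

```r
# Hypothetical toy data: five key columns from the post, plus one assumed
# site characteristic (depth) and one assumed sample characteristic (weight).
alldata <- data.frame(
  origin   = c("A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 2, 2),
  ship_nb  = c(10, 10, 10, 20, 20),
  trip     = c(1, 1, 1, 1, 1),
  set      = c(1, 1, 2, 1, 1),
  depth    = c(50, 50, 75, 120, 120),
  weight   = c(2.1, 3.4, 1.8, 5.0, 4.2)
)

keys      <- c("origin", "ship_cat", "ship_nb", "trip", "set")
site_vars <- c(keys, "depth")

# duplicated() flags rows whose key combination has already appeared,
# so !duplicated() keeps the first row per site, much like "if first.set".
sites <- alldata[!duplicated(alldata[, keys]), site_vars]
print(sites)
```

Unlike the SAS data step, this needs no prior sort: the first row seen for each key combination is the one kept.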

Denis Chabot

On 13 Jan 2005 at 11:52, Denis Chabot wrote:

Hi,

Being in the process of translating some of my SAS programs to R, I
encountered one difficulty. I have a solution, but it is not elegant
(and not pleasant to implement).

I have a large dataset with many variables needed to identify the
origin of a sample, many to describe sample characteristics, others to
describe site characteristics.

I want only a (shorter) list of sites and their characteristics.

If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
identify a site, in SAS you'd sort on those variables, then read the
data with:

data sites;
 set alldata;
 by origin ship_cat ship_nb trip set;
 if first.set;
 keep list-of-variables-detailing-sites;
run;

In R I did this with the Lag function of Hmisc, and the original data
set also needs to be sorted first:

oL <- Lag(origin)
scL <- Lag(ship_cat)
snL <- Lag(ship_nb)
tL <- Lag(trip)
sL <- Lag(set)
same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
sites <- subset(alldata, !same,
                select=c(list-of-variables-detailing-sites))

Could I do better than this?
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] subsetting like in SAS

2005-01-13 Thread Petr Pikal
Hi Denis

maybe unique() can choose unique entries from your data set 
without need for sorting.
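
A small sketch of what that could look like, on deliberately unsorted data. The data frame and the "depth" and "weight" columns are assumptions for illustration, and only two of the five key columns are used to keep it short.

```r
# Hypothetical, unsorted toy data: two key columns (origin, set),
# an assumed site variable (depth) and an assumed sample variable (weight).
alldata <- data.frame(
  origin = c("B", "A", "B", "A"),
  set    = c(1, 2, 1, 1),
  depth  = c(120, 75, 120, 50),
  weight = c(5.0, 1.8, 4.2, 2.1)
)

# Drop the sample-level columns first, then let unique() remove repeated rows.
sites <- unique(alldata[, c("origin", "set", "depth")])
print(sites)

# Optional: sort afterwards if an ordered site list is wanted.
sites_sorted <- sites[order(sites$origin, sites$set), ]
```

Note that unique() compares whole rows, so this only collapses to one row per site if the retained columns really are constant within a site.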

Cheers
Petr

On 13 Jan 2005 at 11:52, Denis Chabot wrote:

> Hi,
> 
> Being in the process of translating some of my SAS programs to R, I
> encountered one difficulty. I have a solution, but it is not elegant
> (and not pleasant to implement).
> 
> I have a large dataset with many variables needed to identify the
> origin of a sample, many to describe sample characteristics, others to
> describe site characteristics.
> 
> I want only a (shorter) list of sites and their characteristics.
> 
> If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
> identify a site, in SAS you'd sort on those variables, then read the
> data with:
> 
> data sites;
>  set alldata;
>  by origin ship_cat ship_nb trip set;
>  if first.set;
>  keep list-of-variables-detailing-sites;
> run;
> 
> In R I did this with the Lag function of Hmisc, and the original data
> set also needs to be sorted first:
> 
> oL <- Lag(origin)
> scL <- Lag(ship_cat)
> snL <- Lag(ship_nb)
> tL <- Lag(trip)
> sL <- Lag(set)
> same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
> sites <- subset(alldata, !same,
>                 select=c(list-of-variables-detailing-sites))
> 
> Could I do better than this?
> 
> Thanks in advance,
> 
> Denis Chabot

Petr Pikal