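Try:

> dat <- dat[order(dat$fam, dat$wt), ]   # sort by fam, then wt, as Stata's "bys fam (wt):" does
> lis <- by(dat, dat$fam, function(x) x[nrow(x), ])   # keep the last row of each family (your _n==_N); by() returns a list
> do.call(rbind, lis)   # stack the list back into a data-frame result
  fam  wt keep
1   1 1.0    1
2   2 1.0    1
3   3 0.6    1
4   4 0.4    0

Note: with attach(dat), the bare name fam refers to a copy of the original, unsorted column, so it does not follow dat's rows after sorting. It only lines up here because fam was already sorted. Using dat$fam avoids that.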
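Another base-R way, as a minimal sketch assuming the example dat from your message: sort each family by decreasing wt, then keep the first row of each family with duplicated(). This swaps by()/do.call() for order() and duplicated(); the names sorted, result, shuffled and random_pick are only illustrative. Because order() leaves any remaining ties in their current order, shuffling the rows first makes the choice among equal-wt rows random, like your sample() loop.

```r
# Example data from the question
fam  <- c(1, 2, 3, 3, 4, 4, 4)
wt   <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
keep <- c(1, 1, 1, 0, 1, 0, 0)
dat  <- data.frame(fam, wt, keep)

# Sort by family, then by decreasing weight, so the max-wt row comes first
sorted <- dat[order(dat$fam, -dat$wt), ]

# duplicated() is TRUE for every row of a family after its first one,
# so negating it keeps exactly one (maximum-wt) row per family
result <- sorted[!duplicated(sorted$fam), ]
print(result)

# Optional: break ties among equal wt at random instead of by row order.
# order() keeps unresolved ties in their current order, so shuffle first.
set.seed(1)
shuffled <- dat[sample(nrow(dat)), ]
shuffled <- shuffled[order(shuffled$fam, -shuffled$wt), ]
random_pick <- shuffled[!duplicated(shuffled$fam), ]
print(random_pick)
```

With this data, result keeps rows 1, 2, 3 and 5, which are exactly the keep==1 rows.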
======= 2005-08-01 22:24:27 you wrote: =======

>I am struggling with migrating some Stata code to R. I have a data
>frame containing, sometimes, repeat observations (rows) of the same
>family. I want to keep only one observation per family, selecting
>that observation according to some other variable. An example data
>frame is:
>
># construct example data
>fam <- c(1,2,3,3,4,4,4)
>wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
>keep <- c(1,1,1,0,1,0,0)
>dat <- as.data.frame(cbind(fam,wt,keep))
>dat
>
>I want to keep the observation for which wt is a maximum, and where
>this doesn't identify a unique observation, to keep just one anyway,
>not caring which. Those observations are indicated above by keep==1.
>(Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
>c(1,1,1,0,0,0,1)).
>
>The Stata code I would use is
>bys fam (wt): keep if _n==_N
>
>This is my (long-winded) attempt in R:
>
># first keep those rows where wt=max_fam(wt)
>maxwt <- by(dat,dat$fam,function(x) max(x[,2]))
>maxwt <- sapply(maxwt,"[[",1)
>maxwt.dat <- data.frame("maxwt"=maxwt,"fam"=as.integer(names(maxwt)))
>dat <- merge(dat,maxwt.dat)
>dat <- dat[dat$wt==dat$maxwt,]
>dat
>
>Now I am stuck: I want to keep either row with fam==4, and have tried
>playing around with combinations of sample and apply or by, but with
>no success. I can only find an inefficient for-loop solution:
>
># identify those rows with >1 observation
>more <- by(dat,dat$fam,function(x) dim(x)[1])
>more <- sapply(more,"[[",1)
>more.dat <- data.frame("more"=more,"fam"=as.integer(names(more)))
>dat <- merge(dat,more.dat)
>
># sample from those for whom more>1
>result<-dat[dat$more==1,]
>for(f in unique(dat$fam[dat$more>1])) {
>  rows <- rownames(dat[dat$fam==f,])
>  result <- rbind(result,dat[sample(rows,1),])
>}
>result
>
>I am sure that for something so simple in Stata to be so complicated
>in R must indicate ignorance of R on my part, but searches of help
>files and RSiteSearch haven't led to any better solution.
>
>Any suggestions would be most helpful! Thanks, C.
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

= = = = = = = = = = = = = = = = = = = =

2005-08-01
------
Department of Sociology
Fudan University
Blog:http://sociology.yculblog.com