Wadud, Zia wrote:
> Hi
> I have a panel dataset with a large number of groups and a differing number
> of observations for each group. I want to randomly select, say, 20% of
> the groups or 200 groups, but along with all observations from the
> selected groups (with the corresponding data).
> I guess it is possible to generate a random sample from the group ids
> and then match that with the entire dataset to have the intended
> dataset, but it sounds cumbersome and possibly there is an easier way to
> do this? I checked the package 'sampling' and the command 'sample', but they
> can't do exactly the same thing.
> I was wondering if someone on this list will be able to share his/her
> knowledge?

How about something like this?

df <- data.frame(GROUP = rep(1:5, c(2,3,4,2,2)), Y = runif(13))

# Sample Two of the Five Groups
subset(df, GROUP %in% with(df, sample(unique(GROUP), 2)))

> Thanks in advance,
> Zia
> **********************************************************
> Zia Wadud
> PhD Student
> Centre for Transport Studies
> Department of Civil and Environmental Engineering
> Imperial College London
> London SW7 2AZ
> Tel +44 (0) 207 594 6055
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
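
A slightly more general sketch of the same idea, covering the "20% of the groups" case from the question as well as a fixed number of groups. The helper name sample_groups and its arguments (group_col, n, frac) are made up for illustration and do not come from the thread or from any package; set.seed(1) is an arbitrary seed, added only so the draw can be repeated.

```r
# Hypothetical helper (not from the thread): keep every row that belongs to a
# randomly chosen subset of groups. Give either n (a count) or frac (a share).
sample_groups <- function(data, group_col, n = NULL, frac = NULL) {
  ids <- unique(data[[group_col]])
  if (is.null(n)) n <- max(1, round(frac * length(ids)))
  # Draw positions rather than calling sample(ids, n): sample() treats a
  # single number x as 1:x, which matters if only one numeric id is present.
  keep <- ids[sample.int(length(ids), n)]
  data[data[[group_col]] %in% keep, , drop = FALSE]
}

set.seed(1)  # arbitrary seed, only to make the example repeatable
df <- data.frame(GROUP = rep(1:5, c(2, 3, 4, 2, 2)), Y = runif(13))

sample_groups(df, "GROUP", frac = 0.2)  # 20% of the groups (here: 1 of 5)
sample_groups(df, "GROUP", n = 2)       # a fixed number of groups
```

Sampling is done on the unique group ids, not on rows, so every observation of a selected group is retained regardless of how many observations that group has.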
