Hi Jeff,

a & b) Points taken. Thanks for the reference too.

c) Taking the zeros out did the trick.
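For readers following along, here is a minimal sketch of that fix on a toy population. Everything in it is an assumption made for illustration: the data frame `pop` and its counts are invented, and the per-stratum draw uses base R (`aggregate`, `sample.int`) in place of `sampling::strata`, so it shows the idea of dropping zero-size strata rather than the exact call used in the thread.

```r
# Toy population standing in for CURRPOP (hypothetical data, not from the thread).
set.seed(42)
pop <- data.frame(
  APPT_TYP_CD_LL = rep(c("FA", "IN", "IN", "ST"), times = c(1, 300, 600, 99)),
  EMPL_TYPE      = rep(c("S",  "H",  "S",  "H"),  times = c(1, 300, 600, 99)),
  stringsAsFactors = FALSE
)
pop$ROWID <- seq_len(nrow(pop))

# Head count per stratum and the proportional allocation out of 100.
stratum <- aggregate(ROWID ~ APPT_TYP_CD_LL + EMPL_TYPE, data = pop, FUN = length)
names(stratum)[names(stratum) == "ROWID"] <- "HC"
stratum$stratp <- round(stratum$HC / nrow(pop) * 100)

# Drop strata whose allocation rounds to zero, as suggested in the reply.
stratum <- stratum[0 < stratum$stratp, ]

# Simple random sample without replacement inside each remaining stratum.
# Sampling row positions with sample.int avoids the sample(x, n) pitfall
# when a stratum has a single row.
picked <- unlist(lapply(seq_len(nrow(stratum)), function(i) {
  rows <- which(pop$APPT_TYP_CD_LL == stratum$APPT_TYP_CD_LL[i] &
                pop$EMPL_TYPE == stratum$EMPL_TYPE[i])
  rows[sample.int(length(rows), stratum$stratp[i])]
}))
smp <- pop[picked, ]
```

The four toy strata have 1, 300, 600 and 99 rows, so the allocations are 0, 30, 60 and 10, and the single-row stratum is dropped before sampling. If you use sampling::strata instead, its documentation asks for the size vector in the order the strata appear in the data, so it may also be necessary to drop the zero-allocation rows from the population itself; that is a guess, not something confirmed in this thread.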
Dan

-----Original Message-----
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
Sent: Sunday, April 28, 2013 12:15 AM
To: Lopez, Dan
Cc: R help (r-help@r-project.org)
Subject: Re: [R] Stratified Random Sampling Proportional to Size

a) Please post plain text

b) Please make reproducible examples (e.g. telling us how you accessed a database that we have no access to is not helpful). See ?head, ?dput and [1]

c) I don't know anything about the sampling package or the strata function, but I would recommend eliminating the rows that have zeros from the input data. E.g.:

stratum_cp <- stratum_cp[ 0<stratum_cp$stratp, ]

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

On Fri, 26 Apr 2013, Lopez, Dan wrote:

> Hello R Experts,
>
> I kindly request your assistance on figuring out how to get a
> stratified random sampling proportional to 100.
>
> Below is my R code showing what I did and the error I'm getting with
> sampling::strata
>
> # FIRST I summarized count of records by the two variables I want to
> # use as strata
>
> library(RODBC)
> library(sqldf)
> library(sampling)
> # After establishing connection I query the data and sort it by strata
> # APPT_TYP_CD_LL and EMPL_TYPE and store it in a dataframe
> CURRPOP<-sqlQuery(ch,"SELECT APPT_TYP_CD_LL,
> EMPL_TYPE,ASOFDATE,EMPLID,NAME,DEPTID,JOBCODE,JOBTITLE,SAL_ADMIN_PLAN,
> RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN('R','T')
> ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")
> # ROWID is a dummy ID I added and repositioned after the strata columns
> # for later use
> CURRPOP$ROWID<-seq(nrow(CURRPOP))
> CURRPOP<-CURRPOP[,c(1:2,11,3:10)]
>
> # My strata. stratp is how many I want to sample from each stratum. NOTE
> # THERE ARE SOME 0's which just means I won't sample from that group.
> stratum_cp<-sqldf("SELECT APPT_TYP_CD_LL,EMPL_TYPE, count(*) HC FROM
> CURRPOP GROUP BY APPT_TYP_CD_LL,EMPL_TYPE")
> stratum_cp$stratp<-round(stratum_cp$HC/nrow(CURRPOP)*100)
>
> > stratum_cp
>    APPT_TYP_CD_LL EMPL_TYPE   HC stratp
> 1              FA         S    1      0
> 2              FC         S    5      0
> 3              FP         S  173      3
> 4              FR         H  170      3
> 5              FX         H   49      1
> 6              FX         S   57      1
> 7              IN         H 1589     25
> 8              IN         S 3987     63
> 9              IP         H    7      0
> 10             IP         S   53      1
> 11             SA         H    8      0
> 12             SE         S   43      1
> 13             SF         H   14      0
> 14             SF         S    1      0
> 15             SG         S   10      0
> 16             ST         H  107      2
> 17             ST         S    6      0
>
> # THEN I attempted to use sampling::strata using the instructions in
> # that package and got an error
>
> # I use stratum_cp$stratp for my sizes.
>
> > s<-strata(CURRPOP,c("APPT_TYP_CD_LL","EMPL_TYPE"),size=stratum_cp$stratp,method="srswor")
> Error in data.frame(..., check.names = FALSE) :
>   arguments imply differing number of rows: 0, 1
> > traceback()
> 5: stop("arguments imply differing number of rows: ", paste(unique(nrows),
>        collapse = ", "))
> 4: data.frame(..., check.names = FALSE)
> 3: cbind(deparse.level, ...)
> 2: cbind(r, i)
> 1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp,
>        method = "srswor")
>
> # In lieu of a reproducible sample here is some info regarding most of
> # my data
> dim(CURRPOP)
> [1] 6280   11
> # Cols w/ personal info have been removed in this output
> > str(CURRPOP[,c(1:3,7:11)])
> 'data.frame': 6280 obs. of 8 variables:
>  $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
>  $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
>  $ ROWID         : int 1 2 3 4 5 6 7 8 9 10 ...
>  $ DEPTID        : int 9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
>  $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
>  $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
>  $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
>  $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...
>
> Daniel Lopez
> Workforce Analyst
> HRIM - Workforce Analytics & Metrics
> Strategic Human Resources Management
> wf-analytics-metr...@lists.llnl.gov<mailto:wf-analytics-metrics@lists.llnl.gov>
> (925) 422-0814
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k