Re: [R] Survey package/svyby source code help
Thank you for responding; I am truly grateful, and apologies for omitting evident and pertinent information. I am using 3.36 and will update to 3.37; I did not notice the newer version. I realize I needed to be more specific: the attr(, "var") that I am interested in is displayed by str(results) after the results object is created. The first line of the code in question looks like:

results <- (if (multicore) parallel::mclapply else lapply)(uniques, function(i){

How that line runs without error is beyond me, but that is minor to my objective. My primary goal is to reproduce the values generated by the confint function, which receives the standard error values from this svyby function. I am trying to replicate the standard error calculation from R in a program called Tableau. If any other information is needed please let me know. My sincerest gratitude.

--- Original Message ---
On Tuesday, February 11, 2020 9:42 AM, Ivan Krylov wrote:

> On Tue, 11 Feb 2020 02:33:45 +
> AndertechLLC--- via R-help r-help@r-project.org wrote:
>
> > When debugging the code I am not following the generation of values
> > in the results object attr(*, "var") after line 57 completes. These
> > values are fed into line 74 (rval <- t(sapply(results, unwrap))).
>
> Which version of the survey package do you have in mind? I took a look at
> the latest version available on CRAN (3.37 at the time of this post,
> [*]) and the R/surveyby.R file has somewhat different code on lines 57
> and 74. In particular, attr(, "var") is assigned much later, from a
> variable named covmat.mat, which is also computed later than line 74.
>
> --
> Best regards,
> Ivan
>
> [*] https://cran.r-project.org/src/contrib/survey_3.37.tar.gz

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
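Since the stated goal is reproducing confint() in Tableau: for a svyby result the default interval is a Wald interval, i.e. the estimate plus or minus a normal quantile times the standard error. A minimal base-R sketch, with hypothetical numbers standing in for an exported estimate and SE (in R, the SE itself can be read off with SE(object), which is the square root of the diagonal of the "var" attribute discussed in this thread):

```r
# Wald interval: estimate +/- z * SE (what confint() computes by default
# for svyby results).  The numbers below are hypothetical placeholders.
est <- 0.2278   # a svyby point estimate, exported to the other tool
se  <- 0.0095   # its standard error
level <- 0.95
z  <- qnorm(1 - (1 - level) / 2)        # ~1.96 for a 95% interval
ci <- c(lower = est - z * se, upper = est + z * se)
round(ci, 4)    # lower 0.2092, upper 0.2464
```

Note this is the plain Wald arithmetic; functions such as svyciprop can use other interval methods, so the formula should be checked against which confint method is actually being called.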
Re: [R] Survey package/svyby source code help
On Tue, 11 Feb 2020 15:23:14 + andertech...@protonmail.com wrote:

> The attr(, "var") that I am interested in is displayed with
> str(results) after the results object is declared. First line of the
> subject code looks like:
>
> results <- (if (multicore) parallel::mclapply else lapply)(uniques,
> function(i){

'if' returns whatever the taken branch returns (this is documented in ?'if', section "Value"). The rest of the statement applies the function FUN (passed as an argument to svyby) to the data (passed as the `formula` argument), so the "var" attribute you are looking for in fact comes from the FUN invocation and not from svyby itself. Most examples in ?svyby use FUN=svymean, so perhaps that is where you should look next. Make sure to check out R-Intro [*] to avoid common pitfalls when working with methods in R.

--
Best regards,
Ivan

[*] https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Object-orientation
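A tiny base-R illustration of Ivan's point about that source line: `if` is an expression whose value is the value of the branch taken, so a parenthesized `if` that evaluates to a function can be called immediately:

```r
# `if` returns the value of whichever branch is taken, so a parenthesized
# `if` that evaluates to a function can be applied directly -- the same
# pattern as the svyby source line quoted above.
multicore <- FALSE
res <- (if (multicore) parallel::mclapply else lapply)(1:3, function(i) i^2)
unlist(res)   # 1 4 9
```

Because the condition is FALSE, the parallel::mclapply branch is never even evaluated; lapply is the value of the `if` expression and is called on the remaining arguments.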
Re: [R] Survey package/svyby source code help
On Tue, 11 Feb 2020 02:33:45 + AndertechLLC--- via R-help wrote:

> When debugging the code I am not following the generation of values
> in the results object attr(*, "var") after line 57 completes. These
> values are fed into line 74 (rval <- t(sapply(results, unwrap))).

Which version of the survey package do you have in mind? I took a look at the latest version available on CRAN (3.37 at the time of this post, [*]) and the R/surveyby.R file has somewhat different code on lines 57 and 74. In particular, attr(*, "var") is assigned much later, from a variable named covmat.mat, which is also computed later than line 74.

--
Best regards,
Ivan

[*] https://cran.r-project.org/src/contrib/survey_3.37.tar.gz
[R] Survey package/svyby source code help
Good day, I was looking for some help understanding a particular portion of the svyby source code. When debugging the code I am not following the generation of values in the results object attr(*, "var") after line 57 completes. These values are fed into line 74 (rval <- t(sapply(results, unwrap))). Alternatively, I invite an explanation of how the SE values are created in the rval object on line 74. The data entering svyby come from a svydesign.default object. I am especially confused by the syntax in lines 57-59, which is not the primary question, but I would appreciate any input. I am new to R and this mailing list and apologize for any faux pas. Thanks in advance.
[R] survey package rake results in very large weights
Hi, I'm trying to use both the survey package and the anesrake package to perform raking on my sample dataset, to get the proportions to match up with population data from the census. I'm trying the following, but the resulting weights are very large:

svy.unweighted <- svydesign(ids=~1, data=cleanPanelistData)
svy.rake <- rake(design = svy.unweighted,
                 sample.margins = list(~age, ~education, ~ethnicity, ~gender, ~income, ~location, ~size),
                 population.margins = list(censusAge, censusEducation, censusEthnicity, censusGender, censusIncome, censusLocation, censusSize))
summary(weights(svy.rake))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   133.9    567.6   1062.0   2606.0   2305.0 396800.0

When I use the anesrake package, the weights are capped at 5 and I'm able to get a design effect of around 3. My data set in .Rdata format is available here: http://we.tl/bUAuW3ajVs . Am I doing something wrong? Should I be using calibrate with calfun='raking' instead of rake?

This was run on: RStudio on Mac Mavericks, version 3.02 with survey_3.30-3

thanks, imran
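For intuition about why raking can blow weights up: rake() performs iterative proportional fitting (IPF), repeatedly rescaling the weights until every margin matches its population target, so a category that is rare in the sample but common in the population necessarily receives an enormous weight. A self-contained base-R sketch of IPF on a toy 2x2 table (all numbers hypothetical):

```r
# Iterative proportional fitting, the algorithm behind rake().
cells <- matrix(c(10, 40, 30, 20), nrow = 2)   # sample counts per cell
row.targets <- c(60, 40)                       # population row margin
col.targets <- c(50, 50)                       # population column margin
w <- matrix(1, 2, 2)                           # start from unit weights
for (i in 1:200) {
  w <- w * (row.targets / rowSums(cells * w))             # rescale rows
  w <- sweep(w, 2, col.targets / colSums(cells * w), `*`) # rescale columns
}
rowSums(cells * w)   # matches row.targets
colSums(cells * w)   # matches col.targets
range(w)             # spread of the raking factors
```

With seven margins and sparse cells the same mechanism can yield weights in the hundreds of thousands. Capping is what anesrake does; in the survey package, trimWeights() or calibrate(..., calfun="raking", bounds=...) can impose similar limits, at the cost of no longer matching the margins exactly.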
Re: [R] survey package -- doesn't appear to match svy
could you provide a minimal reproducible example? perhaps use ?dput. in general the survey package matches all other languages: http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf

here's an example of a minimal reproducible example that does match http://www.ats.ucla.edu/stat/stata/faq/svy_stata_subpop.htm

library(foreign)
library(survey)
x <- read.dta("http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/strsrs.dta")
y <- svydesign(~1, strata = ~strat, data = x, weights = ~pw, fpc = ~fpc)
svytable(~yr_rnd, y)
z <- subset(y, yr_rnd == 1)
svymean(~ell, z)

there are lots of examples of R matching official published statistics (often computed with stata) here --
https://github.com/ajdamico/usgsd/search?utf8=%E2%9C%93&q=replication

On Mon, Oct 27, 2014 at 11:11 AM, Stephen Amrock stephen.amr...@gmail.com wrote:

> Hi, I'm new to R and have encountered two issues in coding using the survey package:
>
> (1) Code from *svytable* using the survey package does not correspond to Stata estimates from *svy: tab*. I call
>
> svyd.nation <- svydesign(ids = ~1, probs = ~wt_national, strata = ~stratum, data=nats.sub)
> svytable(formula = ~wpev, design = svyd.nation, Ntotal = 100)
>
> where the equivalent Stata would be:
>
> svyset [pw=wt_national], strata(stratum)
> svy: tab wpev
>
> These are inverse probability weights but Stata and R give different %s for the tabulations. I know from published results that the Stata code is correct. Any ideas as to what I've done incorrectly in R?
>
> (2) Alternative weights---which I've verified in R have the same distribution as in Stata---produce a strange Inf output.
>
> svyd.st <- svydesign(ids = ~1, probs = ~wt_state, strata = ~stratum, data=nats.sub)
> svytable(~wpev, design = svyd.st, Ntotal=100)
>
> This produces the following output:
>
> wpev
> 0 1
>
> The code svytable(~wpev, svyd.st) produces the output:
>
> wpev
>   0   1
> Inf Inf
>
> Why are these alternative weights -- which work in Stata when resetting svyset -- not working in R?
Any insights would be much appreciated!! Many thanks! - S
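One thing worth checking, offered as a guess since the data aren't shown: svydesign()'s probs= argument expects sampling probabilities, with the weights taken as their reciprocals, whereas Stata's [pw=] expects the weights themselves. Passing pweights to probs= therefore inverts every weight, which changes the tabulated percentages. A base-R illustration of the difference on toy data:

```r
# weights = 1/probs in svydesign(); feeding pweights into probs= inverts them.
w <- c(2, 2, 6)     # pweights: each unit stands for w people
x <- c(1, 0, 1)     # a 0/1 variable being tabulated
weighted <- sum(w * x) / sum(w)          # intended estimate: 0.8
inverted <- sum((1/w) * x) / sum(1/w)    # what probs=~w would imply: ~0.571
```

If wt_national is really a weight, weights = ~wt_national (or probs = ~I(1/wt_national)) would be the fix; the Inf output in (2) is also consistent with a probability/weight mix-up producing zero or degenerate values for some records, though that part can't be confirmed without the data.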
[R] survey package -- doesn't appear to match svy
Hi, I'm new to R and have encountered two issues in coding using the survey package:

(1) Code from *svytable* using the survey package does not correspond to Stata estimates from *svy: tab*. I call

svyd.nation <- svydesign(ids = ~1, probs = ~wt_national, strata = ~stratum, data=nats.sub)
svytable(formula = ~wpev, design = svyd.nation, Ntotal = 100)

where the equivalent Stata would be:

svyset [pw=wt_national], strata(stratum)
svy: tab wpev

These are inverse probability weights but Stata and R give different %s for the tabulations. I know from published results that the Stata code is correct. Any ideas as to what I've done incorrectly in R?

(2) Alternative weights---which I've verified in R have the same distribution as in Stata---produce a strange Inf output.

svyd.st <- svydesign(ids = ~1, probs = ~wt_state, strata = ~stratum, data=nats.sub)
svytable(~wpev, design = svyd.st, Ntotal=100)

This produces the following output:

wpev
0 1

The code svytable(~wpev, svyd.st) produces the output:

wpev
  0   1
Inf Inf

Why are these alternative weights -- which work in Stata when resetting svyset -- not working in R? Any insights would be much appreciated!! Many thanks! - S
[R] Survey package help with svystandardize
I am trying to age standardize using the svystandardize function in the survey package in R. I have successfully managed to hit my SUDAAN-based targets for the estimates by sex, but not the total. The total is only a little different, but I'd like some help knowing why it isn't exact. I've included the SUDAAN code that generates the targets and my R script (and output) that I have so far. I can't supply the data since that would violate usage agreements. Sorry if the output isn't formatted nicely.

SUDAAN results:
Variance Estimation Method: Taylor Series (WR)
Standardized estimates by: Variable, Sex of respondent.

Current Smoker: At risk          Total          Male         Female
  Sample Size                    14732          5582         9150
  Weighted Size                  18073876.32    8917534.53   9156341.79
  Total                          3477083.28     2105598.41   1371484.87
  Percent                        18.84          22.78        14.87
  SE Percent                     0.60           0.95         0.73
  Lower 95% Limit Percent        17.69          20.97        13.50
  Upper 95% Limit Percent        20.05          24.69        16.34

R output: the estimate for the total is off.

svyciprop(~I(rfsmok=="At risk"), stdes, na.rm=TRUE, ci=TRUE, vartype="ci")
                                    2.5%    97.5%
I(rfsmok == "At risk") 0.187685 0.176309  0.19962

Estimates by sex are a match.

svyby(~I(rfsmok=="At risk"), ~sex, stdes, svyciprop, na.rm=TRUE, ci=TRUE, vartype="ci")
          sex I(rfsmok == "At risk")         ci_l         ci_u
Male     Male           0.2277632725 0.2097154285 0.2468790907
Female Female           0.1486524831 0.1349832066 0.163481

R code:

sample.brfss <- svydesign(id=~brfss.2011$SEQNO, strata=~brfss.2011$ststr, weights=~brfss.2011$LLCPWT, data=brfss.2011, nest=TRUE)
popage <- c(0.128810, 0.182648, 0.219077, 0.299194, 0.170271)
options(survey.lonely.psu="adjust")
testing_dataset <- subset(sample.brfss, (!is.na(agegr5) & !is.na(sex) & !is.na(rfsmok)))
stdes <- svystandardize(testing_dataset, by=~agegr5, over=~sex, population=popage)

SUDAAN code:

PROC DESCRIPT DATA = h:\\brfss_data\\brfss11\\pudf11\\txbrfss_pudf11 DESIGN = WR FILETYPE = SUDXPORT NOTSORTED;
NEST STSTR SEQNO/MISSUNIT;
WEIGHT LLCPWT;
SETENV LINESIZE = 106 PAGESIZE = 45 COLWIDTH = 11 TOPMGN = 1;
VAR RFSMOK;
CATLEVEL 2;
STDVAR AGEGR5;
STDWGT .128810 .182648 .219077 .299194 .170271;
SUBGROUP SEX AGEGR5;
LEVELS 2 5;
TABLES (SEX);
[R] survey package question
Hello, I have got a cluster sample using an election dataset where I already had the final results of a county-specific election. I am trying to figure out what would be the best sampling design for my data. The structure of the dataset is:

1) polling stations (in general, schools where people vote; for a county, for example, there are 15 polling stations)
2) inside each polling station there are voting units, where people actually vote (on average there are about 40 voting units per polling station)
3) for each voting unit I have the total votes by candidate (e.g., candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)

The initial sampling design is: 1) selection of 5 polling stations PPS (based on number of voters); 2) selection of 10 voting units (SRS).

I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be: votes for candidate 1 / all valid votes = proportion, e.g.

candidate 1 = 2132 / 10874 = .1906
candidate 2 = 5323 / 10874 = .4895
candidate 3 = 3419 / 10874 = .3144

In this case, the unit of analysis is voters (or votes). If I specify the sampling design using the survey package in this way...

design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right? And I am estimating an average among voting units? Should I expand my database to the individual level (voters), or do I just have to include a unit weight according to the number of voters per voting unit? In other words, is there a way to estimate, for instance, votes for candidate 1 / all valid votes = proportion, directly from the survey package, or do I have to expand the database to the person level (voters) and then estimate the proportion using svymean and the respective design? I would appreciate any advice or help.
Sebastian
Re: [R] survey package question
On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza sebastian.d...@gmail.com wrote:

> I am interested in estimating the proportion of votes by candidate (let's assume we have 3 candidates). My naive estimate would be: votes for candidate 1 / all valid votes = proportion [...] If I specify the sampling design using the survey package in this way...
>
> design <- svydesign(id=~station + unit, fpc=~probstation + probunit, data=sample, pps="brewer")
> svyciprop(~I(candidate1/totalVotes), design)
>
> ... I am assuming that the unit of analysis is the voting unit, right? and I am estimating an average among voting units?

You want a ratio estimator: svyratio(~candidate1, ~totalVotes, design)

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
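To make the distinction concrete: the ratio estimator Thomas suggests targets the ratio of estimated totals, which is not the same number as the average of per-unit proportions that svyciprop(~I(candidate1/totalVotes), ...) would compute. A base-R sketch with hypothetical counts for three sampled voting units:

```r
# Ratio of totals (svyratio's target) vs mean of per-unit ratios.
cand1 <- c(322, 150, 90)    # votes for candidate 1 in each sampled unit
total <- c(533, 600, 120)   # valid votes in each sampled unit
ratio.of.totals <- sum(cand1) / sum(total)   # ~0.4485: the vote share
mean.of.ratios  <- mean(cand1 / total)       # ~0.5347: a different quantity
```

The ratio of totals weights each voting unit by its size, which is exactly what "votes for candidate 1 / all valid votes" means; the mean of per-unit ratios gives every unit equal say regardless of how many votes it holds, so the two can differ substantially.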
Re: [R] survey package question
Hello Thomas, I used both svymean (with the expanded sample = people) and svyratio (voting-unit level), using the same design:

design <- svydesign(id=~station + unit, fpc=~probstation + probunits, data=sample, pps="brewer")

I got different results from the same sample:

svyratio (voting unit)
           Ratio       2.5%      97.5%     Result
Cand1 0.05252871 0.04537301 0.05968441 0.05181146
Cand2 0.47226973 0.45215097 0.49238849 0.49041590
Cand3 0.47520156 0.45460831 0.49579482 0.45777264

svymean (expanded sample, individuals or votes)
           Mean          SE      2.5 %     97.5 %    Results
Cand1 0.0528433 0.004562755 0.04390047 0.06178614 0.05181146
Cand2 0.4717504 0.010201398 0.45175605 0.49174480 0.49041590
Cand3 0.4754063 0.010429222 0.45496538 0.49584718 0.45777264

Point estimates are different, and the confidence intervals are narrower using svyratio. Could you give me any clue about what is going on? Thank you in advance.

Sebastian

On Thu, Oct 11, 2012 at 3:56 PM, Sebastián Daza sebastian.d...@gmail.com wrote: Thank you Thomas!
On Thu, Oct 11, 2012 at 2:33 PM, Thomas Lumley tlum...@uw.edu wrote:

> You want a ratio estimator: svyratio(~candidate1, ~totalVotes, design)
> -thomas

--
Sebastián Daza
[R] Survey package: using subset option with svyrepdesign
Hi. I'm using American Housing Survey (AHS) data with replicate weights. I want subpopulation estimates. When I try to subset the survey design, I get an error message. I don't get the error message when not using the replicate weights (i.e., using the regular svydesign function). Any help would be appreciated. My code and error message are below.

dsn <- svrepdesign(weights=~wgt90geo, repweights=~repwgt, type="Fay", data=temp2, combined.weights=TRUE, rho=.5)
dsub <- subset(dsn, subpop==TRUE)
Error in x$variables[i, , drop = FALSE] : incorrect number of dimensions

Thanks, Brent Mast HUD
Re: [R] Survey package: using subset option with svyrepdesign
I found the problem. My data were not in a data.frame. Thanks, Brent
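For anyone hitting the same error: the message comes from row-subsetting with x$variables[i, , drop = FALSE], which requires a data.frame; a plain list (which some import routines return) has no dimensions. A small base-R reproduction, with temp2 as a hypothetical stand-in for the poster's object:

```r
# A list has no dim(), so [i, , drop = FALSE] fails with
# "incorrect number of dimensions"; converting to a data.frame fixes it.
temp2 <- list(wgt90geo = c(1, 2), subpop = c(TRUE, FALSE))
ok <- tryCatch({ temp2[1, , drop = FALSE]; TRUE }, error = function(e) FALSE)
ok   # FALSE: the same error subset() surfaced
temp2 <- as.data.frame(temp2)
temp2[temp2$subpop, , drop = FALSE]   # row subsetting now works
```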
Re: [R] Survey package GLM vs. Linear Mixed Models?
Hi there, haven't heard from anyone and just wondering if anyone had any insight! Thanks. :)
[R] Survey package GLM vs. Linear Mixed Models?
Hi there, I'm new to some of these more advanced regression techniques and also new to R. This looks like a great forum. I am trying to examine the association between membership in a group and some different variables, most of which are (approximately) normally distributed. I would just do an ANOVA, but I have correlated data: everyone in my dataset has a sibling in the sample. The correlation is a nuisance for the purpose of what I'm interested in. I know that some individuals have used linear mixed models and modeled the nuisance correlation as a random effect.

My question is: I've seen quite a few people write about using the GLM in the survey package, which can apparently also handle correlated data, and it seems like there is really detailed documentation for the package, which is a must for me, as I'm still learning. What are the advantages/disadvantages of using the survey package (and specifying a normal distribution) instead of a linear mixed model package? I'm interested in practical issues with implementation in R, as well as statistical concerns that I should be aware of. If LMM is better, which package do people recommend?
[R] survey package svystat objects from predict()
Hello, I'm running R 2.14.1 on OS X (x86_64-apple-darwin9.8.0/x86_64 (64-bit)), with version 3.28 of Thomas Lumley's survey package. I was using predict() from svyglm(). E.g.:

data(api)
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
out <- svyglm(sch.wide~ell+mobility, design=dstrat, family=quasibinomial())
pred.df <- expand.grid(ell=c(20,50,80), mobility=20)
out.pred <- predict(out, pred.df)

From the console out.pred looks like this:

class(out.pred)
[1] "svystat"
print(out.pred)  # or just 'out.pred'
    link     SE
1 1.8504 0.2414
2 1.6814 0.3033
3 1.5124 0.5197

From here I wanted to conveniently access the predicted values and SEs. I thought that I might be able to do this using either ftable() or as.data.frame(), as methods for these exist for objects of class svystat. But while the predicted values come through fine, the SE only gets calculated for the first element and is then repeated:

ftable(out.pred)
          A         B
1 1.8504120 0.2413889
2 1.6814293 0.2413889
3 1.5124466 0.2413889

as.data.frame(out.pred)
      link        SE
1 1.850412 0.2413889
2 1.681429 0.2413889
3 1.512447 0.2413889

I think what's happening is that the as.data.frame.svystat() method in the survey package ends up calling the wrong function to calculate the standard errors. From the survey package:

as.data.frame.svystat <- function(x, ...){
  rval <- data.frame(statistic=coef(x), SE=SE(x))
  names(rval)[1] <- attr(x, "statistic")
  if (!is.null(attr(x, "deff")))
    rval <- cbind(rval, deff=deff(x))
  rval
}

The relevant SE method seems to be:

SE.svrepstat <- function(object, ...){
  if (is.list(object)){
    object <- object[[1]]
  }
  vv <- as.matrix(attr(object, "var"))
  if (!is.null(dim(object)) && length(object)==length(vv))
    sqrt(vv)
  else
    sqrt(diag(vv))
}

Instead of returning sqrt(vv) on each element, it calculates sqrt(diag(vv)) instead. At least I think this is what's happening. I apologize in advance if all this is the result of some elementary error on my part.
Thanks, Kieran

--
Kieran Healy :: http://kieranhealy.org
Re: [R] survey package svystat objects from predict()
On Tue, Feb 14, 2012 at 11:45 AM, Kieran Healy kjhe...@gmail.com wrote:

> I think what's happening is that the as.data.frame.svystat() method in the
> survey package ends up calling the wrong function to calculate the standard
> errors. [...] The relevant SE method seems to be:
>
> SE.svrepstat <- function(object, ...){
>   if (is.list(object)){
>     object <- object[[1]]
>   }
>   vv <- as.matrix(attr(object, "var"))
>   if (!is.null(dim(object)) && length(object)==length(vv))
>     sqrt(vv)
>   else
>     sqrt(diag(vv))
> }

Mostly correct, but the relevant SE method is actually SE.default:

survey:::SE.default
function (object, ...)
{
    sqrt(diag(vcov(object, ...)))
}

It can't be SE.svrepstat, because the class is wrong.
If you define

SE.svystat <- function(object, ...){
  v <- vcov(object)
  if (!is.matrix(v) || NCOL(v)==1) sqrt(v) else sqrt(diag(v))
}

it should work.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
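The arithmetic behind the fix, as a base-R sketch: for a vector statistic the variance attribute is a full covariance matrix, and the per-element SEs are sqrt(diag(V)); the buggy path effectively took a single value and recycled it. Here the diagonal is built from (approximately) the SEs printed earlier in the thread, with off-diagonal covariances omitted purely for illustration:

```r
# Correct SEs come from the diagonal of the covariance matrix.
V <- diag(c(0.2414, 0.3033, 0.5197)^2)   # hypothetical covariance matrix
sqrt(diag(V))          # 0.2414 0.3033 0.5197 -- one SE per prediction
rep(sqrt(V[1, 1]), 3)  # 0.2414 0.2414 0.2414 -- the recycled value seen above
```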
Re: [R] survey package: weights used in svycoxph()
On Mon, 17 May 2010, Vinh Nguyen wrote:

> Dear R-help, Let me know if I should email r-devel instead of this list. This message is addressed to Professor Lumley or anyone familiar with the survey package. Does svycoxph() implement the method outlined in Binder 1992 as referenced in the help file?

Yes. That's why it's referenced.

> That is, are weights incorporated in the ratio term (numerator and denominator) of the estimating equation?

Yes.

> I don't believe so since svycoxph() calls coxph() of the survival package and weights are applied once in the estimating equation. If the weights are implemented in the ratio, could you point me to where in the code this is done? I would like to estimate as in Binder but with custom weights. Thanks.

It happens inside the C code called by coxph(), eg, in survival/src/coxfit2.c

Binder's estimating equations are the usual way of applying weights to a Cox model, so nothing special is done apart from calling coxph(). To quote the author of the survival package, Terry Therneau: "Other formulae change in the obvious way, eg, the weighted mean $\bar Z$ is changed to include both the risk weights $r$ and the external weights $w$." [Mayo Clinic Biostatistics technical report #52, section 6.2.2]

> This is mentioned in the help file but I don't quite understand: "The main difference between the svycoxph function and the robust=TRUE option to coxph in the survival package is that this function accounts for the reduction in variance from stratified sampling and the increase in variance from having only a small number of clusters."

The point estimates from coxph() are the same as those from svycoxph() (with the same weights). The standard errors are almost the same. There are two differences. The first is the use of 1/(nclusters - 1) rather than 1/nclusters as a divisor.
The second is that svycoxph() computes variances using estimating functions centered at zero in each *sampling* stratum, whereas coxph() centers them at zero in each baseline hazard stratum, as supplied in the strata() argument to coxph().

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlum...@u.washington.edu
University of Washington, Seattle
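For readers following along, the weighted estimating equation under discussion can be sketched as below (notation assumed here: $w_i$ the external/sampling weights, $\delta_i$ the event indicator, $Z_i$ the covariates, and $R(t_i)$ the risk set at time $t_i$). The weights enter both the numerator and the denominator of the risk-set average, which is the "ratio term" of the original question:

```latex
U(\beta) \;=\; \sum_i w_i \,\delta_i \left[ Z_i \;-\;
  \frac{\sum_{j \in R(t_i)} w_j \, Z_j \, e^{\beta' Z_j}}
       {\sum_{j \in R(t_i)} w_j \, e^{\beta' Z_j}} \right] \;=\; 0
```

This is a sketch of the Binder (1992)-style weighted score equation, not a transcription of the survival package's C code.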
Re: [R] survey package: weights used in svycoxph()
On Tue, May 18, 2010 at 8:50 AM, Thomas Lumley tlum...@u.washington.edu wrote:

>> I don't believe so since svycoxph() calls coxph() of the survival package and weights are applied once in the estimating equation. If the weights are implemented in the ratio, could you point me to where in the code this is done? I would like to estimate as in Binder but with custom weights. Thanks.
>
> It happens inside the C code called by coxph(), e.g. in survival/src/coxfit2.c

Thank you for your clarification. I mistakenly assumed the weights appeared only once in the estimating equation, creating a weighted sum of the score equations. In retrospect, if the weights are to be used as case weights, they had better be in the ratio term as well (wherever there is an at-risk indicator).

> Binder's estimating equations are the usual way of applying weights to a Cox model, so nothing special is done apart from calling coxph(). To quote the author of the survival package, Terry Therneau: "Other formulae change in the obvious way, e.g. the weighted mean $\bar Z$ is changed to include both the risk weights $r$ and the external weights $w$." [Mayo Clinic Biostatistics technical report #52, section 6.2.2]

I don't see a section 6.2.2 in this technical report.
Re: [R] survey package: weights used in svycoxph()
On Tue, 18 May 2010, Vinh Nguyen wrote:

>> To quote the author of the survival package, Terry Therneau: "Other formulae change in the obvious way, e.g. the weighted mean $\bar Z$ is changed to include both the risk weights $r$ and the external weights $w$." [Mayo Clinic Biostatistics technical report #52, section 6.2.2]
>
> I don't see a section 6.2.2 in this technical report.

Sorry, #58.

-thomas

Thomas Lumley, Assoc. Professor, Biostatistics, tlum...@u.washington.edu, University of Washington, Seattle
[R] survey package: weights used in svycoxph()
Dear R-help, Let me know if I should email r-devel instead of this list. This message is addressed to Professor Lumley or anyone familiar with the survey package. Does svycoxph() implement the method outlined in Binder 1992 as referenced in the help file? That is, are weights incorporated in the ratio term (numerator and denominator) of the estimating equation? I don't believe so since svycoxph() calls coxph() of the survival package and weights are applied once in the estimating equation. If the weights are implemented in the ratio, could you point me to where in the code this is done? I would like to estimate as in Binder but with custom weights. Thanks. This is mentioned in the help file but I don't quite understand: "The main difference between svycoxph function and the robust=TRUE option to coxph in the survival package is that this function accounts for the reduction in variance from stratified sampling and the increase in variance from having only a small number of clusters."

Vinh
[R] survey package question
Should the svyby function be able to work with svyquantile? I get the error below ...

data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svyby(~api00, design=dclus1, by=~stype, quantiles=c(.25,.5,.75), FUN=svyquantile, na.rm=TRUE)
Error in object$coefficients : $ operator is invalid for atomic vectors

A more general question is: can quantiles and their SEs be computed for subgroups? The same syntax works using svymean. Thanks,

R. Valliant, Univ. of Maryland, US
Re: [R] survey package question
On Thu, 18 Feb 2010, Richard Valliant wrote:

> Should the svyby function be able to work with svyquantile? I get the error below ...

It works, but you need to either specify ci=TRUE or keep.var=FALSE. The problem is that svyquantile() by default does not produce standard errors.

svyby(~api00, ~stype, design=dclus1, svyquantile, quantile=c(0.25,0.5,.75), ci=TRUE)
  stype   0.25   0.5  0.75  se.0.25   se.0.5  se.0.75
E     E 553.00 652.0 729.0 27.74745 37.81025 17.12898
H     H 523.00 608.0 699.5 46.01881 65.82488 33.38789
M     M 532.75 636.5 696.5 60.78990 43.12310 55.27555

svyby(~api00, ~stype, design=dclus1, svyquantile, quantile=c(0.25,0.5,.75), keep.var=FALSE)
  stype statistic1 statistic2 statistic3
E     E     553.00      652.0      729.0
H     H     523.00      608.0      699.5
M     M     532.75      636.5      696.5

> A more general question is: can quantiles and their SEs be computed for subgroups?

You can also use subset(), which is what svyby() does internally:

svyquantile(~api00, quantile=c(0.25,0.5,0.75), subset(dclus1, stype=="E"))
      0.25 0.5 0.75
api00  553 652  729

-thomas

Thomas Lumley, Assoc. Professor, Biostatistics, tlum...@u.washington.edu, University of Washington, Seattle
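[Editorial illustration] Conceptually, what subset() followed by svyquantile() does here is: filter the rows to the subgroup, then take a weighted quantile of what remains. A minimal Python sketch of that idea (using a simple cumulative-weight definition of the quantile, not the interpolation rule svyquantile() actually uses; all names and toy data are invented):

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches fraction q of the
    total weight (simple definition; svyquantile() interpolates)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]

# Subgroup quantile: filter the rows first, then compute on the subset.
rows = [("E", 553, 1.0), ("E", 652, 1.0), ("E", 729, 1.0), ("H", 608, 1.0)]
e_vals = [v for g, v, _ in rows if g == "E"]
e_wts = [w for g, _, w in rows if g == "E"]
median_E = weighted_quantile(e_vals, e_wts, 0.5)
```

The standard errors, of course, still require the design information (clusters, strata, fpc), which is why the real computation goes through the survey design object rather than a plain subset of the data frame.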
Re: [R] Survey Package with Binary Data (no Standard Errors reported)
On Fri, 3 Apr 2009, Paul Jones wrote:

> Hi, I'm trying to get standard errors for some of the variables in my data frame. One of the questions on my survey is whether faculty coordinate across curriculum to include Arts Education as subject matter. All the responses are coded in zeros and ones, obviously. For some of the other variables I have a 2 for those that responded with Don't Know. I'm getting NA for mean and standard deviations from svymean. Am I doing something wrong, or can the survey package not handle this type of data?

Are you sure you don't have any NA values in the data? If you do, the na.rm=TRUE option to svymean() will fix your problem. If you don't, something mysterious is happening. You could try svytable(~Curriculum, survey), which will give a tabulation and might show what is strange about your data. As a separate issue, you might want to look at svyciprop() if some of your proportions are close to 0 or 1, to get better confidence intervals.

> Here's my code.

Nothing obviously wrong with it.

-thomas

> survey <- svydesign(id=~1, data=General, strata=~Grade.Level)
> Warning message:
> In svydesign.default(id = ~1, data = General, strata = ~Grade.Level) :
>   No weights or probabilities supplied, assuming equal probability
> summary(survey)
> Stratified Independent Sampling design (with replacement)
> svydesign(id = ~1, data = General, strata = ~Grade.Level)
> Probabilities:
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       1       1       1       1       1       1
> Stratum Sizes:
>            Elementary High Middle
> obs               312  236    156
> design.PSU        312  236    156
> actual.PSU        312  236    156
> Data variables:
>  [1] Grade.Level                  Curriculum
>  [3] Field.Trips                  Residencies
>  [5] PTA.Support                  Community.Open.Performances
>  [7] Visual.Arts.Attendance       Literary.Arts.Attendance
>  [9] Arts.Organization.Membership Arts.Essential
> svymean(~Curriculum, survey)
>            mean SE
> Curriculum   NA NA
> ???
PJ

Thomas Lumley, Assoc. Professor, Biostatistics, tlum...@u.washington.edu, University of Washington, Seattle
[R] Survey Package with Binary Data (no Standard Errors reported)
Hi, I'm trying to get standard errors for some of the variables in my data frame. One of the questions on my survey is whether faculty coordinate across curriculum to include Arts Education as subject matter. All the responses are coded in zeros and ones, obviously. For some of the other variables I have a 2 for those that responded with Don't Know. I'm getting NA for mean and standard deviations from svymean. Am I doing something wrong, or can the survey package not handle this type of data? Here's my code.

survey <- svydesign(id=~1, data=General, strata=~Grade.Level)
Warning message:
In svydesign.default(id = ~1, data = General, strata = ~Grade.Level) :
  No weights or probabilities supplied, assuming equal probability
summary(survey)
Stratified Independent Sampling design (with replacement)
svydesign(id = ~1, data = General, strata = ~Grade.Level)
Probabilities:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       1       1       1       1       1
Stratum Sizes:
           Elementary High Middle
obs               312  236    156
design.PSU        312  236    156
actual.PSU        312  236    156
Data variables:
 [1] Grade.Level                  Curriculum
 [3] Field.Trips                  Residencies
 [5] PTA.Support                  Community.Open.Performances
 [7] Visual.Arts.Attendance       Literary.Arts.Attendance
 [9] Arts.Organization.Membership Arts.Essential
svymean(~Curriculum, survey)
           mean SE
Curriculum   NA NA
???

PJ
[R] survey package
Hello, I have a problem using the survey package. I'm trying to calculate the prevalence of a disease in animals sampled using a two-stage sampling design:

first stage: farms randomly chosen from 551 farms
second stage: animals randomly chosen within the farms

My data base looks like this (num is the farm ID and Id_An is the animal ID):

num esp Quarters Totcat Totshp Totgt Tbtpos fpc1 Totanim Id_An
115 2045 G01-Q1 1 112 0 551 14 115
116 2045 G01-Q1 1 112 0 551 14 116
117 2045 G01-Q1 1 112 0 551 14 117
118 2045 G01-Q1 1 112 0 551 14 118

R accepts my sampling design with the function svydesign:

clustot <- svydesign(id=~num+Id_An, fpc=~fpc1+Totanim, data=tab1)
summary(clustot)
2 - level Cluster Sampling design
With (419, 11593) clusters.
svydesign(id = ~num + Id_An, fpc = ~fpc1 + Totanim, data = tab1)
Probabilities:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1086  0.3802  0.4986  0.4952  0.6145  0.7476
Population size (PSUs): 551
Data variables:
[1] num esp Quarters Totcat Totshp Totgt Tbtpos fpc1 Totanim Id_An

but when I try to get some statistics on it I obtain this error message:

svymean(~Tbtpos, clustot, na.rm = TRUE)
Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale * :
  Stratum (1.629) has only one PSU at stage 2
svytotal(~Tbtpos, clustot)
Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale * :
  Stratum (1.629) has only one PSU at stage 2

and with options(error=recover) I obtain:

svymean(~Tbtpos, clustot, na.rm = TRUE)
Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale * :
  Stratum (1.629) has only one PSU at stage 2

Enter a frame number, or 0 to exit
 1: svymean(~Tbtpos, clustot, na.rm = TRUE)
 2: svymean.survey.design2(~Tbtpos, clustot, na.rm = TRUE)
 3: svyrecvar(x * pweights/psum, design$cluster, design$strata, design$fpc, postStrata = design$postStrata)
 4: multistage(x, clusters, stratas, fpcs$sampsize, fpcs$popsize, lonely.psu = getOption(survey.lonely.psu), one.stage = one.stage, stage = 1, cal = cal)
 5: by(1:n, list(as.numeric(clusters[, 1])), function(index) {
 6: by.default(1:n, list(as.numeric(clusters[, 1])), function(index) {
 7: by(as.data.frame(data), INDICES, FUN, ...)
 8: by.data.frame(as.data.frame(data), INDICES, FUN, ...)
 9: eval(substitute(tapply(1:nd, IND, FUNx)), data)
10: eval(expr, envir, enclos)
11: tapply(1:11593, list(c(2045, 2046, 2070, 2070, 2070, 2071, 2230, 2280, 2304, 2045, 2045, 2045, 2045, 2045, 2045, 2046, 2046, 2046, 2046, 2046, 2046, 20
12: lapply(split(X, group), FUN, ...)
13: FUN(X[[23]], ...)
14: FUN(data[x, ], ...)
15: multistage(x[index, , drop = FALSE], clusters[index, -1, drop = FALSE], stratas[index, -1, drop = FALSE], nPSUs[index, -1, drop = FALSE], fpcs[index, -
16: onestage(x, stratas[, 1], clusters[, 1], nPSUs[, 1], fpcs[, 1], lonely.psu = lonely.psu, stage = stage, cal = cal)
17: tapply(1:NROW(x), list(factor(strata)), function(index) {
18: lapply(split(X, group), FUN, ...)
19: FUN(X[[1]], ...)
20: onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], fpc[index][1], lonely.psu = lonely.psu, stratum = strata[index][1], stage = stage,
21: switch(lonely.psu, certainty = scale * crossprod(x), remove = scale * crossprod(x), adjust = scale * crossprod(x), average = NA * crossprod(x), fail =

I don't understand the error message, and I don't know how to ask R which stratum or which stage has the problem.

Thanks in advance,

AHOUSSOU Sylvie, Vétérinaire Epidémiologiste, CIRAD, Domaine Duclos, 97170 Petit-Bourg, tel: 05 90 25 59 47
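[Editorial illustration] The "has only one PSU at stage 2" error means some first-stage cluster contains a single second-stage unit, so a variance contribution cannot be estimated there. One quick way to find the culprits is simply to count second-stage units per first-stage cluster. A Python sketch of that check (column names follow the poster's data, num = farm and Id_An = animal, but the toy rows and function name are invented):

```python
from collections import Counter

def lonely_clusters(rows, cluster_key):
    """Return first-stage clusters that contain only one second-stage unit."""
    counts = Counter(r[cluster_key] for r in rows)
    return sorted(k for k, n in counts.items() if n == 1)

# Toy data: farm 2046 has a single sampled animal, so it would trigger
# a lonely-PSU error at the second stage.
rows = [
    {"num": 2045, "Id_An": 115},
    {"num": 2045, "Id_An": 116},
    {"num": 2046, "Id_An": 117},
]
suspects = lonely_clusters(rows, "num")
```

In R itself the equivalent one-liner would be along the lines of table() on the farm ID followed by selecting the counts equal to 1; the survey package also offers the survey.lonely.psu option to control how such strata are handled.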
[R] survey package: proportion estimate confidence intervals using svymean
Using the survey package I find it is convenient and easy to get estimated proportions using svymean, and their corresponding estimated standard errors. But is there any elegant/simple way to calculate corresponding confidence intervals for those proportions? Of course +/- 1.96 s.e. is a reasonable approximation for a 95% CI, but (incorrectly) assumes a symmetrical distribution for a proportion. About all I can think of is to transform onto some scale like the logit scale, use the delta method to estimate standard errors, and transform back for appropriate confidence intervals. Or perhaps use the machinery of the survey package svyglm function with logit link (family=quasibinomial) to do the calculations for me. Still, seems rather cumbersome. Is there a better solution? Thanks for any suggestions or pointing out the errors in my logic!

Peter Holck
Re: [R] survey package: proportion estimate confidence intervals using svymean
On Tue, 12 Feb 2008, Peter Holck wrote:

> Using the survey package I find it is convenient and easy to get estimated proportions using svymean, and their corresponding estimated standard errors. But is there any elegant/simple way to calculate corresponding confidence intervals for those proportions? Of course +/- 1.96 s.e. is a reasonable approximation for a 95% CI, but (incorrectly) assumes a symmetrical distribution for a proportion.

'Incorrectly' is a bit strong, I think.

> About all I can think of is to transform onto some scale like the logit scale, use the delta method to estimate standard errors, and transform back for appropriate confidence intervals. Or perhaps use the machinery of the survey package svyglm function with logit link (family=quasibinomial) to do the calculations for me. Still, seems rather cumbersome. Is there a better solution?

I think the most straightforward solution would be svyglm(). You could also use svycontrast() to do the delta-method calculations for you.

-thomas

Thomas Lumley, Assoc. Professor, Biostatistics, [EMAIL PROTECTED], University of Washington, Seattle
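[Editorial illustration] The logit-scale delta-method interval the poster describes is short enough to write out. A Python sketch (the function name is invented; this shows the transform-and-back-transform idea, not the exact computation any survey-package function performs): map p to logit(p), scale the SE by the derivative 1/(p(1-p)), form a symmetric interval on the logit scale, and apply the inverse logit to the endpoints.

```python
import math

def logit_ci(p, se, z=1.96):
    """Delta-method CI for a proportion, computed on the logit scale."""
    logit_p = math.log(p / (1.0 - p))
    se_logit = se / (p * (1.0 - p))  # delta method: d logit(p)/dp = 1/(p(1-p))
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return expit(logit_p - z * se_logit), expit(logit_p + z * se_logit)
```

Unlike p +/- 1.96 se, the resulting interval always stays inside (0, 1) and is asymmetric around p when p is near the boundaries, which is the behaviour the poster is after.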