Re: [R] Survey package/svyby source code help

2020-02-11 Thread AndertechLLC--- via R-help


Thank you for responding.
I am truly grateful.
Apologies for omitting evident and pertinent information.

I am using 3.36.  I will update to 3.37.  I did not notice the newer version.

I realize I needed to be more specific.  The attr(*, "var") that I am interested 
in is displayed by str(results) after the results object is created.  The first 
line of the code in question looks like:

results <- (if (multicore) parallel::mclapply else lapply)(uniques, 
function(i){

How that line functions without error is beyond me, but that is minor relative to my objective.

To disclose my main goal: I am trying to produce the same values generated by 
the confint function, which receives the standard error values from this svyby 
function.  I am trying to replicate the standard error calculation in R and use 
it in a program called Tableau.
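In other words (a minimal sketch with placeholder numbers, not values from my data), I believe confint() builds a Wald interval from the estimate and its standard error:

```r
# Placeholder values, not real svyby output
est <- 0.1884                              # point estimate from svyby
se  <- 0.0060                              # standard error from svyby
ci  <- est + c(-1, 1) * qnorm(0.975) * se  # confint()'s default Wald interval
```

So if I can reproduce the SE itself, the interval should follow.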

If any other information is needed please let me know.

My sincerest gratitude.





‐‐‐ Original Message ‐‐‐
On Tuesday, February 11, 2020 9:42 AM, Ivan Krylov  
wrote:

> On Tue, 11 Feb 2020 02:33:45 +
> AndertechLLC--- via R-help <r-help@r-project.org> wrote:
>
> > When debugging the code I am not following the generation of values
> > in the results object attr(*, "var") after line 57 completes. These
> > values are fed into line 74 (rval <- t(sapply(results, unwrap))).
>
> Which version of the survey package do you have in mind? I took a look at
> the latest version available on CRAN (3.37 at the time of this post,
> [*]) and the R/surveyby.R file has somewhat different code on lines 57
> and 74.
> In particular, attr(*, "var") is assigned much later, from a variable
> named covmat.mat, which is also computed later than line 74.
>
> ---
>
> Best regards,
> Ivan
>
> [*] https://cran.r-project.org/src/contrib/survey_3.37.tar.gz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survey package/svyby source code help

2020-02-11 Thread Ivan Krylov
On Tue, 11 Feb 2020 15:23:14 +
andertech...@protonmail.com wrote:

> The attr(*, "var") that I am interested in is displayed with
> str(results) after the results object is declared.  First line of the
> subject code looks like:
> 
> results <- (if (multicore) parallel::mclapply else lapply)(uniques,
> function(i){

'if' returns whatever the taken branch returns (it is documented
in ?'if', section "Value").

The rest of the statement applies the function FUN (passed as argument
to svyby) to the data (passed as `formula` argument), so the "var"
attribute you are looking for in fact comes from the FUN invocation and
not from svyby itself. Most examples from ?svyby use FUN=svymean, so
perhaps this is where you should be going next. Make sure to check out
R-Intro [*] to avoid common pitfalls when working with methods in R.
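A minimal sketch (using the api data shipped with the survey package, not
your data) of where that attribute is attached:

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
m <- svymean(~api00, dclus1)
attr(m, "var")  # the variance matrix that svyby collects from each FUN call
SE(m)           # square root of its diagonal
```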

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Object-orientation



Re: [R] Survey package/svyby source code help

2020-02-11 Thread Ivan Krylov
On Tue, 11 Feb 2020 02:33:45 +
AndertechLLC--- via R-help  wrote:

> When debugging the code I am not following the generation of values
> in the results object attr(*, "var") after line 57 completes. These
> values are fed into line 74 (rval <- t(sapply(results, unwrap))).

Which version of the survey package do you have in mind? I took a look at
the latest version available on CRAN (3.37 at the time of this post,
[*]) and the R/surveyby.R file has somewhat different code on lines 57
and 74.

In particular, attr(*, "var") is assigned much later, from variable
named covmat.mat, which is also computed later than line 74.

-- 
Best regards,
Ivan

[*] https://cran.r-project.org/src/contrib/survey_3.37.tar.gz



[R] Survey package/svyby source code help

2020-02-11 Thread AndertechLLC--- via R-help
Good day,

I was looking for some help with understanding a particular portion of the 
svyby source code.
When debugging the code I am not following the generation of values in the 
results object attr(*, "var") after line 57 completes.
These values are fed into line 74 (rval <- t(sapply(results, unwrap))).
Alternatively, I would welcome an explanation of how the SE values are created in the rval 
object on line 74.

The data entering svyby come from a svydesign.default object.

I am especially confused by the syntax on lines 57-59; this is not the primary 
question, but I would appreciate any input.

I am new to R and this mailing list, and apologize for any faux pas.

Thanks in advance.



[R] survey package rake results in very large weights

2015-01-19 Thread Imran Akbar
Hi,
I'm trying to use both the survey package and the anesrake package to
perform raking on my sample dataset, to get the proportions to match up
with population data from the census.

I'm trying the following, but the resulting weights are very large:

svy.unweighted <- svydesign(ids=~1, data=cleanPanelistData)
svy.rake <- rake(design = svy.unweighted,
                 sample.margins = list(~age, ~education, ~ethnicity,
                                       ~gender, ~income, ~location, ~size),
                 population.margins = list(censusAge, censusEducation,
                                           censusEthnicity, censusGender,
                                           censusIncome, censusLocation,
                                           censusSize))

summary(weights(svy.rake))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   133.9    567.6   1062.0   2606.0   2305.0 396800.0

When I use the anesrake package, the weights are capped at 5 and I'm able
to get a design effect of around 3.

My data set in .Rdata format is available here: http://we.tl/bUAuW3ajVs
Am I doing something wrong?  Should I be using calibrate with
calfun='raking' instead of rake?
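For example, a hedged sketch of what I have in mind (untested; `pop.totals` is a hypothetical named vector of population totals matching the model-matrix columns) using calibrate() with a raking distance and bounded weight adjustments, similar to anesrake's cap of 5:

```r
# Untested sketch; pop.totals is a hypothetical named vector of
# population totals for the calibration variables
svy.cal <- calibrate(design = svy.unweighted,
                     formula = ~age + education + ethnicity + gender +
                               income + location + size,
                     population = pop.totals,
                     calfun = "raking",
                     bounds = c(0.2, 5))   # cap the weight adjustment
summary(weights(svy.cal))
```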

This was run on:
RStudio on Mac Mavericks, version 3.02
with survey_3.30-3

thanks,
imran




Re: [R] survey package -- doesn't appear to match svy

2014-10-28 Thread Anthony Damico
could you provide a minimal reproducible example?  perhaps use ?dput.


in general the survey package matches all other languages
http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf


here's an example of a minimal reproducible example that does match
http://www.ats.ucla.edu/stat/stata/faq/svy_stata_subpop.htm


library(foreign)
library(survey)
x <- read.dta( "http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/strsrs.dta" )
y <- svydesign( ~ 1 , strata = ~strat , data = x , weights = ~ pw , fpc =
~fpc )
svytable( ~ yr_rnd , y )
z <- subset( y , yr_rnd == 1 )
svymean( ~ ell , z )



there are lots of examples of R matching official published statistics
(often computed with stata) here--
https://github.com/ajdamico/usgsd/search?utf8=%E2%9C%93q=replication
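one thing worth checking in your code (a guess, since i can't run it without the data): svydesign() treats probs= as sampling probabilities, i.e. the inverse of the weights.  if wt_national and wt_state are probability weights (pweights, as in stata's [pw=...]), they belong in weights= instead, e.g.

```r
# guess, not tested against the data: pweights go in weights=, not probs=
svyd.nation <- svydesign(ids = ~1, weights = ~wt_national,
                         strata = ~stratum, data = nats.sub)
svytable(~wpev, design = svyd.nation, Ntotal = 100)
```

passing weights as probs= inverts them, which could explain the mismatched percentages, and any zero values would then become Inf weights.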





On Mon, Oct 27, 2014 at 11:11 AM, Stephen Amrock stephen.amr...@gmail.com
wrote:

 Hi,

 I'm new to R and have encountered two issues in coding using the survey
 package:

 (1) Code from *svytable* using survey package does not correspond to
 Stata estimates from *svy: tab*. I call

 svyd.nation <- svydesign(ids = ~1, probs = ~wt_national, strata =
 ~stratum, data=nats.sub)
 svytable(formula = ~wpev, design = svyd.nation, Ntotal = 100)

 where the equivalent Stata would be:

 svyset [pw=wt_national], strata(stratum)
 svy: tab wpev

 These are inverse probability weights but Stata and R give different %s for
 the tabulations. I know from published results that the Stata code is
 correct.  Any ideas as to what I've done incorrectly in R?

 (2) Alternative weights---which I've verified in R have the same
 distribution as in Stata---produce a strange inf output.

 svyd.st <- svydesign(ids = ~1, probs = ~wt_state, strata = ~stratum,
 data=nats.sub)
 svytable(~wpev, design = svyd.st, Ntotal=100)

 This produces the following output:

 wpev
 0 1

 The code
 svytable(~wpev, svyd.st)

 produces the output:

 wpev
 0   1
 Inf Inf

 Why are these alternative weights -- which work in Stata when resetting
 svyset -- not working in R?

 Any insights would be much appreciated!! Many thanks!
 - S







[R] survey package -- doesn't appear to match svy

2014-10-27 Thread Stephen Amrock
Hi,

I'm new to R and have encountered two issues in coding using the survey
package:

(1) Code from *svytable* using survey package does not correspond to
Stata estimates from *svy: tab*. I call

svyd.nation <- svydesign(ids = ~1, probs = ~wt_national, strata =
~stratum, data=nats.sub)
svytable(formula = ~wpev, design = svyd.nation, Ntotal = 100)

where the equivalent Stata would be:

svyset [pw=wt_national], strata(stratum)
svy: tab wpev

These are inverse probability weights but Stata and R give different %s for
the tabulations. I know from published results that the Stata code is
correct.  Any ideas as to what I've done incorrectly in R?

(2) Alternative weights---which I've verified in R have the same
distribution as in Stata---produce a strange inf output.

svyd.st <- svydesign(ids = ~1, probs = ~wt_state, strata = ~stratum,
data=nats.sub)
svytable(~wpev, design = svyd.st, Ntotal=100)

This produces the following output:

wpev
0 1

The code
svytable(~wpev, svyd.st)

produces the output:

wpev
0   1
Inf Inf

Why are these alternative weights -- which work in Stata when resetting
svyset -- not working in R?

Any insights would be much appreciated!! Many thanks!
- S




[R] Survey package help with svystandardize

2013-01-03 Thread Chris Webb
I am trying to age standardize using the svystandardize function from the survey 
package in R. I have successfully managed to hit my SUDAAN-based targets for estimates by
sex, but not the total. The total is only a little different, but I'd like
some help knowing why it isn't exact. I've included the SUDAAN code that
generates the targets and my R script (and output) that I have so far. I
can't supply the data since that would violate usage agreements.

* Sorry if the output isn't formatted nicely.

SUDAAN results:

Variance Estimation Method: Taylor Series (WR)

Standardized estimates by: Variable, Sex of respondent.

Current Smoker: At risk            Total         Male       Female
  Sample Size                      14732         5582         9150
  Weighted Size              18073876.32   8917534.53   9156341.79
  Total                       3477083.28   2105598.41   1371484.87
  Percent                          18.84        22.78        14.87
  SE Percent                        0.60         0.95         0.73
  Lower 95% Limit Percent          17.69        20.97        13.50
  Upper 95% Limit Percent          20.05        24.69        16.34

-


R Output:

Estimate for the total is off.

svyciprop(~I(rfsmok=="At risk"), stdes, na.rm=TRUE, ci=TRUE, vartype="ci")
                                        2.5%     97.5%
I(rfsmok == "At risk")    0.187685  0.176309   0.19962


Estimates by sex are a match.

svyby(~I(rfsmok=="At risk"), ~sex, stdes, svyciprop, na.rm=TRUE, ci=TRUE,
vartype="ci")
          sex  I(rfsmok == "At risk")          ci_l          ci_u
Male     Male            0.2277632725  0.2097154285  0.2468790907
Female Female            0.1486524831  0.1349832066  0.1634810


-

R Code:

sample.brfss <- svydesign(id=~brfss.2011$SEQNO, strata=~brfss.2011$ststr,
weights=~brfss.2011$LLCPWT, data=brfss.2011, nest=T)
popage <- c(0.128810, 0.182648, 0.219077, 0.299194, 0.170271)
options(survey.lonely.psu="adjust")
testing_dataset <- subset(sample.brfss, (!is.na(agegr5) & !is.na(sex) &
!is.na(rfsmok)))
stdes <- svystandardize(testing_dataset, by=~agegr5, over=~sex,
population=popage)

-


SUDAAN code:

PROC DESCRIPT DATA = h:\\brfss_data\\brfss11\\pudf11\\txbrfss_pudf11
DESIGN = WR FILETYPE= SUDXPORT NOTSORTED;
NEST STSTR SEQNO/MISSUNIT;
WEIGHT LLCPWT;
SETENV LINESIZE = 106 PAGESIZE = 45 COLWIDTH = 11 TOPMGN = 1;
VAR  RFSMOK;
CATLEVEL  2;
STDVAR AGEGR5;
STDWGT .128810  .182648  .219077  .299194  .170271;
SUBGROUP  SEX AGEGR5;
LEVELS  2  5  ;
TABLES  (SEX);




[R] survey package question

2012-10-11 Thread Sebastián Daza
Hello,

I have got a cluster sample using an election dataset where I already
had the final results of a county-specific election. I am trying to
figure out what would be the best sampling design for my data.

The  structure of the dataset is:

1) polling station (in general schools where people vote, for a
county, for example, there are 15 polling stations)
2) inside each polling station, there are voting units, where people
actually vote (on average there are about 40 voting units for polling
station)
3) for each voting unit I have the total votes by candidate (e.g.,
candidate 1 =322, candidate 2=122, candidate 3= 89)

The initial sampling design is:
1) selection of 5 polling stations PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate
(let's assume we have 3 candidates). My naive estimate would be:

votes for candidate 1 / all valid votes = proportion

e.g.

candidate 1= 2132 / 10874= .1906
candidate 2= 5323 / 10874= .4895
candidate 3= 3419 / 10874= .3144

In this case, the unit of analysis is voters (or votes).

 If I specify the sampling design using the survey package in this way...

design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
data=sample, pps="brewer")

svyciprop(~I(candidate1/totalVotes), design)

... I am assuming that the unit of analysis is the voting unit, right?
and I am estimating an average among voting units?

Should I expand my database to the individual level (voters), or do I just
have to include a unit weight according to the number of voters per
voting unit? In other words, is there a way to estimate, for instance,
votes for candidate 1 / all valid votes = proportion, directly from
the survey package, or do I have to expand the database to the person level
(voters) and then estimate the proportion using svymean and the
respective design?

I would appreciate any advice or help.

Sebastian



Re: [R] survey package question

2012-10-11 Thread Thomas Lumley
On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
sebastian.d...@gmail.com wrote:
 Hello,

 I have got a cluster sample using an election dataset where I already
 had the final results of a county-specific election. I am trying to
 figure out what would be the best sampling design for my data.

 The  structure of the dataset is:

 1) polling station (in general schools where people vote, for a
 county, for example, there are 15 polling stations)
 2) inside each polling station, there are voting units, where people
 actually vote (on average there are about 40 voting units for polling
 station)
 3) for each voting unit I have the total votes by candidate (e.g.,
 candidate 1 =322, candidate 2=122, candidate 3= 89)

 The initial sampling design is:
 1) selection of 5 polling stations PPS (based on number of voters)
 2) selection of 10 voting units (SRS)

 I am interested in estimating the proportion of votes by candidate
 (let's assume we have 3 candidates). My naive estimate would be:

 votes for candidate 1 / all valid votes = proportion

 e.g.

 candidate 1= 2132 / 10874= .1906
 candidate 2= 5323 / 10874= .4895
 candidate 3= 3419 / 10874= .3144

 In this case, the unit of analysis is voters (or votes).

  If I specify the sampling design using the survey package in this way...

 design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
 data=sample, pps="brewer")

 svyciprop(~I(candidate1/totalVotes), design)

 ... I am assuming that the unit of analysis is the voting unit, right?
 and I am estimating an average among voting units?


You want a ratio estimator

svyratio(~candidate1, ~totalVotes, design)
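The same point estimate can also be written as a ratio of weighted totals (a sketch using the variable names from your post):

```r
# Same point estimate by hand: weighted total of candidate-1 votes
# divided by the weighted total of all valid votes
tot <- svytotal(~candidate1 + totalVotes, design)
coef(tot)["candidate1"] / coef(tot)["totalVotes"]
```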


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] survey package question

2012-10-11 Thread Sebastián Daza
Hello Thomas,

I use both svymean (with the expanded sample = people), and svyratio
(voting unit level), using the same design:

design <- svydesign(id=~station + unit, fpc=~probstation+probunits,
data=sample, pps="brewer")

I got different results using the same sample:

svyratio (voting unit)

            Ratio        2.5%       97.5%      Result
Cand1  0.05252871  0.04537301  0.05968441  0.05181146
Cand2  0.47226973  0.45215097  0.49238849  0.49041590
Cand3  0.47520156  0.45460831  0.49579482  0.45777264

svymean (expanded sample, individuals or votes)

            Mean           SE       2.5 %      97.5 %     Results
Cand1  0.0528433  0.004562755  0.04390047  0.06178614  0.05181146
Cand2  0.4717504  0.010201398  0.45175605  0.49174480  0.49041590
Cand3  0.4754063  0.010429222  0.45496538  0.49584718  0.45777264

Point estimates are different, and confidence intervals are narrower
using svyratio.
Could you give me any clue about what is going on?

Thank you in advance.
Sebastian

On Thu, Oct 11, 2012 at 7:50 PM, Sebastián Daza
sebastian.d...@gmail.com wrote:
 Hello Thomas,

 I use both svymean (with the expanded sample = people), and svyratio
 (voting unit level), using the same design:

 design <- svydesign(id=~station + unit, fpc=~probstation+probunits,
 data=sample, pps="brewer")

 I got different results using the same sample:

 svyratio (voting unit)

 Ratio   2.5%  97.5% Result
 Cand1  0.05252871 0.04537301 0.05968441 0.05181146
 Cand20.47226973 0.45215097 0.49238849 0.49041590
 Cand3   0.47520156 0.45460831 0.49579482 0.45777264

 svymean (expanded sample, individuals or votes)

Mean  SE  2.5 % 97.5 %Results
 Cand1   0.0528433 0.004562755 0.04390047 0.06178614 0.05181146
 Cand2 0.4717504 0.010201398 0.45175605 0.49174480 0.49041590
 Cand30.4754063 0.010429222 0.45496538 0.49584718 0.45777264

 Point estimators are different, and confidence intervals are more
 narrow using svyratio.
 Could you give me any clue about what is going on?

 Thank you in advance.
 Sebastian

 On Thu, Oct 11, 2012 at 3:56 PM, Sebastián Daza
 sebastian.d...@gmail.com wrote:
 Thank you Thomas!

 On Thu, Oct 11, 2012 at 2:33 PM, Thomas Lumley tlum...@uw.edu wrote:
 On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
 sebastian.d...@gmail.com wrote:
 Hello,

 I have got a cluster sample using an election dataset where I already
 had the final results of a county-specific election. I am trying to
 figure out what would be the best sampling design for my data.

 The  structure of the dataset is:

 1) polling station (in general schools where people vote, for a
 county, for example, there are 15 polling stations)
 2) inside each polling station, there are voting units, where people
 actually vote (on average there are about 40 voting units for polling
 station)
 3) for each voting unit I have the total votes by candidate (e.g.,
 candidate 1 =322, candidate 2=122, candidate 3= 89)

 The initial sampling design is:
 1) selection of 5 polling stations PPS (based on number of voters)
 2) selection of 10 voting units (SRS)

 I am interested in estimating the proportion of votes by candidate
 (let's assume we have 3 candidates). My naive estimate would be:

 votes for candidate 1 / all valid votes = proportion

 e.g.

 candidate 1= 2132 / 10874= .1906
 candidate 2= 5323 / 10874= .4895
 candidate 3= 3419 / 10874= .3144

 In this case, the unit of analysis is voters (or votes).

  If I specify the sampling design using the survey package in this way...

 design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
 data=sample, pps="brewer")

 svyciprop(~I(candidate1/totalVotes), design)

 ... I am assuming that the unit of analysis is the voting unit, right?
 and I am estimating an average among voting units?


 You want a ratio estimator

 svyratio(~candidate1, ~totalVotes, design)


-thomas

 --
 Thomas Lumley
 Professor of Biostatistics
 University of Auckland



 --
 Sebastián Daza



 --
 Sebastián Daza



-- 
Sebastián Daza



[R] Survey package: using subset option with svyrepdesign

2012-08-09 Thread BrentMast
Hi. I'm using American Housing Survey (AHS) data with replicate weights. I
want subpopulation estimates. When I try to subset the survey design, I get
an error message. I don't get the error message when not using the replicate
weights (using  the regular svydesign function). 

Any help would be appreciated. My code and error message are below.

 dsn <- svrepdesign(weights=~wgt90geo, repweights=~repwgt, type="Fay",
                    data=temp2, combined.weights=TRUE, rho=.5)
 dsub <- subset(dsn, subpop==TRUE)
Error in x$variables[i, , drop = FALSE] : incorrect number of dimensions

Thanks,
Brent Mast
HUD






Re: [R] Survey package: using subset option with svyrepdesign

2012-08-09 Thread BrentMast
I found the problem. My data were not in a data.frame.

Thanks,
Brent
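
In case it helps anyone else, a minimal sketch of the fix (assuming temp2
was a matrix or similar rather than a data.frame):

```r
# Sketch of the fix: svrepdesign()/subset() expect `data` to be a
# data.frame, so coerce it first
temp2 <- as.data.frame(temp2)
dsn  <- svrepdesign(weights = ~wgt90geo, repweights = ~repwgt,
                    type = "Fay", data = temp2,
                    combined.weights = TRUE, rho = 0.5)
dsub <- subset(dsn, subpop == TRUE)
```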






Re: [R] Survey package GLM vs. Linear Mixed Models?

2012-07-25 Thread RNewby
Hi there,  haven't heard from anyone and just wondering if anyone had any
insight!  Thanks.  :)






[R] Survey package GLM vs. Linear Mixed Models?

2012-07-11 Thread RNewby
Hi there,

I'm new to some of these more advanced regression techniques and also new to
R.  This looks like a great forum.

I am trying to examine the association with membership in a group and some
different variables, most of which are (approximately) normally distributed. 
Would just do an ANOVA, but I have correlated data.  Everyone in my dataset
has a sibling in the sample.  The correlation is a nuisance for the purpose
of what I'm interested in.

I know that some individuals have used linear mixed models and modeled the
nuisance correlation as a random effect.

My question is -- I've seen quite a few people write about using the GLM
with the survey package, which can apparently also handle correlated data,
and it seems like there is really detailed documentation for the package,
which is a must for me, as I'm still learning.  What are the
advantages/disadvantages to using this survey package (and specifying a
normal distribution) instead of a linear mixed model package?  I'm
interested in practical issues with implementation in R, as well as
statistical concerns that I should be aware of...

If LMM is better, which package to people recommend?




[R] survey package svystat objects from predict()

2012-02-13 Thread Kieran Healy
Hello, 

I'm running R 2.14.1 on OS X (x86_64-apple-darwin9.8.0/x86_64 (64-bit)), with 
version 3.28 of Thomas Lumley's survey package. I was using predict() from 
svyglm(). E.g.:

data(api)
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
out <- svyglm(sch.wide~ell+mobility, design=dstrat,
              family=quasibinomial())
pred.df <- expand.grid(ell=c(20,50,80), mobility=20)
out.pred <- predict(out, pred.df)

From the console out.pred looks like this:

 class(out.pred)
[1] svystat

 print(out.pred) # or just 'out.pred'
link SE
1 1.8504 0.2414
2 1.6814 0.3033
3 1.5124 0.5197

From here I wanted to conveniently access the predicted values and SEs. I 
thought that I might be able to do this using either ftable() or 
as.data.frame(), as methods for these exist for the objects of class 
svystat. But while the predicted values come through fine, the SE only gets 
calculated for the first element and is then repeated: 

 ftable(out.pred)
  A B
1 1.8504120 0.2413889
2 1.6814293 0.2413889
3 1.5124466 0.2413889

 as.data.frame(out.pred)
  linkSE
1 1.850412 0.2413889
2 1.681429 0.2413889
3 1.512447 0.2413889

I think what's happening is that as.data.frame.svystat() method in the survey 
package ends up calling the wrong function to calculate the standard errors.  
From the survey package:

as.data.frame.svystat <- function(x,...){
  rval <- data.frame(statistic=coef(x), SE=SE(x))
  names(rval)[1] <- attr(x, "statistic")
  if (!is.null(attr(x, "deff")))
    rval <- cbind(rval, deff=deff(x))
  rval
}

The relevant SE method seems to be: 

SE.svrepstat <- function(object,...){
  if (is.list(object)){
    object <- object[[1]]
  }
  vv <- as.matrix(attr(object, "var"))
  if (!is.null(dim(object)) && length(object)==length(vv))
    sqrt(vv)
  else
    sqrt(diag(vv))
}

Instead of returning sqrt(vv) for each element, it calculates sqrt(diag(vv)).
At least I think this is what's happening. 

I apologize in advance if all this is the result of some elementary error on my 
part. 

Thanks,

Kieran
  
-- 
Kieran Healy :: http://kieranhealy.org



Re: [R] survey package svystat objects from predict()

2012-02-13 Thread Thomas Lumley
On Tue, Feb 14, 2012 at 11:45 AM, Kieran Healy kjhe...@gmail.com wrote:
 Hello,

 I'm running R 2.14.1 on OS X (x86_64-apple-darwin9.8.0/x86_64 (64-bit)), with 
 version 3.28 of Thomas Lumley's survey package. I was using predict() from 
 svyglm(). E.g.:

 data(api)
 dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
 out <- svyglm(sch.wide~ell+mobility, design=dstrat,
        family=quasibinomial())
 pred.df <- expand.grid(ell=c(20,50,80), mobility=20)
 out.pred <- predict(out, pred.df)

 From the console out.pred looks like this:

 class(out.pred)
 [1] svystat

 print(out.pred) # or just 'out.pred'
    link     SE
 1 1.8504 0.2414
 2 1.6814 0.3033
 3 1.5124 0.5197

 From here I wanted to conveniently access the predicted values and SEs. I 
 thought that I might be able to do this using either ftable() or 
 as.data.frame(), as methods for these exist for the objects of class 
 svystat. But while the predicted values come through fine, the SE only gets 
 calculated for the first element and is then repeated:

 ftable(out.pred)
          A         B
 1 1.8504120 0.2413889
 2 1.6814293 0.2413889
 3 1.5124466 0.2413889

 as.data.frame(out.pred)
      link        SE
 1 1.850412 0.2413889
 2 1.681429 0.2413889
 3 1.512447 0.2413889

 I think what's happening is that as.data.frame.svystat() method in the survey 
 package ends up calling the wrong function to calculate the standard errors.  
 From the survey package:

 as.data.frame.svystat <- function(x,...){
  rval <- data.frame(statistic=coef(x), SE=SE(x))
  names(rval)[1] <- attr(x, "statistic")
  if (!is.null(attr(x, "deff")))
    rval <- cbind(rval, deff=deff(x))
 }

 The relevant SE method seems to be:

 SE.svrepstat <- function(object,...){
  if (is.list(object)){
    object <- object[[1]]
  }
  vv <- as.matrix(attr(object, "var"))
  if (!is.null(dim(object)) && length(object)==length(vv))
    sqrt(vv)
  else
    sqrt(diag(vv))
 }

Mostly correct, but the relevant SE method is actually SE.default
survey:::SE.default
function (object, ...)
{
sqrt(diag(vcov(object, ...)))
}

It can't be SE.svrepstat, because the class is wrong.

If you define
SE.svystat <- function(object,...){
   v <- vcov(object)
   if (!is.matrix(v) || NCOL(v)==1) sqrt(v) else sqrt(diag(v))
 }

it should work.

-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] survey package: weights used in svycoxph()

2010-05-18 Thread Thomas Lumley

On Mon, 17 May 2010, Vinh Nguyen wrote:


Dear R-help,

Let me know if I should email r-devel instead of this list.  This
message is addressed to Professor Lumley or anyone familiar with the
survey package.

Does svycoxph() implement the method outlined in Binder 1992 as
referenced in the help file?


Yes. That's why it's referenced.


That is, are weights incorporated in the
ratio term (numerator and denominator) of the estimating equation?


Yes.

I don't believe so, since svycoxph() calls coxph() of the survival
package and weights are applied once in the estimating equation.  If
the weights are implemented in the ratio, could you point me to where
in the code this is done?  I would like to estimate as in Binder but
with custom weights.  Thanks.


It happens inside the C code called by coxph(), eg, in survival/src/coxfit2.c

Binder's estimating equations are the usual way of applying weights to a Cox model, so 
nothing special is done apart from calling coxph(). To quote the author of the survival
package, Terry Therneau: "Other formulae change in the obvious way, eg, the weighted
mean $\bar Z$ is changed to include both the risk weights $r$ and the external weights
$w$." [Mayo Clinic Biostatistics technical report #52, section 6.2.2]



This is mentioned in the help file but I don't quite understand:
"The main difference between svycoxph function and the robust=TRUE option to
coxph in the survival package is that this function accounts for the reduction
in variance from stratified sampling and the increase in variance from having
only a small number of clusters."


The point estimates from coxph() are the same as those from svycoxph() (with 
the same weights).  The standard errors are almost the same.  There are two 
differences.  The first is the use of 1/(nclusters -1) rather than 1/nclusters 
as a divisor.  The second is that svycoxph() computes variances using 
estimating functions centered at zero in each *sampling* stratum whereas 
coxph() centers them at zero in each baseline hazard stratum, as supplied in 
the strata() argument to coxph().
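To see the first point concretely, the two fits can be compared side by side; a sketch on simulated data (illustrative only, assuming the survey and survival packages are installed):

```r
library(survival)
library(survey)

## Simulated survival data with arbitrary sampling weights w
set.seed(1)
df <- data.frame(time   = rexp(100),
                 status = rbinom(100, 1, 0.7),
                 x      = rnorm(100),
                 w      = runif(100, 1, 3))
des <- svydesign(id = ~1, weights = ~w, data = df)

m1 <- coxph(Surv(time, status) ~ x, data = df, weights = w, robust = TRUE)
m2 <- svycoxph(Surv(time, status) ~ x, design = des)

coef(m1)   # same Binder-type point estimate as coef(m2)
SE(m2)     # close to, but not identical to, sqrt(diag(vcov(m1)))
```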

 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] survey package: weights used in svycoxph()

2010-05-18 Thread Vinh Nguyen
On Tue, May 18, 2010 at 8:50 AM, Thomas Lumley tlum...@u.washington.edu wrote:

 I don't believe so since svycoxph() calls coxph() of the survival
 package and weights are applied once in the estimating equation.  If
 the weights are implemented in the ratio, could you point me to where
 in the code this is done?  I would like to estimate as in Binder but
 with custom weights.  Thanks.

 It happens inside the C code called by coxph(), eg, in
 survival/src/coxfit2.c


Thank you for your clarification.  I mistakenly assumed the weights only
appeared once in the estimating equation, creating a weighted sum of the
score equations.  In retrospect, if the weights are to be used as case
weights, they had better appear in the ratio term as well (wherever there
is an at-risk indicator).

 Binder's estimating equations are the usual way of applying weights to a Cox
 model, so nothing special is done apart from calling coxph(). To quote the
 author of the survival package, Terry Therneau: "Other formulae change in
 the obvious way, eg, the weighted mean $\bar Z$ is changed to include both
 the risk weights $r$ and the external weights $w$." [Mayo Clinic
 Biostatistics technical report #52, section 6.2.2]

I don't see a section 6.2.2 in this technical report.



Re: [R] survey package: weights used in svycoxph()

2010-05-18 Thread Thomas Lumley

On Tue, 18 May 2010, Vinh Nguyen wrote:


Binder's estimating equations are the usual way of applying weights to a Cox
model, so nothing special is done apart from calling coxph(). To quote the
author of the survival package, Terry Therneau: "Other formulae change in
the obvious way, eg, the weighted mean $\bar Z$ is changed to include both
the risk weights $r$ and the external weights $w$." [Mayo Clinic
Biostatistics technical report #52, section 6.2.2]


I don't see a section 6.2.2 in this technical report.


Sorry, #58

   -thomas


Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



[R] survey package: weights used in svycoxph()

2010-05-17 Thread Vinh Nguyen
Dear R-help,

Let me know if I should email r-devel instead of this list.  This
message is addressed to Professor Lumley or anyone familiar with the
survey package.

Does svycoxph() implement the method outlined in Binder 1992 as
referenced in the help file?  That is, are weights incorporated in the
ratio term (numerator and denominator) of the estimating equation?  I
don't believe so since svycoxph() calls coxph() of the survival
package and weights are applied once in the estimating equation.  If
the weights are implemented in the ratio, could you point me to where
in the code this is done?  I would like to estimate as in Binder but
with custom weights.  Thanks.

This is mentioned in the help file but I don't quite understand:
"The main difference between svycoxph function and the robust=TRUE option to
coxph in the survival package is that this function accounts for the reduction
in variance from stratified sampling and the increase in variance from having
only a small number of clusters."

Vinh





[R] survey package question

2010-02-18 Thread Richard Valliant
Should the svyby function be able to work with svyquantile?  I get the
error below ...

data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
 
svyby(~api00, 
  design=dclus1,
  by = ~stype, 
  quantiles=c(.25,.5,.75), 
  FUN=svyquantile, 
  na.rm=T 
  )
 
 Error in object$coefficients : $ operator is invalid for atomic
vectors

A more general question is: can quantiles and their SEs be computed for
subgroups?
 
The same syntax works using svymean.

Thanks
R. Valliant
Univ. of Maryland, US



Re: [R] survey package question

2010-02-18 Thread Thomas Lumley

On Thu, 18 Feb 2010, Richard Valliant wrote:


Should the svyby function be able to work with svyquantile?  I get the
error below ...


It works, but you need to either specify ci=TRUE or keep.var=FALSE.  The 
problem is that svyquantile() by default does not produce standard errors.


svyby(~api00, ~stype, design=dclus1,svyquantile, 
quantile=c(0.25,0.5,.75),ci=TRUE)

  stype   0.25   0.5  0.75  se.0.25   se.0.5  se.0.75
E E 553.00 652.0 729.0 27.74745 37.81025 17.12898
H H 523.00 608.0 699.5 46.01881 65.82488 33.38789
M M 532.75 636.5 696.5 60.78990 43.12310 55.27555

svyby(~api00, ~stype, design=dclus1,svyquantile, quantile=c(0.25,0.5,.75), 
keep.var=FALSE)

  stype statistic1 statistic2 statistic3
E E 553.00  652.0  729.0
H H 523.00  608.0  699.5
M M 532.75  636.5  696.5



A more general question is: can quantiles and their SEs be computed for
subgroups?



You can also use subset(), which is what svyby() does internally


svyquantile(~api00, quantile=c(0.25,0.5,0.75), subset(dclus1, stype=="E"))

  0.25 0.5 0.75
api00  553 652  729


   -thomas


Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] Survey Package with Binary Data (no Standard Errors reported)

2009-04-06 Thread Thomas Lumley

On Fri, 3 Apr 2009, Paul Jones wrote:


Hi,

I'm trying to get standard errors for some of the variables in my data frame. 
One of the questions on my survey is whether faculty coordinate across 
curriculum to include Arts Education as subject matter. All the responses are 
coded in zeros and ones obviously. For some of the other variables I have a 2
for those that responded with "Don't Know".


I'm getting NA for mean and standard deviations from svymean. Am I doing
something wrong, or can the survey package not handle this type of data?


Are you sure you don't have any NA values in the data? If you do, the 
na.rm=TRUE option to svymean() will fix your problem. If you don't, 
something mysterious is happening. You could try svytable(~Curriculum, 
survey), which will give a tabulation and might show up what is strange 
about your data.


As a separate issue, you might want to look at svyciprop() if some of your 
proportions are close to 0 or 1, to get better confidence intervals.
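For instance, on the api data shipped with the package (illustrative, since the poster's data isn't available):

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

## Logit-scale interval stays inside [0, 1] even for proportions near 0 or 1
svyciprop(~I(sch.wide == "Yes"), dclus1, method = "logit")
```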




Here's my code.


Nothing obviously wrong with it.

-thomas


survey <- svydesign(id=~1, data=General, strata=~Grade.Level)

Warning message:
In svydesign.default(id = ~1, data = General, strata = ~Grade.Level) :
No weights or probabilities supplied, assuming equal probability


summary(survey)

Stratified Independent Sampling design (with replacement)
svydesign(id = ~1, data = General, strata = ~Grade.Level)
Probabilities:
 Min. 1st Qu.  MedianMean 3rd Qu.Max.
1   1   1   1   1   1
Stratum Sizes:
 Elementary High Middle
obs   312  236156
design.PSU312  236156
actual.PSU312  236156
Data variables:
[1] Grade.Level  Curriculum
[3] Field.Trips  Residencies
[5] PTA.Support  Community.Open.Performances
[7] Visual.Arts.Attendance   Literary.Arts.Attendance
[9] Arts.Organization.Membership Arts.Essential

 svymean(~Curriculum, survey)
 mean SE
Curriculum   NA NA

???

PJ




Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



[R] Survey Package with Binary Data (no Standard Errors reported)

2009-04-03 Thread Paul Jones

Hi,

I'm trying to get standard errors for some of the variables in my data 
frame. One of the questions on my survey is whether faculty coordinate 
across curriculum to include Arts Education as subject matter. All the 
responses are coded in zeros and ones obviously. For some of the other 
variables I have a 2 for those that responded with "Don't Know".


I'm getting NA for mean and standard deviations from svymean. Am I doing
something wrong, or can the survey package not handle this type of data?


Here's my code.

 survey <- svydesign(id=~1, data=General, strata=~Grade.Level)
Warning message:
In svydesign.default(id = ~1, data = General, strata = ~Grade.Level) :
 No weights or probabilities supplied, assuming equal probability

 summary(survey)
Stratified Independent Sampling design (with replacement)
svydesign(id = ~1, data = General, strata = ~Grade.Level)
Probabilities:
  Min. 1st Qu.  MedianMean 3rd Qu.Max.
 1   1   1   1   1   1
Stratum Sizes:
  Elementary High Middle
obs   312  236156
design.PSU312  236156
actual.PSU312  236156
Data variables:
[1] Grade.Level  Curriculum 
[3] Field.Trips  Residencies
[5] PTA.Support  Community.Open.Performances
[7] Visual.Arts.Attendance   Literary.Arts.Attendance   
[9] Arts.Organization.Membership Arts.Essential
 
 svymean(~Curriculum, survey)

  mean SE
Curriculum   NA NA

???

PJ



[R] survey package

2008-09-25 Thread Sylvie Ahoussou
Hello 

I have a problem using the package survey:

I'm trying to calculate the prevalence of a disease in animals sampled using a
two-stage sampling design:

first stage: farms randomly chosen from among 551 farms
second stage: animals randomly chosen within those farms

My database looks like this:

num esp  Quarters Totcat Totshp Totgt Tbtpos fpc1 Totanim Id_An
115 2045 G01-Q1   1      112   0     551    14           115
116 2045 G01-Q1   1      112   0     551    14           116
117 2045 G01-Q1   1      112   0     551    14           117
118 2045 G01-Q1   1      112   0     551    14           118


where num is the farm ID and where Id_An is the animal ID
 
R accepts my sampling design via the function svydesign:
 clustot <- svydesign(id=~num+Id_An, fpc=~fpc1+Totanim, data=tab1)
 summary(clustot)

2 - level Cluster Sampling design
With (419, 11593) clusters.
svydesign(id = ~num + Id_An, fpc = ~fpc1 + Totanim, data = tab1)
Probabilities:
   Min. 1st Qu.  MedianMean 3rd Qu.Max. 
 0.1086  0.3802  0.4986  0.4952  0.6145  0.7476 
Population size (PSUs): 551 
Data variables:
 [1] num  esp  Quarters Totcat   Totshp   TotgtTbtpos 
  fpc1 Totanim  Id_An  

but when I try to compute statistics on it, I get this error message:

svymean(~Tbtpos,clustot,na.rm = TRUE)

Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale
*  :
  Stratum (1.629) has only one PSU at stage 2
 svytotal(~Tbtpos, clustot)
Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale
*  :
  Stratum (1.629) has only one PSU at stage 2

and with options(error=recover) I obtain 

  svymean(~Tbtpos,clustot,na.rm = TRUE)

Error in switch(lonely.psu, certainty = scale * crossprod(x), remove = scale
*  :
  Stratum (1.629) has only one PSU at stage 2

Enter a frame number, or 0 to exit   

 1: svymean(~Tbtpos, clustot, na.rm = TRUE)
 2: svymean.survey.design2(~Tbtpos, clustot, na.rm = TRUE)
 3: svyrecvar(x * pweights/psum, design$cluster, design$strata, design$fpc, 
postStrata = design$postStrata)
 4: multistage(x, clusters, stratas, fpcs$sampsize, fpcs$popsize, lonely.psu = 
getOption(survey.lonely.psu), one.stage = one.stage, stage = 1, cal = cal)
 5: by(1:n, list(as.numeric(clusters[, 1])), function(index) {
 6: by.default(1:n, list(as.numeric(clusters[, 1])), function(index) {
 7: by(as.data.frame(data), INDICES, FUN, ...)
 8: by.data.frame(as.data.frame(data), INDICES, FUN, ...)
 9: eval(substitute(tapply(1:nd, IND, FUNx)), data)
10: eval(expr, envir, enclos)
11: tapply(1:11593, list(c(2045, 2046, 2070, 2070, 2070, 2071, 2230, 2280, 
2304, 2045, 2045, 2045, 2045, 2045, 2045, 2046, 2046, 2046, 2046, 2046, 2046, 20
12: lapply(split(X, group), FUN, ...)
13: FUN(X[[23]], ...)
14: FUN(data[x, ], ...)
15: multistage(x[index, , drop = FALSE], clusters[index, -1, drop = FALSE], 
stratas[index, -1, drop = FALSE], nPSUs[index, -1, drop = FALSE], fpcs[index, -
16: onestage(x, stratas[, 1], clusters[, 1], nPSUs[, 1], fpcs[, 1], lonely.psu 
= lonely.psu, stage = stage, cal = cal)
17: tapply(1:NROW(x), list(factor(strata)), function(index) {
18: lapply(split(X, group), FUN, ...)
19: FUN(X[[1]], ...)
20: onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], 
fpc[index][1], lonely.psu = lonely.psu, stratum = strata[index][1], stage = 
stage, 
21: switch(lonely.psu, certainty = scale * crossprod(x), remove = scale * 
crossprod(x), adjust = scale * crossprod(x), average = NA * crossprod(x), fail 
= 

I don't understand the error message, and I don't know how to ask R to tell me
which stratum (or which stage) is causing the problem.
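Since the stage-2 "strata" here are the farms themselves, a farm contributing only a single sampled animal would trigger exactly this error. A minimal base-R sketch to locate such farms (assuming the tab1 data frame above, where num is the farm ID):

```r
## Count sampled animals per farm
per.farm <- table(tab1$num)

## Farms with a single sampled animal: lone PSUs at stage 2
names(per.farm)[per.farm == 1]

## One common workaround: a conservative variance treatment for lone PSUs
options(survey.lonely.psu = "adjust")
```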

Thanks in advance

AHOUSSOU Sylvie
Veterinary Epidemiologist
CIRAD Domaine Duclos
97 170 Petit-Bourg
tel : 05 90 25 59 47



[R] survey package: proportion estimate confidence intervals using svymean

2008-02-13 Thread Peter Holck
Using the survey package I find it is convenient and easy to get estimated
proportions using svymean, and their corresponding estimated standard
errors.  But is there any elegant/simple way to calculate corresponding
confidence intervals for those proportions?  Of course +/- 1.96 s.e. is a
reasonable approximation for a 95% CI, but (incorrectly) assumes symmetrical
distribution for a proportion.  

About all I can think of is to transform onto some scale like the logit
scale, use the delta method to estimate standard errors, and transform back
for appropriate confidence intervals.  Or perhaps use the machinery of the
survey package svyglm function with logit link (family=quasibinomial) to do
the calculations for me.  Still, seems rather cumbersome.  Is there a better
solution?

Thanks for any suggestions or pointing out the errors in my logic!
Peter Holck



Re: [R] survey package: proportion estimate confidence intervals using svymean

2008-02-13 Thread Thomas Lumley
On Tue, 12 Feb 2008, Peter Holck wrote:

 Using the survey package I find it is convenient and easy to get estimated
 proportions using svymean, and their corresponding estimated standard
 errors.  But is there any elegant/simple way to calculate corresponding
 confidence intervals for those proportions?  Of course +/- 1.96 s.e. is a
 reasonable approximation for a 95% CI, but (incorrectly) assumes symmetrical
 distribution for a proportion.

'Incorrectly' is a bit strong, I think.

 About all I can think of is to transform onto some scale like the logit
 scale, use the delta method to estimate standard errors, and transform back
 for appropriate confidence intervals.  Or perhaps use the machinery of the
 survey package svyglm function with logit link (family=quasibinomial) to do
 the calculations for me.  Still, seems rather cumbersome.  Is there a better
 solution?


I think the most straightforward solution would be svyglm().

You could also use svycontrast() to do the delta-method calculations for you.
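For instance, with the api data shipped with the package (variable choices illustrative):

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

## svyglm() with a logit link; transform the Wald CI back to the probability scale
m <- svyglm(I(sch.wide == "Yes") ~ 1, design = dclus1,
            family = quasibinomial())
plogis(confint(m))   # asymmetric interval for the proportion

## Compare with the symmetric interval from svymean()
confint(svymean(~I(sch.wide == "Yes"), dclus1))
```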

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle
