Re: [R] Conditional logistic regression for "events/trials" format
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Thanks for your reply Charles. I do indeed have other variables. I
> apologize for being vague; here is my study in more detail:
>
> I have a cohort of births. My outcome is a dichotomous variable for
> presence/absence of a birth defect. For each cohort member I estimate
> the date of conception, and assign a pollution level during the relevant
> period of gestation. All cohort members conceived on the same day are
> assigned the same pollution level. These cohort members also have a
> covariate, t, which indicates the day of follow-up. For example, if the
> first day of my study is Jan 1, 1987, the data would look like:
>
> Date          t    Conceptions  Cases  Pollution  Stratum
> Jan 1, 1987   1    100          1      10         1
> Jan 2, 1987   2    105          0       8         2
> Jan 3, 1987   3    101          1      11         3
> .
> .
> Jan 1, 1988   366  109          1      13         1
> Jan 2, 1988   367  111          2      19         2
> Jan 3, 1988   368  103          0      14         3
> .
> .
> .
>
> I make matched pairs of days (Strata) to control for the influence of
> season. I also want to account for long-term trends, e.g., increasing
> birth defects ascertainment and decreasing pollution levels over time,
> so I want to fit a cubic spline using the variable t.

Rather than matching, you might control for season by fitting a periodic
spline of your 'Stratum' variable. If you do that, then a generalized
additive logistic regression model could be used. Something like

    # gam() and te() are in the mgcv package
    library(mgcv)
    fit <- gam( cbind( Cases, Conceptions - Cases ) ~
                    te( Stratum, bs="cc" ) + te( t, bs="cs" ) + Pollution,
                data=your.data.frame, family=binomial )

see ?gam, ?te

> I have already analyzed this data as a time series (I don't use the
> Stratum variable in the time-series analyses), but now I am exploring
> some alternatives. My full dataset has 3,115 strata.
>
> So my final model would look like:
>
>     clogit(Cases/Conceptions ~ Pollution + f(t) + strata(Stratum))
>
> So, just to reiterate, my goal is to make this model without having to
> bring in the individual-level data. I would be just as happy to do a
> conditional Poisson as I would be to do a conditional logistic
> regression - either would seem to be appropriate here - if that opens up
> some other options.
>
> Thanks very much for your time and interest,
> Matt Strickland
> Epidemiologist
> Birth Defects Branch
> U.S. Centers for Disease Control and Prevention
>
> -----Original Message-----
> From: Charles C. Berry [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 31, 2007 1:12 PM
> To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
> Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
> Subject: Re: [R] Conditional logistic regression for "events/trials"
> format
>
> On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:
>
>> Dear R users,
>>
>> I have a large individual-level dataset (~700,000 records) which I am
>> performing a conditional logistic regression on. Key variables include
>> the dichotomous outcome, dichotomous exposure, and the stratum to
>> which each person belongs.
>>
>> Using this individual-level dataset I can successfully use clogit to
>> create the model I want. However, reading this large .csv file into R
>> and running the models takes a fair amount of time.
>>
>> Alternatively, I could choose to "collapse" the dataset so that each
>> row has the number of events, number of individuals, and the exposure
>> and stratum. In SAS they call this the "events/trials" format. This
>> would make my dataset much smaller and presumably speed things up.
>>
>
> I think you have described the data for forming a 2 by 2 by K table of
> counts.
>
> In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
> too large - glm(... , family=poisson) would be suitable.
>
> But you say 'models' above, suggesting that there are some other
> variables. If so, you need to be a bit more specific in describing your
> setup.
>
>> So my question is: can I use clogit (or possibly another function) to
>> perform a conditional logistic regression when the data is in this
>> "events/trials" format? I am using R version 2.5.0.
>>
>> Thank you very much,
>> Matt Strickland
>> Birth Defects Branch
>> U.S. Centers fo
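The gam() suggestion in this reply can be tried on simulated daily counts. The following is only a sketch: the data are made up, the effect sizes are arbitrary, 'Stratum' is assumed to be a numeric day-of-year index so that it can be smoothed, and s() is used in place of te() (for a single variable the two play the same role).

```r
library(mgcv)  # gam() and s(); mgcv is a recommended package that ships with R

# Simulated two-year daily series; every value below is illustrative only.
set.seed(4)
days <- 730
d <- data.frame(t = 1:days)
d$Stratum     <- (d$t - 1) %% 365 + 1      # assumed: numeric day of year
d$Conceptions <- rpois(days, 100)
d$Pollution   <- round(runif(days, 5, 20))
d$Cases       <- rbinom(days, d$Conceptions,
                        plogis(-3 + 0.3 * sin(2 * pi * d$Stratum / 365) +
                               0.02 * d$Pollution))

# Events/trials response: cbind(successes, failures).
# Cyclic smooth for season, shrinkage cubic spline for long-term trend.
fit <- gam(cbind(Cases, Conceptions - Cases) ~
             s(Stratum, bs = "cc") + s(t, bs = "cs") + Pollution,
           family = binomial, data = d)

# Parametric coefficient table row for the pollution effect (log odds scale).
summary(fit)$p.table["Pollution", ]
```

Note that this replaces the matching on strata with a smooth seasonal term, as the reply proposes, so it is not a conditional logistic regression.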
Re: [R] Conditional logistic regression for "events/trials" format
Thanks for your reply Charles. I do indeed have other variables. I
apologize for being vague; here is my study in more detail:

I have a cohort of births. My outcome is a dichotomous variable for
presence/absence of a birth defect. For each cohort member I estimate
the date of conception, and assign a pollution level during the relevant
period of gestation. All cohort members conceived on the same day are
assigned the same pollution level. These cohort members also have a
covariate, t, which indicates the day of follow-up. For example, if the
first day of my study is Jan 1, 1987, the data would look like:

Date          t    Conceptions  Cases  Pollution  Stratum
Jan 1, 1987   1    100          1      10         1
Jan 2, 1987   2    105          0       8         2
Jan 3, 1987   3    101          1      11         3
.
.
Jan 1, 1988   366  109          1      13         1
Jan 2, 1988   367  111          2      19         2
Jan 3, 1988   368  103          0      14         3
.
.
.

I make matched pairs of days (Strata) to control for the influence of
season. I also want to account for long-term trends, e.g., increasing
birth defects ascertainment and decreasing pollution levels over time,
so I want to fit a cubic spline using the variable t.

I have already analyzed this data as a time series (I don't use the
Stratum variable in the time-series analyses), but now I am exploring
some alternatives. My full dataset has 3,115 strata.

So my final model would look like:

    clogit(Cases/Conceptions ~ Pollution + f(t) + strata(Stratum))

So, just to reiterate, my goal is to make this model without having to
bring in the individual-level data. I would be just as happy to do a
conditional Poisson as I would be to do a conditional logistic
regression - either would seem to be appropriate here - if that opens up
some other options.

Thanks very much for your time and interest,
Matt Strickland
Epidemiologist
Birth Defects Branch
U.S. Centers for Disease Control and Prevention

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 31, 2007 1:12 PM
To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Conditional logistic regression for "events/trials"
format

On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
>
> I have a large individual-level dataset (~700,000 records) which I am
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to
> which each person belongs.
>
> Using this individual-level dataset I can successfully use clogit to
> create the model I want. However, reading this large .csv file into R
> and running the models takes a fair amount of time.
>
> Alternatively, I could choose to "collapse" the dataset so that each
> row has the number of events, number of individuals, and the exposure
> and stratum. In SAS they call this the "events/trials" format. This
> would make my dataset much smaller and presumably speed things up.
>

I think you have described the data for forming a 2 by 2 by K table of
counts.

In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
too large - glm(... , family=poisson) would be suitable.

But you say 'models' above, suggesting that there are some other
variables. If so, you need to be a bit more specific in describing your
setup.

> So my question is: can I use clogit (or possibly another function) to
> perform a conditional logistic regression when the data is in this
> "events/trials" format? I am using R version 2.5.0.
>
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
>

Charles C. Berry                    (858) 534-2098
                                    Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]          UC San Diego
http://biostat.ucsd.edu/~cberry/    La Jolla, San Diego 92093-0901

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
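One way to read the "conditional Poisson" option mentioned in this message, sketched here under stated assumptions rather than offered as the poster's method: for a Poisson model, putting one indicator per stratum into an ordinary glm() gives the same estimates for the remaining coefficients as conditioning on the stratum totals, so the daily counts can be used directly with log(Conceptions) as an offset. The data below are simulated, the case rate is set unrealistically high so that no stratum has zero cases, f(t) is taken to be a natural cubic spline with an arbitrary 3 degrees of freedom, and strata are assumed to pair the same calendar day across two years.

```r
library(splines)  # ns(): natural cubic spline basis; ships with R

# Simulated daily data in the layout shown in the message; values are made up.
set.seed(3)
days <- 730
d <- data.frame(t = 1:days)
d$Stratum     <- factor((d$t - 1) %% 365 + 1)   # assumed pairing of days
d$Conceptions <- rpois(days, 100)
d$Pollution   <- round(runif(days, 5, 20))
d$Cases       <- rbinom(days, d$Conceptions,
                        plogis(-2.5 + 0.02 * d$Pollution))

# Stratum indicators absorb season; ns(t) models the long-term trend;
# the offset turns the count model into a model for the rate per conception.
fit <- glm(Cases ~ Pollution + ns(t, df = 3) + Stratum +
             offset(log(Conceptions)),
           family = poisson, data = d)

# Estimate, standard error, z value and p value for the pollution effect.
coef(summary(fit))["Pollution", ]
```

With 3,115 strata the design matrix becomes large, so this may be slow; checking the result against clogit on a subsample of the individual-level data would be a sensible safeguard.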
Re: [R] Conditional logistic regression for "events/trials" format
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
>
> I have a large individual-level dataset (~700,000 records) which I am
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to which
> each person belongs.
>
> Using this individual-level dataset I can successfully use clogit to
> create the model I want. However, reading this large .csv file into R
> and running the models takes a fair amount of time.
>
> Alternatively, I could choose to "collapse" the dataset so that each row
> has the number of events, number of individuals, and the exposure and
> stratum. In SAS they call this the "events/trials" format. This would
> make my dataset much smaller and presumably speed things up.
>

I think you have described the data for forming a 2 by 2 by K table of
counts.

In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
too large - glm(... , family=poisson) would be suitable.

But you say 'models' above, suggesting that there are some other
variables. If so, you need to be a bit more specific in describing your
setup.

> So my question is: can I use clogit (or possibly another function) to
> perform a conditional logistic regression when the data is in this
> "events/trials" format? I am using R version 2.5.0.
>
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
>

Charles C. Berry                    (858) 534-2098
                                    Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]          UC San Diego
http://biostat.ucsd.edu/~cberry/    La Jolla, San Diego 92093-0901
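The 2 by 2 by K route named in this reply can be sketched in base R as follows. The collapsed data are simulated, the column names (exposure, stratum, events, trials) are assumptions, and K = 20 is arbitrary. The table is built with xtabs(), tested with mantelhaen.test(), and then fitted as a log-linear model with glm(family = poisson), where the exposure-by-outcome interaction is the common log odds ratio.

```r
# Simulated events/trials data: one row per stratum-by-exposure cell.
set.seed(2)
K <- 20
grouped <- expand.grid(exposure = 0:1, stratum = 1:K)
grouped$trials <- 200
grouped$events <- rbinom(nrow(grouped), grouped$trials,
                         plogis(-2.5 + 0.4 * grouped$exposure))

# Long format: one row per cell of the 2 x 2 x K table, with its count n.
long <- rbind(
  data.frame(grouped[c("exposure", "stratum")], outcome = 1,
             n = grouped$events),
  data.frame(grouped[c("exposure", "stratum")], outcome = 0,
             n = grouped$trials - grouped$events)
)
tab <- xtabs(n ~ exposure + outcome + stratum, data = long)

# Mantel-Haenszel test and common odds ratio across the K strata.
mh <- mantelhaen.test(tab)
mh$estimate

# Log-linear model: stratum-specific margins plus one common association.
long$stratum <- factor(long$stratum)
ll <- glm(n ~ stratum * exposure + stratum * outcome + exposure:outcome,
          family = poisson, data = long)
exp(coef(ll)["exposure:outcome"])   # unconditional estimate of the odds ratio
```

The two odds ratio estimates come from different estimators, so they should be close here but need not agree exactly.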
[R] Conditional logistic regression for "events/trials" format
Dear R users,

I have a large individual-level dataset (~700,000 records) which I am
performing a conditional logistic regression on. Key variables include
the dichotomous outcome, dichotomous exposure, and the stratum to which
each person belongs.

Using this individual-level dataset I can successfully use clogit to
create the model I want. However, reading this large .csv file into R
and running the models takes a fair amount of time.

Alternatively, I could choose to "collapse" the dataset so that each row
has the number of events, number of individuals, and the exposure and
stratum. In SAS they call this the "events/trials" format. This would
make my dataset much smaller and presumably speed things up.

So my question is: can I use clogit (or possibly another function) to
perform a conditional logistic regression when the data is in this
"events/trials" format? I am using R version 2.5.0.

Thank you very much,
Matt Strickland
Birth Defects Branch
U.S. Centers for Disease Control
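The "collapse" step described in this question can be sketched in base R with aggregate(). The individual-level data here are simulated and the column names (stratum, exposure, outcome) are assumptions, not the poster's actual variables.

```r
# Simulated individual-level records; names and values are illustrative only.
set.seed(1)
n <- 2000
ind <- data.frame(
  stratum  = sample(1:50, n, replace = TRUE),
  exposure = rbinom(n, 1, 0.4)
)
ind$outcome <- rbinom(n, 1, 0.05)
ind$one     <- 1   # helper column: summing it counts individuals per cell

# One row per stratum-by-exposure cell, summing events and individuals.
grouped <- aggregate(cbind(outcome, one) ~ stratum + exposure,
                     data = ind, FUN = sum)
names(grouped)[names(grouped) == "outcome"] <- "events"
names(grouped)[names(grouped) == "one"]     <- "trials"

head(grouped)
```

The collapsed table keeps every event and every individual, so any model that depends only on the cell counts can be fitted to it instead of the full file.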