Re: [R] Conditional logistic regression for "events/trials" format

2007-06-01 Thread Charles C. Berry
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Thanks for your reply Charles. I do indeed have other variables. I
> apologize for being vague, here is my study in more detail:
>
> I have a cohort of births. My outcome is a dichotomous variable for
> presence/absence of a birth defect. For each cohort member I estimate
> the date of conception, and assign a pollution level during the relevant
> period of gestation. All cohort members conceived on the same day are
> assigned the same pollution level. These cohort members also have a
> covariate, t, which indicates the day of follow-up. For example, if the
> first day of my study is Jan 1, 1987, the data would look like:
>
> Date         t    Conceptions  Cases  Pollution  Stratum
> Jan 1, 1987  1    100          1      10         1
> Jan 2, 1987  2    105          0      8          2
> Jan 3, 1987  3    101          1      11         3
> .
> .
> Jan 1, 1988  366  109          1      13         1
> Jan 2, 1988  367  111          2      19         2
> Jan 3, 1988  368  103          0      14         3
> .
> .
> .
>
> I make matched pairs of days (Strata) to control for the influence of
> season. I also want to account for long-term trends, e.g. increasing birth
> defects ascertainment and decreasing pollution levels over time, so I
> want to fit a cubic spline using the variable t.
>

Rather than matching, you might control for season by fitting a periodic 
spline of your 'Stratum' variable. If you do that, then a generalized 
additive logistic regression model could be used.


Something like

library(mgcv)
fit <- gam( cbind( Cases, Conceptions - Cases ) ~ te( Stratum, bs="cc" ) +
            te( t, bs="cs" ) + Pollution,
            family=binomial, data=your.data.frame )

see ?gam, ?te
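
A minimal runnable sketch of this kind of fit, on simulated data rather than the poster's. Everything about the data is an assumption made for illustration: the three-year span, the effect sizes, and the hypothetical `doy` (day of year) column standing in for the seasonal index. It also uses `s()` with a "cr" basis for the trend instead of `te()` with "cs"; for a single covariate either form is a reasonable choice.

```r
library(mgcv)  # recommended package, ships with standard R installs

set.seed(1)
n_days <- 3 * 365                          # hypothetical three-year study
dat <- data.frame(t = seq_len(n_days))     # day of follow-up
dat$doy <- (dat$t - 1) %% 365 + 1          # hypothetical day-of-year index (assumption)
dat$Pollution <- 12 + 3 * sin(2 * pi * dat$doy / 365) + rnorm(n_days)
dat$Conceptions <- rpois(n_days, 100)      # daily number of conceptions (trials)
eta <- -4.5 + 0.02 * dat$Pollution + 0.3 * cos(2 * pi * dat$doy / 365)
dat$Cases <- rbinom(n_days, dat$Conceptions, plogis(eta))  # daily events

# Events/trials response: cbind(successes, failures)
fit <- gam(cbind(Cases, Conceptions - Cases) ~
             s(doy, bs = "cc") +           # cyclic cubic spline for season
             s(t, bs = "cr") +             # cubic regression spline for long-term trend
             Pollution,
           family = binomial, data = dat,
           knots = list(doy = c(0.5, 365.5)))  # so day 365 wraps around to day 1

summary(fit)$p.table["Pollution", ]        # log odds ratio per unit of pollution
```

The model is fitted directly to the daily counts, so the individual-level records are never needed. Note that this is an unconditional model with smooth adjustment for season, not a conditional (matched) analysis.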



> I have already analyzed this data as a time series (I don't use the
> Stratum variable in the time-series analyses), but now I am exploring
> some alternatives. My full dataset has 3,115 strata.
>
> So my final model would look like: clogit(Cases/Conceptions ~ Pollution
> + f(t) + strata(Stratum)).
>
> So, just to reiterate, my goal is to make this model without having to
> bring in the individual-level data. I would be just as happy to do a
> conditional Poisson as I would be to do a conditional logistic
> regression - either would seem to be appropriate here - if that opens up
> some other options.
>
> Thanks very much for your time and interest,
> Matt Strickland
> Epidemiologist
> Birth Defects Branch
> U.S. Centers for Disease Control and Prevention
>
>
>
> -----Original Message-----
> From: Charles C. Berry [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 31, 2007 1:12 PM
> To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
> Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
> Subject: Re: [R] Conditional logistic regression for "events/trials"
> format
>
> On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:
>
>> Dear R users,
>>
>> I have a large individual-level dataset (~700,000 records) which I am
>> performing a conditional logistic regression on. Key variables include
>> the dichotomous outcome, dichotomous exposure, and the stratum to
>> which each person belongs.
>>
>> Using this individual-level dataset I can successfully use clogit to
>> create the model I want. However reading this large .csv file into R
>> and running the models takes a fair amount of time.
>>
>> Alternatively, I could choose to "collapse" the dataset so that each
>> row has the number of events, number of individuals, and the exposure
>> and stratum. In SAS they call this the "events/trials" format. This
>> would make my dataset much smaller and presumably speed things up.
>>
>
> I think you have described the data for forming a 2 by 2 by K table of
> counts.
>
> In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
> too large - glm(... , family=poisson)  would be suitable.
>
> But you say 'models' above suggesting that there are some other
> variables. If so, you need to be a bit more specific in describing your
> setup.
>
>
>> So my question is: can I use clogit (or possibly another function) to
>> perform a conditional logistic regression when the data is in this
>> "events/trials" format? I am using R version 2.5.0.
>>
>> Thank you very much,
>> Matt Strickland
>> Birth Defects Branch
>> U.S. Centers for Disease Control

Re: [R] Conditional logistic regression for "events/trials" format

2007-05-31 Thread Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Thanks for your reply Charles. I do indeed have other variables. I
apologize for being vague, here is my study in more detail:

I have a cohort of births. My outcome is a dichotomous variable for
presence/absence of a birth defect. For each cohort member I estimate
the date of conception, and assign a pollution level during the relevant
period of gestation. All cohort members conceived on the same day are
assigned the same pollution level. These cohort members also have a
covariate, t, which indicates the day of follow-up. For example, if the
first day of my study is Jan 1, 1987, the data would look like:

Date         t    Conceptions  Cases  Pollution  Stratum
Jan 1, 1987  1    100          1      10         1
Jan 2, 1987  2    105          0      8          2
Jan 3, 1987  3    101          1      11         3
.
.
Jan 1, 1988  366  109          1      13         1
Jan 2, 1988  367  111          2      19         2
Jan 3, 1988  368  103          0      14         3
.
.
.

I make matched pairs of days (Strata) to control for the influence of
season. I also want to account for long-term trends, e.g. increasing birth
defects ascertainment and decreasing pollution levels over time, so I
want to fit a cubic spline using the variable t. 

I have already analyzed this data as a time series (I don't use the
Stratum variable in the time-series analyses), but now I am exploring
some alternatives. My full dataset has 3,115 strata.

So my final model would look like: clogit(Cases/Conceptions ~ Pollution
+ f(t) + strata(Stratum)). 

So, just to reiterate, my goal is to make this model without having to
bring in the individual-level data. I would be just as happy to do a
conditional Poisson as I would be to do a conditional logistic
regression - either would seem to be appropriate here - if that opens up
some other options.

Thanks very much for your time and interest,
Matt Strickland
Epidemiologist
Birth Defects Branch
U.S. Centers for Disease Control and Prevention

 

-----Original Message-----
From: Charles C. Berry [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 31, 2007 1:12 PM
To: Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Conditional logistic regression for "events/trials"
format

On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
>
> I have a large individual-level dataset (~700,000 records) which I am 
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to
> which each person belongs.
>
> Using this individual-level dataset I can successfully use clogit to 
> create the model I want. However reading this large .csv file into R 
> and running the models takes a fair amount of time.
>
> Alternatively, I could choose to "collapse" the dataset so that each 
> row has the number of events, number of individuals, and the exposure 
> and stratum. In SAS they call this the "events/trials" format. This 
> would make my dataset much smaller and presumably speed things up.
>

I think you have described the data for forming a 2 by 2 by K table of
counts.

In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not
too large - glm(... , family=poisson)  would be suitable.

But you say 'models' above suggesting that there are some other
variables. If so, you need to be a bit more specific in describing your
setup.


> So my question is: can I use clogit (or possibly another function) to 
> perform a conditional logistic regression when the data is in this 
> "events/trials" format? I am using R version 2.5.0.
>
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                        (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]               UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901



Re: [R] Conditional logistic regression for "events/trials" format

2007-05-31 Thread Charles C. Berry
On Thu, 31 May 2007, Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR) wrote:

> Dear R users,
>
> I have a large individual-level dataset (~700,000 records) which I am
> performing a conditional logistic regression on. Key variables include
> the dichotomous outcome, dichotomous exposure, and the stratum to which
> each person belongs.
>
> Using this individual-level dataset I can successfully use clogit to
> create the model I want. However reading this large .csv file into R and
> running the models takes a fair amount of time.
>
> Alternatively, I could choose to "collapse" the dataset so that each row
> has the number of events, number of individuals, and the exposure and
> stratum. In SAS they call this the "events/trials" format. This would
> make my dataset much smaller and presumably speed things up.
>

I think you have described the data for forming a 2 by 2 by K table of 
counts.

In which case, loglin(), loglm(), mantelhaen.test(), and - if K is not too 
large - glm(... , family=poisson)  would be suitable.
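
As a rough illustration of the 2 by 2 by K route, here is a sketch on simulated data. The number of strata, the stratum sizes, and the effect size are all made up, and the column names `stratum`, `exposure`, and `outcome` are placeholders rather than the poster's variables.

```r
set.seed(2)
K <- 50                                    # hypothetical number of strata
ind <- data.frame(
  stratum  = factor(rep(seq_len(K), each = 200)),
  exposure = rbinom(K * 200, 1, 0.4)       # dichotomous exposure
)
ind$outcome <- rbinom(nrow(ind), 1, plogis(-2 + 0.5 * ind$exposure))

# 2 x 2 x K table of counts: exposure by outcome within each stratum
tab <- xtabs(~ exposure + outcome + stratum, data = ind)

# Cochran-Mantel-Haenszel test with the Mantel-Haenszel common odds ratio
cmh <- mantelhaen.test(tab)
cmh$estimate

# Log-linear (Poisson) version on the cell counts; the exposure:outcome
# coefficient is the (unconditional) common log odds ratio
counts <- as.data.frame(tab)               # columns: exposure, outcome, stratum, Freq
pfit <- glm(Freq ~ stratum * exposure + stratum * outcome + exposure:outcome,
            family = poisson, data = counts)
exp(coef(pfit)["exposure1:outcome1"])
```

The Poisson fit carries about 3K nuisance parameters for the stratum margins, which is why it is only practical when K is not too large.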

But you say 'models' above suggesting that there are some other 
variables. If so, you need to be a bit more specific in describing your 
setup.


> So my question is: can I use clogit (or possibly another function) to
> perform a conditional logistic regression when the data is in this
> "events/trials" format? I am using R version 2.5.0.
>
> Thank you very much,
> Matt Strickland
> Birth Defects Branch
> U.S. Centers for Disease Control
>

Charles C. Berry                        (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]               UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901



[R] Conditional logistic regression for "events/trials" format

2007-05-31 Thread Strickland, Matthew (CDC/CCHP/NCBDDD) (CTR)
Dear R users,

I have a large individual-level dataset (~700,000 records) which I am
performing a conditional logistic regression on. Key variables include
the dichotomous outcome, dichotomous exposure, and the stratum to which
each person belongs. 

Using this individual-level dataset I can successfully use clogit to
create the model I want. However reading this large .csv file into R and
running the models takes a fair amount of time.

Alternatively, I could choose to "collapse" the dataset so that each row
has the number of events, number of individuals, and the exposure and
stratum. In SAS they call this the "events/trials" format. This would
make my dataset much smaller and presumably speed things up.
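
For readers unfamiliar with the format, the collapse itself might look like the sketch below. The data are simulated and the column names (`stratum`, `exposure`, `outcome`, `one`) are hypothetical. The comparison at the end uses ordinary (unconditional) logistic regression via glm(), where the individual-level and events/trials fits give the same coefficients; it does not show that clogit accepts the collapsed format.

```r
set.seed(3)
ind <- data.frame(
  stratum  = rep(1:20, each = 500),        # hypothetical strata
  exposure = rbinom(20 * 500, 1, 0.3)      # dichotomous exposure
)
ind$outcome <- rbinom(nrow(ind), 1, plogis(-3 + 0.7 * ind$exposure))
ind$one <- 1                               # helper column for counting trials

# Collapse to one row per stratum-by-exposure cell: events and trials
agg <- aggregate(cbind(events = outcome, trials = one) ~ stratum + exposure,
                 data = ind, FUN = sum)
head(agg)

# Same unconditional logistic model, fitted to each format
fit_ind <- glm(outcome ~ exposure + factor(stratum),
               family = binomial, data = ind)
fit_agg <- glm(cbind(events, trials - events) ~ exposure + factor(stratum),
               family = binomial, data = agg)

# The two sets of coefficients agree up to numerical tolerance
all.equal(coef(fit_ind), coef(fit_agg), tolerance = 1e-4)
```

Here 10,000 individual records shrink to 40 rows, which is the kind of reduction the collapsed format is meant to deliver.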

So my question is: can I use clogit (or possibly another function) to
perform a conditional logistic regression when the data is in this
"events/trials" format? I am using R version 2.5.0.

Thank you very much, 
Matt Strickland
Birth Defects Branch
U.S. Centers for Disease Control
