[R] Stats question: Comparison of the same individuals during two exposure times

2012-07-17 Thread natalie.vanzuydam
Hi,

I'm hoping that someone will be able to help. I would like to compare how
covariates are associated with the risk of a binary outcome during two
periods. Period 1 is non-exposure to a treatment and period 2 is exposure
to the treatment. The same individuals will be examined in each period, but
I want to compare the association of certain covariates between the two
periods to see whether there is a treatment interaction. I've looked at
case-crossover designs and time series analysis and don't think that they
are suitable. The cohort has longitudinal data, so individuals go onto
treatment at different times, and the treatment needs to be administered
for a while before it has an effect. The reason I cannot simply use an
exposed vs. unexposed design is that most individuals in the cohort
eventually end up on the treatment, so the unexposed group is very small
and lacks power for a meaningful comparison.

Is there any way to compare the same individuals during different exposure
times, and to look at the effect of different covariates under the exposed
and unexposed conditions?
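One possible way to frame this is a Cox model with treatment as a time-varying exposure. It assumes that the binary outcome has an event date, so follow-up time can be used. Each person's follow-up is split into an unexposed interval and, once treatment has had time to act, an exposed interval, so most people contribute to both conditions. An exposure-by-covariate interaction then estimates whether the covariate's association differs between the periods. `cluster(id)` gives robust standard errors for the repeated individuals. The simulated data, the column names (`fu_end`, `tx_start`) and the 90-day lag are all hypothetical, and this is a sketch rather than necessarily the right design for this cohort:

```r
library(survival)

set.seed(2012)
n <- 300
lag_days <- 90  # hypothetical lag before treatment is assumed to take effect

# Hypothetical cohort: follow-up ends at fu_end (days); about 80% start treatment
people <- data.frame(
  id       = 1:n,
  age      = round(runif(n, 40, 80)),
  fu_end   = round(runif(n, 365, 3650)),
  tx_start = ifelse(runif(n) < 0.8, round(runif(n, 0, 2000)), NA),
  event    = rbinom(n, 1, 0.3)  # outcome status at the end of follow-up
)

# Split one person's follow-up into unexposed / exposed (start, stop] intervals
split_person <- function(p) {
  exp_start <- p$tx_start + lag_days
  if (is.na(exp_start) || exp_start >= p$fu_end) {
    # never treated, or treated too late to count: a single unexposed interval
    return(data.frame(id = p$id, age = p$age, start = 0, stop = p$fu_end,
                      exposed = 0, event = p$event))
  }
  data.frame(id = p$id, age = p$age,
             start   = c(0, exp_start),
             stop    = c(exp_start, p$fu_end),
             exposed = c(0, 1),
             event   = c(0, p$event))  # in this toy setup the outcome occurs at fu_end
}

long <- do.call(rbind, lapply(split(people, people$id), split_person))

# The exposed:age coefficient estimates how the age association differs by period
fit <- coxph(Surv(start, stop, event) ~ exposed * age + cluster(id), data = long)
summary(fit)
```

Because the exposure starts at `tx_start + lag_days`, a different lag can be tried by changing one value.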

Thanks for your help,
Natalie

-
Natalie Van Zuydam

PhD Student
University of Dundee
nvanzuy...@dundee.ac.uk
--
View this message in context: 
http://r.789695.n4.nabble.com/Stats-question-Comparison-of-the-same-individuals-during-two-exposure-times-tp4636732.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subsetting a data frame

2011-12-05 Thread natalie.vanzuydam
Hi R users,

I really need help with subsetting data frames:

I have a large database of medical records and I want to be able to match
patterns from a list of search terms.

I've used this simplified data frame in a previous example:


db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"), test1 = c(1, 
2, 1.3, 3), test2 = c(56L, 27L, 58L, 2L), test3 = c(1.1, 28, 
9, 1.2)), .Names = c("ind", "test1", "test2", "test3"), class =
"data.frame", row.names = c(NA, 
-4L)) 

terms_include <- c(1,2,3) 
terms_exclude <- c(1.1,1.2,1.3) 


So in this example I want to include rows containing any of the terms in
terms_include, as long as none of the terms in terms_exclude occur in the
same row of the data frame.

Previously I was given this function which works very well if you want to
match exactly:


f <- function(x) !any(x %in% terms_exclude) && any(x %in% terms_include) 
db[apply(db[, -1], 1, f), ] 

   ind test1 test2 test3
2 ind2     2    27  28.0
4 ind4     3     2   1.2


I would like to know if there is a way to write a similar function that
looks for matches that start with the query string, as in
grepl("^pattern", x).

I started writing a function but am not sure how to get it to return the
data frame or matrix:


for (i in 1:length(terms_include)){
db_new <- apply(db,2, grepl,pattern=i)
}

Applying this function gives me:

db_new <- structure(c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), .Dim = c(4L, 
4L), .Dimnames = list(NULL, c("ind", "test1", "test2", "test3"
)))

So the above is searching the pattern anywhere in the dataframe instead of
just at the beginning of the string.  

How would I look for the terms to include, but not return a row of the data
frame if it also contains one of the terms to exclude, while using partial
matching?
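In the loop above, `pattern = i` passes the loop index (1, 2, 3) rather than `terms_include[i]`. Each iteration also overwrites `db_new`, so only the last search (for "3") survives. In addition, `apply()` on a data frame goes through `as.matrix()`, which pads numbers with `format()`.

A minimal sketch of prefix matching that avoids these issues, assuming the toy `db` above and that the first column is an ID column to skip. The helper names `starts_with_any` and `row_has_prefix` are hypothetical. It swaps in base R's `startsWith()` (R >= 3.3.0) for `grepl("^...", x)`, because in a regular expression "." matches any character, so "^1.1" would also match "101":

```r
# Toy data from the post
db <- data.frame(ind   = c("ind1", "ind2", "ind3", "ind4"),
                 test1 = c(1, 2, 1.3, 3),
                 test2 = c(56L, 27L, 58L, 2L),
                 test3 = c(1.1, 28, 9, 1.2),
                 stringsAsFactors = FALSE)
terms_include <- c(1, 2, 3)
terms_exclude <- c(1.1, 1.2, 1.3)

# Hypothetical helper: TRUE where an element of x starts with any of the prefixes
starts_with_any <- function(x, prefixes) {
  x <- as.character(x)  # one column at a time, so numbers are not padded by format()
  hits <- lapply(as.character(prefixes), function(p) startsWith(x, p))
  Reduce(`|`, hits, rep(FALSE, length(x)))
}

# Hypothetical helper: TRUE for each row where any column starts with a prefix
row_has_prefix <- function(df, prefixes) {
  Reduce(`|`, lapply(df, starts_with_any, prefixes = prefixes))
}

tests <- db[, -1]  # search only the test columns, not the ID column
keep <- row_has_prefix(tests, terms_include) & !row_has_prefix(tests, terms_exclude)
db[keep, ]
```

On this toy data only ind2 is kept, because ind1, ind3 and ind4 each contain a value starting with 1.1, 1.3 or 1.2. Note that a prefix such as "1" also matches values like "12" or "1.1", which is the intended behaviour of prefix matching.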

I hope that this makes sense.

Many thanks,
Natalie

--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-a-data-frame-tp4160127p4160127.html


Re: [R] Subsetting a data frame with multiple values and exclusions.

2011-10-06 Thread natalie.vanzuydam
Thanks.  Such a short and sweet answer that does what it should.

--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-a-data-frame-with-multiple-values-and-exclusions-tp3874967p3877472.html


[R] Subsetting a data frame with multiple values and exclusions.

2011-10-05 Thread natalie.vanzuydam
Hi all,

I realise that the convention is to provide a working example of my problem,
but the data are of a sensitive nature, so I'm not able to do that in this
case.

I need to query a database for multiple search terms:

db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"), test1 = c(1, 
2, 1.3, 3), test2 = c(56L, 27L, 58L, 2L), test3 = c(1.1, 28, 
9, 1.2)), .Names = c("ind", "test1", "test2", "test3"), class =
"data.frame", row.names = c(NA, 
-4L))

terms_include <- c(1,2,3)
terms_exclude <- c(1.1,1.2,1.3)

So I need to write a loop in which each value in terms_include is searched
for over the entire data frame. I thought of using apply with grepl and
subset? At the same time, if a value from terms_include occurs in the same
row as a value from terms_exclude, then that row must be excluded from the
output data frame.

I'm not sure where to even begin. I've only used subset at a very basic
level. The final database is much larger and there are many more search
terms than presented here, so I would really need to be able to loop over
the data frame successively to return a final data frame with my searched
values in at least one of the columns.
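For exact matches, a loop over the search terms may not be needed, because `%in%` already compares each value against the whole vector of terms. A minimal sketch, assuming the toy `db` above and that the first column is an ID column to skip. `in_any_column` is a hypothetical helper name:

```r
db <- data.frame(ind   = c("ind1", "ind2", "ind3", "ind4"),
                 test1 = c(1, 2, 1.3, 3),
                 test2 = c(56L, 27L, 58L, 2L),
                 test3 = c(1.1, 28, 9, 1.2),
                 stringsAsFactors = FALSE)
terms_include <- c(1, 2, 3)
terms_exclude <- c(1.1, 1.2, 1.3)

# Hypothetical helper: TRUE for each row where any column holds one of the terms
in_any_column <- function(df, terms) {
  Reduce(`|`, lapply(df, function(col) col %in% terms))
}

tests <- db[, -1]  # skip the ID column
keep <- in_any_column(tests, terms_include) & !in_any_column(tests, terms_exclude)
result <- db[keep, ]
result
```

Working column by column keeps each column's type, so numbers are compared as numbers. However, `%in%` on doubles is an exact comparison, so values produced by arithmetic may not match a typed term such as 1.2.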

Your help and assistance are much appreciated,
Natalie



--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-a-data-frame-with-multiple-values-and-exclusions-tp3874967p3874967.html


[R] Multiple events Cox's model and proportional hazards

2011-09-01 Thread natalie.vanzuydam
Hi,

I am using the survival package to perform a Cox regression analysis on
multiple myocardial infarction events. I have been using the Andersen and
Gill model:
coxph(Surv(time1, time2, status) ~ factor(treatment) + age + sex + cluster(id))

I was just wondering whether this model needs to satisfy the proportional
hazards assumption. I have run the cox.zph function, and the age term
appears to violate proportional hazards. What would be the best way to
construct this model? Should I include time-dependent covariates?
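A minimal sketch of checking proportional hazards in an Andersen-Gill fit, on simulated recurrent-event data. The data, rates and age cut points are hypothetical. `cox.zph()` tests each term against time. One common remedy for a covariate that violates the assumption is to stratify on it (here on age groups). This drops its coefficient but lets the baseline hazard differ by stratum. Modelling a time-dependent coefficient for age is another option, not shown here:

```r
library(survival)

set.seed(2011)
n <- 200
# Hypothetical cohort: treatment, age and sex per person
people <- data.frame(id = 1:n,
                     treatment = rbinom(n, 1, 0.5),
                     age = round(runif(n, 40, 80)),
                     sex = rbinom(n, 1, 0.5))

# Simulate recurrent events over 10 years of follow-up, in (time1, time2] format
one_person <- function(i) {
  rate <- 0.1 * exp(0.02 * (people$age[i] - 60) - 0.3 * people$treatment[i])
  t <- 0
  rows <- NULL
  repeat {
    gap <- rexp(1, rate)
    if (t + gap >= 10) {  # censored at the end of follow-up
      rows <- rbind(rows, c(t, 10, 0))
      break
    }
    rows <- rbind(rows, c(t, t + gap, 1))
    t <- t + gap
  }
  data.frame(people[rep(i, nrow(rows)), ],
             time1 = rows[, 1], time2 = rows[, 2], status = rows[, 3])
}
dat <- do.call(rbind, lapply(seq_len(n), one_person))

# Andersen-Gill model with robust (clustered) standard errors
fit <- coxph(Surv(time1, time2, status) ~ factor(treatment) + age + sex + cluster(id),
             data = dat)
zp <- cox.zph(fit)
print(zp)  # per-term and global tests of proportional hazards

# One possible remedy: stratify on age groups instead of fitting an age coefficient
fit_strat <- coxph(Surv(time1, time2, status) ~ factor(treatment) + sex +
                     strata(cut(age, c(40, 55, 70, 80), include.lowest = TRUE)) +
                     cluster(id), data = dat)
summary(fit_strat)
```

`plot(zp)` shows the scaled Schoenfeld residuals against time for each term. This can help judge whether an apparent violation matters in practice.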

Thanks,
Natalie

--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-events-Cox-s-model-and-proportional-hazards-tp3783031p3783031.html


[R] Cox's regression analysis with Left truncated data

2011-07-25 Thread natalie.vanzuydam
Hi,

I have a fairly simple question. I would like to use the survival package
to analyse data in which an event can have occurred before individuals were
recruited into a study. I'm not sure how to do this using the Surv()
function. Each individual has the date of an event, and the enrolment date
comes after that. How do I put these two dates into the Surv() function?
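For left-truncated (delayed-entry) data, `Surv()` can be used in its counting-process form, `Surv(start, stop, event)`. Here `start` is the time from the chosen time origin to enrolment, and `stop` is the time from the origin to the end of follow-up. An individual then only enters the risk set after enrolment. A minimal sketch on simulated dates. The column names `origin_date`, `enrol_date` and `end_date` are hypothetical, as is the choice of days as the time scale:

```r
library(survival)

set.seed(2011)
n <- 50
# Hypothetical dates: origin_date is the earlier event that defines time zero
origin_date <- as.Date("2000-01-01") + sample(0:1000, n, replace = TRUE)
enrol_date  <- origin_date + sample(30:2000, n, replace = TRUE)  # enrolled later
end_date    <- enrol_date + sample(100:3000, n, replace = TRUE)  # end of follow-up
d <- data.frame(origin_date, enrol_date, end_date,
                status = rbinom(n, 1, 0.6),
                age = round(runif(n, 40, 80)))

# Time scale: days since origin_date; entry is the delayed-entry (truncation) time
d$entry <- as.numeric(d$enrol_date - d$origin_date)
d$exit  <- as.numeric(d$end_date - d$origin_date)

s <- with(d, Surv(entry, exit, status))
fit <- coxph(Surv(entry, exit, status) ~ age, data = d)
summary(fit)
```

Choosing the time origin is the substantive decision here. Using the earlier event date as time zero and enrolment as the entry time is what makes this a left-truncation analysis.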

Thank you,
Natalie

--
View this message in context: 
http://r.789695.n4.nabble.com/Cox-s-regression-analysis-with-Left-truncated-data-tp3692114p3692114.html


Re: [R] Cleaning date columns

2011-03-10 Thread natalie.vanzuydam
Dear Bill,

Thanks very much for the reply and for the code.  I have amended my personal
details for future posts.  I was wondering if there were any good books or
tutorials for writing code similar to what you have provided above?

Best wishes,
Natalie Van Zuydam

--
View this message in context: 
http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3345482.html


[R] within group sequential subtraction

2011-03-10 Thread natalie.vanzuydam
Hi Everyone,

I would like to do sequential subtractions within a group so that I know the
time between separate observations for a group of individuals.  

My data:

data <- structure(list(group = c("IND1", "IND1", "IND2", 
"IND2", "IND2", "IND3", "IND4", "IND5", 
"IND6", "IND6"), date_obs = structure(c(6468, 
7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647), class =
"Date")), .Names = c("group", 
"date_obs"), row.names = c(NA, 10L), class = "data.frame")

So I start with:

 group   date_obs
1   IND1 1987-09-17
2   IND1 1989-05-04
3   IND2 1997-04-30
4   IND2 2008-11-03
5   IND2 2009-05-08
6   IND3 1984-01-17
7   IND4 1996-09-28
8   IND5 2000-07-30
9   IND6 1998-01-17
10  IND6 1999-02-25

What I would like:

 group   date_obs time
1   IND1 1987-09-17 NA  
2   IND1 1989-05-04 595
3   IND2 1997-04-30 NA
4   IND2 2008-11-03 4205
5   IND2 2009-05-08 186
6   IND3 1984-01-17 NA
7   IND4 1996-09-28 NA
8   IND5 2000-07-30 NA
9   IND6 1998-01-17 NA
10  IND6 1999-02-25 404

If an individual has only one entry, 0 or NA would be acceptable; if an
individual has more than one entry, the sequential differences should be
calculated.

I started with some code, but I cannot get it to do what I want.

x <- do.call(rbind, lapply(split(data, data$group), 
  function(dat) { 
    dat <- dat[order(dat$date_obs), ] 
    d <- diff(dat$date_obs)
    dat <- rbind(dat,d)
}))

I get this error: Error in as.Date.numeric(value) : 'origin' must be
supplied, so I'm not sure whether it does what I need it to do. In
addition, the vector lengths won't match up, because the first date in
each sequence has no earlier date to be subtracted from.

Does anyone know an easier way to achieve this?
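Two minimal sketches using the data above, both assuming that each individual's rows should be ordered by date first.

- The first uses `ave()` to compute the within-group differences and return them in the original row order.
- The second keeps the `split()`/`lapply()` structure of the attempt above. It adds the differences as a new column, padded with a leading NA, instead of `rbind()`-ing them as an extra row; the extra row appears to be what triggers the `as.Date()` error.

```r
data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                                 "IND3", "IND4", "IND5", "IND6", "IND6"),
                       date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                                              5129, 9767, 11168, 10243, 10647),
                                            class = "Date")),
                  row.names = c(NA, 10L), class = "data.frame")

# Make sure each individual's observations are in date order
data <- data[order(data$group, data$date_obs), ]

# 1) ave(): days since the previous observation within each group, NA for the first
data$time <- ave(as.numeric(data$date_obs), data$group,
                 FUN = function(x) c(NA, diff(x)))

# 2) split()/lapply(): same idea, adding a column rather than a row
x <- do.call(rbind, lapply(split(data[c("group", "date_obs")], data$group),
                           function(dat) {
  dat <- dat[order(dat$date_obs), ]
  dat$time <- c(NA, as.numeric(diff(dat$date_obs)))  # diff() of Dates is in days
  dat
}))
data
```

Both give NA, 595, NA, 4205, 186, NA, NA, NA, NA, 404, matching the desired time column above. Converting to numbers with `as.numeric()` before or after `diff()` keeps the result a plain count of days rather than a Date or difftime.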

Thanks for the help,
Natalie




--
View this message in context: 
http://r.789695.n4.nabble.com/within-group-sequential-subtraction-tp3346033p3346033.html