[R] Locate Patients who have multiple high blood pressure readings
On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang wwang@gmail.com wrote:

Hi,

I have a new question about subsetting in R. Say we have this data frame:

   PT_ID Blood_Pressure OBS_TYPE
92  1900           90.0      DBP
94  1900           90.0      DBP
174 2900          140.0      SBP
176 2900          130.0      SBP
180 3900          120.0      SBP
268 3900          150.0      SBP
268 3900           90.0      DBP

I need to obtain those with 2+ DBP >= 90 or 2+ SBP >= 140.

PT_ID=1900 has 2 DBP >= 90, so he will be included.
PT_ID=2900 has only 1 SBP >= 140, so he will NOT be included.
PT_ID=3900 has 1 SBP >= 140 and 1 DBP >= 90, so he will still NOT be included.

So the condition requires TWO OR MORE readings at or above the threshold, counted separately for each type. The qualifying type could be SBP, DBP, or both.

I have tried ddply, but I don't know how to add the condition 2+ inside ddply.

Any help is appreciated!!

Weijia

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
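To make the rule concrete, here is a small base-R sketch (not part of the original post) that rebuilds the table above as a data frame and counts the high readings per patient and type. The name dd and the column name ColA for the unlabeled first column are assumptions borrowed from the replies in this thread; the thresholds come from the question.

```r
# Rebuild the posted table. ColA holds the leading labels; 268 appears twice,
# so those labels cannot serve as row names.
dd <- data.frame(
  ColA           = c(92L, 94L, 174L, 176L, 180L, 268L, 268L),
  PT_ID          = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = factor(c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP"))
)

# A reading is "high" when it reaches the threshold for its own type.
high <- with(dd, (OBS_TYPE == "DBP" & Blood_Pressure >= 90) |
                 (OBS_TYPE == "SBP" & Blood_Pressure >= 140))

# Count high readings per patient (rows) and per type (columns).
counts <- table(dd$PT_ID[high], dd$OBS_TYPE[high])
counts
```

The counts mirror the reasoning in the question: only 1900 has two high readings of a single type, while 3900 has two high readings in total but one of each type.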
Re: [R] Locate Patients who have multiple high blood pressure readings
Well, since no one has responded...

Please use ?dput to provide data in your posts.

There are likely zillions of ways to go about this. Following is one way based on ?duplicated that I think works, but I make no claims for either elegance or efficiency. Others may do lots better. But maybe it suffices.

## Untested
## I assume the data is provided in a data frame named dd.
## All PT_IDs with >= 1 high reading in SBP or in DBP
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])
## ids that appear more than once in either
union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))
## You can subset your data frame to match just these, e.g. via %in%, if you like.

Cheers,
Bert
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
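Bert's duplicated() idea, assembled into a self-contained base-R run on the posted data together with the %in% subsetting he mentions. The data frame is typed in by hand here, an assumption standing in for a dput() from the original poster. The key point is that duplicated() flags every occurrence after the first, so an id survives hiS[duplicated(hiS)] only if it has at least two high readings of that type.

```r
dd <- data.frame(
  PT_ID          = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# One PT_ID entry per high reading, kept separately for each measure type
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])

# duplicated() is TRUE from the second occurrence on, so these are the ids
# with at least two high readings of the same type
ids <- union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))

# Keep every row belonging to a qualifying patient
keep <- dd[dd$PT_ID %in% ids, ]
keep
```

On this data hiS holds 2900 and 3900 once each, so nothing is duplicated there; hiD holds 1900 twice, so ids ends up as 1900 alone and keep contains that patient's two rows.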
Re: [R] Locate Patients who have multiple high blood pressure readings
dd <- # from dput()
structure(list(ColA = c(92L, 94L, 174L, 176L, 180L, 268L, 268L),
               PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
               Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
               OBS_TYPE = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L),
                                    .Label = c("DBP", "SBP"), class = "factor")),
          .Names = c("ColA", "PT_ID", "Blood_Pressure", "OBS_TYPE"),
          class = "data.frame", row.names = c(NA, -7L))
library(plyr)
ddply(dd, .(PT_ID), summarize,
      Include = sum(OBS_TYPE == "DBP" & Blood_Pressure >= 90) >= 2 ||
                sum(OBS_TYPE == "SBP" & Blood_Pressure >= 140) >= 2)
#   PT_ID Include
# 1  1900    TRUE
# 2  2900   FALSE
# 3  3900   FALSE

sum(logicalVector) tells how many TRUEs are in logicalVector.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
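If plyr is not available, the same sum(logicalVector) counting can be done per patient with base R's tapply(). This is a swapped-in base-R sketch, not Bill's code, and the data are typed in by hand to match the dput() output.

```r
dd <- data.frame(
  PT_ID          = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# sum() of a logical vector counts its TRUEs; tapply() does it per PT_ID
nDBP <- with(dd, tapply(OBS_TYPE == "DBP" & Blood_Pressure >= 90, PT_ID, sum))
nSBP <- with(dd, tapply(OBS_TYPE == "SBP" & Blood_Pressure >= 140, PT_ID, sum))

# Vectorised | (not ||) because whole per-patient vectors are compared here
include <- nDBP >= 2 | nSBP >= 2
include
```

Bill's || is fine inside summarize() because each group's sums are single values; once the counts are gathered into per-patient vectors, the element-wise | is the one to use.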
Re: [R] Locate Patients who have multiple high blood pressure readings
On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang wwang@gmail.com wrote:
> I have tried ddply, but I don't know how to add the condition 2+ inside ddply.

This can be specified in a reasonably natural fashion using SQL. Here DF is the input data frame:

library(sqldf)
sqldf("select
         PT_ID,
         sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
         sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
       from DF
       group by PT_ID
       having DBP >= 2 or SBP >= 2")
#   PT_ID DBP SBP
# 1  1900   2   0
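For readers without sqldf, the same GROUP BY / HAVING logic can be mirrored in base R with aggregate() and subset(). This is an assumed base-R translation, not part of the answer above; DF is built by hand to match the posted table.

```r
DF <- data.frame(
  PT_ID          = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# Per-row 0/1 indicators play the role of the SQL sum(<condition>) columns
DF$DBP <- with(DF, as.integer(Blood_Pressure >= 90 & OBS_TYPE == "DBP"))
DF$SBP <- with(DF, as.integer(Blood_Pressure >= 140 & OBS_TYPE == "SBP"))

# GROUP BY PT_ID: sum the indicators for each patient
counts <- aggregate(cbind(DBP, SBP) ~ PT_ID, data = DF, FUN = sum)

# HAVING DBP >= 2 OR SBP >= 2
res <- subset(counts, DBP >= 2 | SBP >= 2)
res
```

Keeping the per-patient counts in counts (rather than only the filtered result) also makes it easy to see why 2900 and 3900 fall short.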
Re: [R] Locate Patients who have multiple high blood pressure readings
Hi,
Maybe this helps:

#dd
res <- data.frame(Include = with(subset(dd, OBS_TYPE == "SBP" & Blood_Pressure >= 140 |
                                            OBS_TYPE == "DBP" & Blood_Pressure >= 90),
                                 apply(tapply(Blood_Pressure, list(PT_ID, OBS_TYPE), length) >= 2,
                                       1, any, na.rm = TRUE)))
res
#     Include
#1900    TRUE
#2900   FALSE
#3900   FALSE

A.K.
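The answer above needs na.rm = TRUE because tapply(..., length) returns NA, not 0, for a patient/type combination that has no rows. The sketch below shows that intermediate matrix and, as a swapped-in alternative (not A.K.'s code), uses table(), which fills empty cells with 0 so any() needs no NA handling. The data are typed in by hand.

```r
dd <- data.frame(
  PT_ID          = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# Keep only the high readings (& binds tighter than |)
hi <- subset(dd, OBS_TYPE == "SBP" & Blood_Pressure >= 140 |
                 OBS_TYPE == "DBP" & Blood_Pressure >= 90)

# tapply(..., length) leaves NA in cells with no rows (e.g. 1900/SBP)
tab_na <- with(hi, tapply(Blood_Pressure, list(PT_ID, OBS_TYPE), length))

# table() gives 0 for the same empty cells
tab0 <- with(hi, table(PT_ID, OBS_TYPE))

# A patient qualifies if any type has 2+ high readings
include <- apply(tab0 >= 2, 1, any)
include
```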