[R] Locate Patients who have multiple high blood pressure readings

2013-01-31 Thread Weijia Wang
On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang wwang@gmail.com wrote:

Hi,

I have a new question about subsetting in R.

Say we have this data frame:

     PT_ID Blood_Pressure OBS_TYPE
92    1900           90.0      DBP
94    1900           90.0      DBP
174   2900          140.0      SBP
176   2900          130.0      SBP
180   3900          120.0      SBP
268   3900          150.0      SBP
268   3900           90.0      DBP

I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.

PT_ID=1900, he has 2 DBP>=90, so he will be included.
PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
included.

So, the condition requires TWO OR MORE values at or above the threshold.
It could be either SBP or DBP or both of them.

I have tried ddply, but I don’t know how to add the condition 2+ inside
ddply.

Any help is appreciated!!

Weijia


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Locate Patients who have multiple high blood pressure readings

2013-01-31 Thread Bert Gunter
Well, since no one has responded


Please use ?dput to provide data in your posts.

There are likely zillions of ways to go about this. Following is one
way based on ?duplicated that I think works, but I make no claims for
either elegance or efficiency. Others may do lots better. But maybe it
suffices.


## Untested
## I assume the data is provided in a data frame named dd.

## All PT_IDs with >= 1 high reading in SBP or in DBP
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])

## IDs that appear more than once in either
union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))

## You can subset your data frame to match just these, e.g. via
## %in%, if you like.
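
A minimal sketch of that %in% subsetting step, assuming the example data from the original post (rebuilt by hand here so the snippet runs on its own). The names keep_ids and flagged are illustrative and do not appear in the thread.

```r
# Hand-built copy of the example data (row names from the post are omitted).
dd <- data.frame(
  PT_ID          = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# One entry per high reading, so an ID repeats if it has several.
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])

# IDs that occur two or more times within either vector.
keep_ids <- union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))

# Keep every row that belongs to a flagged patient.
flagged <- dd[dd$PT_ID %in% keep_ids, ]
print(flagged)
```

On this data only PT_ID 1900 should be kept, since it is the only ID with two high readings of the same type.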


Cheers,
Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Locate Patients who have multiple high blood pressure readings

2013-01-31 Thread William Dunlap
> dd <-  # from dput()
structure(list(ColA = c(92L, 94L, 174L, 176L, 180L, 268L, 268L),
    PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
    Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
    OBS_TYPE = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L),
        .Label = c("DBP", "SBP"), class = "factor")),
    .Names = c("ColA", "PT_ID", "Blood_Pressure", "OBS_TYPE"),
    class = "data.frame", row.names = c(NA, -7L))
> library(plyr)
> ddply(dd, .(PT_ID), summarize,
+       Include = sum(OBS_TYPE == "DBP" & Blood_Pressure >= 90) >= 2 ||
+                 sum(OBS_TYPE == "SBP" & Blood_Pressure >= 140) >= 2)
  PT_ID Include
1  1900    TRUE
2  2900   FALSE
3  3900   FALSE

sum(logicalVector) tells how many TRUE's are in logicalVector.
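
The same counting idea can be sketched in base R without plyr. This is only an illustration under the assumption that the data look like the example in the original post; the names n_dbp, n_sbp and include are made up for the sketch.

```r
# Hand-built copy of the example data from the original post.
dd <- data.frame(
  PT_ID          = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# sum() of a logical vector counts its TRUE values;
# tapply() applies that count separately for each PT_ID.
n_dbp <- tapply(dd$OBS_TYPE == "DBP" & dd$Blood_Pressure >= 90,  dd$PT_ID, sum)
n_sbp <- tapply(dd$OBS_TYPE == "SBP" & dd$Blood_Pressure >= 140, dd$PT_ID, sum)

# A patient qualifies with two or more high readings of the same type.
include <- n_dbp >= 2 | n_sbp >= 2
print(include)
```

Here include is a logical vector named by PT_ID, so names(include)[include] gives the qualifying IDs as character strings.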

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] Locate Patients who have multiple high blood pressure readings

2013-01-31 Thread Gabor Grothendieck
This can be specified in a reasonably natural fashion using SQL. Here
DF is the input data frame:

> library(sqldf)
> sqldf("select
+   PT_ID,
+   sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
+   sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
+ from DF
+ group by PT_ID
+ having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0



Re: [R] Locate Patients who have multiple high blood pressure readings

2013-01-31 Thread arun
Hi,

Maybe this helps:

#dd
res <- data.frame(Include = with(
  subset(dd, OBS_TYPE == "SBP" & Blood_Pressure >= 140 |
             OBS_TYPE == "DBP" & Blood_Pressure >= 90),
  apply(tapply(Blood_Pressure, list(PT_ID, OBS_TYPE), length) >= 2,
        1, any, na.rm = TRUE)))
res
#      Include
# 1900    TRUE
# 2900   FALSE
# 3900   FALSE
A.K.



