
[R] R Error: wrong result size (...), expected ... or 1 (minimal example provided)

2015-05-01 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I am reposting my question with a reproducible example/minimal dataset (6 rows) 
this time.

I have written a user-defined function (myFunc below) with ten arguments. When 
calling the function, I get the following message: "Error: wrong result size 
(0), expected 2 or 1".
I am not getting the desired output dataset that will have 2 rows. How would I 
resolve the issue?   Any hints would be appreciated.


These results are from the following code chunk  outside myFunc:

addmargins(table(xanloid_set$cohort_type))



NMPR_Cohort  OID_Cohort       Other         Sum
          2           1           3           6

Thanks,

Pradip Muhuri





# myFunc_rev.R
setwd("H:/R/cis_data")
library(dplyr)
rm(list = ls())
# data object - description
temp <- "id  intdate anldate oiddate herdate cohort_type
1 2004-11-04 2002-07-18 2001-07-07 2003-11-03  NMPR_Cohort
2 2004-10-24 NA 2002-10-13 NA  OID_Cohort
3 2004-10-10 NA NA NA  Other
4 2004-09-01 1999-08-10 NA 2002-11-04  NMPR_Cohort
5 2004-09-04 1997-10-05 NA NA  Other
6 2004-10-25 NA NA 2011-11-04  Other"
# read the data object
xanloid_set <- read.table(textConnection(temp),
   colClasses=c("character", "Date", "Date", "Date", "Date", "character"),
   header=TRUE, as.is=TRUE
)
# print the data object
xanloid_set
# Define user-defined function
myFunc <- function (newdata,
                    oridata,
                    cohort,
                    value,
                    xdate_to_int_time,
                    xflag,
                    idate,
                    xdate,
                    xdate_to_int_time_cat,
                    year) {

  newdata <- filter(oridata, cohort == value) %>%
    mutate(xdate_to_int_time = ifelse(xflag==1, (idate-xdate)/365.25, NA),
           xdate_to_int_time_cat = cut(xdate_to_int_time, breaks=c(0,1,2,3,4,5,6,7),
                                       include.lowest=TRUE, stringsAsFactors = FALSE) )
  addmargins(with(newdata, table(year, xdate_to_int_time_cat, na.rm=TRUE)))
}
# invoke user defined function
myFunc (  newdata=nmpr_nmproid,
          oridata=xanloid_set,
          cohort=xanloid_set$cohort_type,
          value="NMPR_Cohort",
          xdate_to_int_time=anl_to_int_time,
          xflag=xanloid_set$anlflag,
          idate=xanloid_set$intdate,
          xdate=xanloid_set$anldate,
          xdate_to_int_time_cat=xanloid_set$anl_to_int_time_cat,
          year=xanloid_set$xyear
)
# tabulate cohort_type
addmargins(table(xanloid_set$cohort_type))
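
One possible reading of the error, offered as a sketch and not as a diagnosis from the thread: the six-row example has no anlflag or xyear column, so xanloid_set$anlflag evaluates to NULL (length 0), while the filtered data has 2 rows. The base-R snippet below reproduces that length and shows one way to pass column names as strings, so that every vector is taken from the filtered rows. The function name cohort_time_table and the definitions of anlflag and xyear are assumptions made for illustration; they do not come from the original post.

```r
# Hypothetical 6-row data mirroring the posted example (dates copied from it).
xanloid_set <- data.frame(
  id          = as.character(1:6),
  intdate     = as.Date(c("2004-11-04", "2004-10-24", "2004-10-10",
                          "2004-09-01", "2004-09-04", "2004-10-25")),
  anldate     = as.Date(c("2002-07-18", NA, NA, "1999-08-10", "1997-10-05", NA)),
  cohort_type = c("NMPR_Cohort", "OID_Cohort", "Other",
                  "NMPR_Cohort", "Other", "Other"),
  stringsAsFactors = FALSE
)

# A column that does not exist comes back as NULL, i.e. length 0.
missing_len <- length(xanloid_set$anlflag)

# Assumed definitions (not in the post): flag = anldate present, year = interview year.
xanloid_set$anlflag <- as.integer(!is.na(xanloid_set$anldate))
xanloid_set$xyear   <- as.integer(format(xanloid_set$intdate, "%Y"))

# Hypothetical rewrite: column names are strings looked up inside the
# filtered data, so every vector has the filtered row count.
cohort_time_table <- function(oridata, cohort, value, xflag, idate, xdate, year) {
  newdata <- oridata[oridata[[cohort]] == value, , drop = FALSE]
  yrs <- as.numeric(newdata[[idate]] - newdata[[xdate]]) / 365.25
  yrs[newdata[[xflag]] != 1] <- NA
  newdata$time_cat <- cut(yrs, breaks = 0:7, include.lowest = TRUE)
  addmargins(table(newdata[[year]], newdata$time_cat))
}

res <- cohort_time_table(xanloid_set, "cohort_type", "NMPR_Cohort",
                         "anlflag", "intdate", "anldate", "xyear")
res
```

Passing names instead of xanloid_set$... vectors keeps the filter and the derived columns aligned; whether this matches the intended analysis is for the original poster to judge.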




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R Error: wrong result size (...), expected ... or 1

2015-04-28 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I have written a user-defined function (myFunc below) with ten arguments. When 
calling the function, I get the following message: "Error: wrong result size 
(816841), expected 52939 or 1".
myFunc involves a data frame (named xanloid_set), which has 816841 rows.  R is 
correct to say that I was expecting only 52939 rows because of the filter() 
function.

These results are from the following code outside myFunc:
addmargins(table(xanloid_set$cohort_type))

NMPR_Cohort  OID_Cohort  Others Sum
  52939  158192  605710  816841


How would I resolve the issue (the error message from myFunc)?  Any hints would 
be appreciated.

Thanks,

Pradip Muhuri






#count_nmpr_oid_nmproid_by_year.R
setwd("H:/R/cis_data")
library(dplyr)
library(knitr)
rm(list = ls())


myFunc <- function (newdata,
                    oridata,
                    cohort,
                    value,
                    xdate_to_int_time,
                    xflag,
                    idate,
                    xdate,
                    xdate_to_int_time_cat,
                    year) {

  newdata <- filter(oridata, cohort == value) %>%
    mutate(xdate_to_int_time = ifelse(xflag==1, (idate-xdate)/365.25, NA),
           xdate_to_int_time_cat = cut(xdate_to_int_time, breaks=c(0,1,2,3,4,5,6,7),
                                       include.lowest=TRUE, stringsAsFactors = FALSE) )
  addmargins(with(newdata, table(year, xdate_to_int_time_cat)))
}

load("xanloid_set.rdata")
myFunc (  newdata=nmpr_nmproid,
          oridata=xanloid_set,
          cohort=xanloid_set$cohort_type,
          value="NMPR_Cohort",
          xdate_to_int_time=anl_to_int_time,
          xflag=xanloid_set$anlflag,
          idate=xanloid_set$intdate,
          xdate=xanloid_set$anldate,
          xdate_to_int_time_cat=xanloid_set$anl_to_int_time_cat,
          year=xanloid_set$xyear
)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



Re: [R] R example codes for direct standardization of rates

2015-01-07 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Terry,

Thank you so much for sending me this reference.

Pradip



Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

From: Therneau, Terry M., Ph.D. [mailto:thern...@mayo.edu]
Sent: Wednesday, January 07, 2015 4:39 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@R-project.org
Subject: Re: R example codes for direct standardization of rates

The pyears() and survexp() routines in the survival package are designed for 
these calculations.
See the technical report #63 of the Mayo Biostat group for examples


http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/technical-reports

Terry Therneau

-- begin included message ---

I am looking for R  example codes to compute age-standardized death rates by 
smoking and psychological distress status using person-years of observation 
created from the National Health Interview Survey Linked Mortality Files.  Any 
help with the example codes or references will be appreciated.
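
For readers who want to see the arithmetic behind direct standardization, here is a minimal base-R sketch. It is plain arithmetic, not the pyears()/survexp() workflow recommended above, and every number in it (deaths, person-years, standard-population weights, group and age-band labels) is invented for illustration.

```r
# Hypothetical deaths and person-years by group (rows) and age band (columns).
age_bands <- c("18-44", "45-64", "65+")
groups    <- c("nonsmoker", "smoker")

deaths <- matrix(c(5, 20, 60,
                   8, 30, 90),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(groups, age_bands))
pyrs <- matrix(c(10000, 8000, 5000,
                  6000, 7000, 4000),
               nrow = 2, byrow = TRUE,
               dimnames = list(groups, age_bands))

# Assumed standard-population weights; they must sum to 1.
std_weights <- c("18-44" = 0.5, "45-64" = 0.3, "65+" = 0.2)

# Age-specific rates, then a weighted sum over age bands for each group.
age_specific <- deaths / pyrs
adjusted_per_100k <- drop(age_specific %*% std_weights) * 1e5
adjusted_per_100k
```

Each adjusted rate is the sum over age bands of (deaths / person-years) times the standard weight, so groups with different age structures become comparable.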


Re: [R] R function to convert person-level observations to person-period observations

2015-01-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello David,

Thank you so much for your advice.  Revising the code to reve <- data[, event] 
in the function (with no change to the example data) seems to provide the 
desired results (shown below).   These 3 subjects are followed for 5 years.  
Subject A experienced the event in year 2, and subject C experienced the event 
in year 3, while subject B was censored at the end of the follow-up period 
(i.e., year 5).  The person-period observations now seem to be consistent with 
the person-level observations.  Do you see any issues? 

Regards,

Pradip

###
## person-level observations
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

## person-period observations
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3
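
The expansion shown in the two tables above can be sketched in a few lines of base R. This is a simplified illustration, not the UCLA PLPP function: the name to_person_period is hypothetical, and the event indicator is assumed to be coded 1 = event, 0 = censored, as in the example data.

```r
# Person-level data from the example: one row per subject.
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3),
                 stringsAsFactors = FALSE)

# Hypothetical helper: one output row per subject per period.
to_person_period <- function(data, id, period, event) {
  index <- rep(seq_len(nrow(data)), data[[period]])  # repeat each row once per period
  last  <- cumsum(data[[period]])                    # row of each subject's final period
  out   <- data[index, , drop = FALSE]
  out[[period]] <- ave(out[[period]], out[[id]], FUN = seq_along)  # 1, 2, ... within subject
  out[[event]] <- 0                                  # no event before the final period
  out[[event]][last] <- data[[event]]                # final period carries the original indicator
  rownames(out) <- NULL
  out
}

pp <- to_person_period(df, "ID", "studyyrs", "dead")
pp
```

Only the last row of each subject can carry a 1, which is why the indicator is copied unchanged (not negated) when it is already coded 1 for an event.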

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: David Barron [mailto:dnbar...@gmail.com] 
Sent: Saturday, January 03, 2015 10:19 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R function to convert person-level observations to 
person-period observations

Your data are wrong. The 'event' variable (dead in your example) needs to be 1 
for cases that end in an event and 0 for spells that are
censored: yours is the other way around.  If you change the 'dead'
variable to c(1,0,1) you will get the desired result.

If you really need to reverse the behaviour of the function, change the line 
reve <- !data[, event] to reve <- data[, event]

David


[R] R function to convert person-level observations to person-period observations

2015-01-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I was trying to convert person-level observations to person-period observations 
using an R custom function obtained from the UCLA web site 
(http://www.ats.ucla.edu/stat/r/faq/person_period.htm).  Please see my 
reproducible example below.  The function (PLPP) in the R script takes five 
arguments.


1)  data (i.e., the data set to be converted)

2)  id (i.e., the identifier for each observation)

3)  period (i.e., number of periods the person or observation was followed up)

4)  event (i.e., the variable that indicates whether the event occurred or not, 
or whether the observation was censored, depending on which direction you are 
converting)

5)  direction which indicates whether the function should go from person-level 
to person-period or from person-period to person-level.
On my example data set, the R script ran successfully.  Based on 3 person-level 
observations (A died in year 2, B is censored in year 5, C died in year 3), I 
get 10 period-level observations, which is the correct number of rows.   But the 
issue is that the value of the dead indicator variable is incorrect.  I have a 
gut feeling that the function needs to be tweaked a bit to get the desired results.


Correct results
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

Incorrect results - the dead column

   ID dead studyyrs
1   A    0        1
2   A    0        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    1        5
8   C    0        1
9   C    0        2
10  C    0        3


Desired results

   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3


I would appreciate receiving your help or hints for resolving the issue.  
Thanks,



##  My reproducible code is shown below

## Below is my data frame (3 observations)
df <- data.frame( ID=LETTERS[1:3], dead=c(1,0,1), studyyrs=c(2,5,3) )
df

## Person-Level Person-Period Converter Function - Source: 
## http://www.ats.ucla.edu/stat/r/faq/person_period.htm
PLPP <- function(data, id, period, event, direction = c("period", "level")) {
  ## Data Checking and Verification Steps
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))

  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }

  ## Do the conversion - Source: 
  ## http://www.ats.ucla.edu/stat/r/faq/person_period.htm
  switch(match.arg(direction),
         period = {
           index <- rep(1:nrow(data), data[, period])
           idmax <- cumsum(data[, period])
           reve <- !data[, event]
           dat <- data[index, ]
           dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
           dat[, event] <- 0
           dat[idmax, event] <- reve},
         level = {
           tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
           index <- as.vector(by(tmp, tmp[, id],
             FUN = function(x) x[which.max(x[, period]), "i"]))
           dat <- data[index, ]
           dat[, event] <- as.integer(!dat[, event])
         })

  rownames(dat) <- NULL
  return(dat)
}

tpp <- PLPP(data = df, id = "ID", period = "studyyrs",
            event = "dead", direction = "period")
tpp



Pradip K. Muhuri,
SAMHSA/CBHSQ



[R] R example codes for direct standardization of rates (Reference: Thoma's Lumley's survey package)

2014-12-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I am looking for R  example codes to compute age-standardized death rates by 
smoking and psychological distress status using person-years of observation 
created from the National Health Interview Survey Linked Mortality Files.  Any 
help with the example codes or references will be appreciated.

Thanks,

Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



Re: [R] R example codes for direct standardization of rates (Reference: Thoma's Lumley's survey package)

2014-12-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Anthony,

Thank you for sending me your well-documented R scripts that are meant for 
age-adjusted rate calculations.  I will keep you posted on the implementation 
of these scripts in the context of my analyses.

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

From: Anthony Damico [mailto:ajdam...@gmail.com]
Sent: Tuesday, December 30, 2014 3:01 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R example codes for direct standardization of rates 
(Reference: Thoma's Lumley's survey package)

hi pradip hope you're doing well!  these two scripts have age adjustment 
calculations, but neither are specific to nhis.  the nhanes example is probably 
closer to what you're trying to do :)


https://github.com/ajdamico/usgsd/blob/master/National%20Health%20and%20Nutrition%20Examination%20Survey/2009-2010%20interview%20plus%20laboratory%20-%20download%20and%20analyze.R

https://github.com/ajdamico/usgsd/blob/master/National%20Vital%20Statistics%20System/replicate%20age-adjusted%20death%20rate.R



Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-04 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Jeff,

Your code has given me desired results, and your advice is well taken.  I agree 
with you regarding the use of logical indexing for testing conditions.   Thank 
you so much for your time and advice.

Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us] 
Sent: Thursday, December 04, 2014 1:20 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

There is something weird going on with mutate's interaction with the scalar 
Date objects. It seems to be passing them to max as constants of mode double.

Regardless, use of rowwise should be very rare, and you are definitely abusing 
it. Learn to work with vectors of values rather than one value at a time.

new3 <- example.data %>%
  mutate( oiddate = pmax( mrjdate, cocdate, inhdate, haldate, na.rm=TRUE)
        , na.date.cases = as.numeric( !is.na( oiddate ) ) )

You might find it more useful to not convert the result of is.na to numeric... 
logical indexing can use that more efficiently than testing which rows have 
na.date.cases==1.
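
The vectorized approach described above can be tried without dplyr. The sketch below rebuilds the seven-row example from this thread as a plain data frame and uses base R's pmax() across the four date columns, then a logical vector for indexing. The variable name has_date is an assumption introduced for illustration.

```r
# The seven-row example from the thread, built directly as Date columns.
example.data <- data.frame(
  id      = as.character(1:7),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24", "2007-10-10",
                      "2006-09-01", "2007-09-04", "2005-10-25")),
  cocdate = as.Date(c("2008-07-18", NA, NA, NA, "2005-08-10", "2011-10-05", NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13", NA, NA, NA, NA)),
  haldate = as.Date(c("2007-11-07", NA, NA, NA, NA, NA, "2011-11-04")),
  stringsAsFactors = FALSE
)

# pmax() works element-wise across the columns; with na.rm = TRUE a row
# whose dates are all missing stays NA instead of producing -Inf.
example.data$oiddate <- with(example.data,
                             pmax(mrjdate, cocdate, inhdate, haldate, na.rm = TRUE))

# Keep the test as a logical vector and index with it directly.
has_date <- !is.na(example.data$oiddate)
example.data[has_date, c("id", "oiddate")]
```

Because has_date is already logical, there is no need to convert it to 0/1 and compare against 1 later.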
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.


Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Two alternative approaches - mutate() vs. sapply() - were used to get the 
desired results (i.e., creating a new column of the most recent date  from 4 
dates ) with help from Arun and Mark on this forum.  I now find that the two 
data objects (created using two different approaches) are not identical 
although results are exactly the same.  
 
identical(new1, new2) 
[1] FALSE
 
Please see the reproducible example below.

I don't understand why the code returns FALSE here.  Any hints/comments  will 
be  appreciated.

Thanks,

Pradip

#  reproducible example

library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column - dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE),
                         origin='1970-01-01'))

# create a new column - Base R solution (Acknowledgement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

identical(new1, new2)

# print records

print (new1); print(new2)

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your 
help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org] 
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of 
the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() 
function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute 
P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org






Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Chel and David,

Thank you very much for providing new insights into this issue.  Here is one 
more question.  Why  does the mutate () give incorrect results here? 

# The following gives INCORRECT results - mutate()d object
na.date.cases = ifelse(!is.na(oiddate),1,0)

# The following gives CORRECT results
new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)

###  reproducible example - slightly revised/modified  ###
library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column - dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
  rowwise() %>%
  mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'),
         na.date.cases = ifelse(!is.na(oiddate),1,0)
         )

# create a new column - Base R solution (Acknowledgement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)


identical(new1, new2)

table(new1$oiddate)
table(new2$oiddate)

# print records

print (new1); print(new2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: Chel Hee Lee [mailto:chl...@mail.usask.ca] 
Sent: Wednesday, December 03, 2014 8:48 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

The output in the object 'new1' is apparently the same as the output in the 
object 'new2'.  Are you trying to compare the entries of the two outputs 'new1' 
and 'new2'?  If so, the function 'all()' would be useful:

> all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of class, then 
the function 'identical()' is useful:

> attributes(new1)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

> attributes(new2)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

I hope this helps.

Chel Hee Lee
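
The difference between comparing values and comparing whole objects can be reproduced without dplyr. In the sketch below, the class name my_tbl is a made-up stand-in for the extra classes that dplyr attaches, and the numbers are arbitrary; the point is only that identical() also checks names and the class attribute.

```r
# Two data frames with the same values but a different column name.
a <- data.frame(id = 1:3, oldflag = c(10, 20, 30))
b <- data.frame(id = 1:3, oiddate = c(10, 20, 30))

# Add a made-up extra class to 'a' (stand-in for tbl_df and friends).
class(a) <- c("my_tbl", "data.frame")

# Values agree element by element ...
same_values <- all(unlist(a) == unlist(b))

# ... but the objects differ in column names and class.
before <- identical(a, b)

# Align names and class, then compare again.
names(a) <- names(b)
class(a) <- "data.frame"
after <- identical(a, b)

c(same_values = same_values, before = before, after = after)
```

When only the values matter, all.equal() with check.attributes = FALSE, or an explicit comparison of the columns of interest, is usually the more forgiving test.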


[R] no non-missing arguments to max; returning -Inf [2(dplyr/mutate()]

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

With dplyr mutate(), the code below creates a new column (oiddate), which is
the maximum of the four dates (mrjdate, cocdate, inhdate, haldate).  The code
seems to provide the results I desired (presented below).  But the issue is
that I am getting the following warning message:

  1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
    no non-missing arguments to max; returning -Inf 2

Is this warning message harmful?  Any hints on how to tweak the code in order
to correct the problem or avoid this message?

Please note that I did not get this warning message when I executed the code on
the reproducible example data posted to this forum in the past; I am now
getting it when applying the code to the actual working data file.  Thanks to
Arun, Mark and others on this forum for their help with tweaking the code in
the past.  Sorry for not providing a reproducible example this time.

Thanks,

Pradip Muhuri

#  R script followed by console (log and output) #
setwd("H:/R/cis_study")
library(dplyr)
load("xd2012.rdata")
# create a new column of the max date from four dates

test <- xd2012 %>%
  rowwise() %>%
  mutate(oidflag = ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0),
         oiddate = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE),
                           origin='1970-01-01')) %>%
  filter(oidflag==1) %>%
  select(mrjdate, cocdate, inhdate, haldate, oiddate)

head(test)
warnings(2)


##  below is from the console
> load("xd2012.rdata")
> # create a new column of the max date from four dates
>
> test <- xd2012 %>%
+   rowwise() %>%
+   mutate(oidflag = ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0),
+          oiddate = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE),
+                            origin='1970-01-01')) %>%
+   filter(oidflag==1) %>%
+   select(mrjdate, cocdate, inhdate, haldate, oiddate)
There were 50 or more warnings (use warnings() to see the first 50)
>
> head(test)
Source: local data frame [6 x 5]

     mrjdate cocdate    inhdate    haldate    oiddate
1 2003-02-22      NA 2006-03-10 2005-09-17 2006-03-10
2 2007-12-07      NA         NA         NA 2007-12-07
3 1994-05-15      NA         NA         NA 1994-05-15
4 2003-04-19      NA         NA         NA 2003-04-19
5 2009-11-13      NA         NA         NA 2009-11-13
6 1973-10-08      NA         NA 1974-01-04 1974-01-04
> warnings(2)
Warning messages:
1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf 2
2: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf 2
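
The warning is what max() reports when na.rm=TRUE leaves it nothing to
compare, which would happen for any row where all four dates are missing; it
then returns -Inf.  Below is a minimal sketch of one way to avoid that,
assuming Date inputs.  The helper name safe_max_date is hypothetical and is
not part of dplyr or base R.

```r
# Hypothetical helper: return a missing Date (not -Inf) when every
# supplied date is NA; otherwise return the latest date.
safe_max_date <- function(...) {
  d <- c(...)                          # c() on Date objects keeps the Date class
  if (all(is.na(d))) as.Date(NA) else max(d, na.rm = TRUE)
}

# A row with some dates present: the latest one is returned.
some_dates <- safe_max_date(as.Date("2004-11-04"), as.Date(NA),
                            as.Date("2008-07-18"))
some_dates

# A row with no dates at all: a true NA, and no warning from max().
no_dates <- safe_max_date(as.Date(NA), as.Date(NA))
is.na(no_dates)
```

Inside the rowwise() pipeline above, such a helper could presumably take the
place of the as.Date(max(...), origin=...) call, but that has not been tried
on the xd2012 data.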


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Monday, November 10, 2014 1:09 PM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates 
(dplyr/mutate)

Mark,

Thank you very much for looking further into this issue.  So, the ugly
solution is better!  Would you like to bring to Hadley's attention that mutate
does not set the NA value for the new column?

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org] 
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates 
(dplyr/mutate)

Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note 
the output below using your data structures.

> ## It looks at first as if the second element of both columns are NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"

## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
 range(data3$haldate

[R] R dplyr solution vs. Base R solution for the slect column total

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I am looking for a dplyr or base R solution for the column total - JUST FOR THE
LAST COLUMN in the example below. The following code works, giving me the total
for each column, but this is not exactly what I want:
rbind(test, colSums(test))

I only want the total for the very last column.  I am struggling with this part
of the code: rbind(test, c("Total", colSums(test, ...)))
I have searched for a solution on Stack Overflow.  I found some mutate() code
for the cumsum but had no luck with a total for a selected column.  Is there a
dplyr solution for the selected column total?

Any hints will be appreciated.

Thanks,

Pradip Muhuri


### The following is from the console - the R script with reproducible
example is also appended.


   mrjflag cocflag inhflag halflag oidflag count
1        0       0       0       0       0   256
2        0       0       0       1       1   256
3        0       0       1       0       1   256
4        0       0       1       1       1   256
5        0       1       0       0       1   256
6        0       1       0       1       1   256
7        0       1       1       0       1   256
8        0       1       1       1       1   256
9        1       0       0       0       1   256
10       1       0       0       1       1   256
11       1       0       1       0       1   256
12       1       0       1       1       1   256
13       1       1       0       0       1   256
14       1       1       0       1       1   256
15       1       1       1       0       1   256
16       1       1       1       1       1   256
17       8       8       8       8      15  4096



###  below is the reproducible example

library(dplyr)
# generate data
dlist <- rep( list( 0:1 ), 4 )
data <- do.call(expand.grid, dlist)
# name the four flag columns first, then add the id column
names(data) <- c('mrjflag', 'cocflag', 'inhflag', 'halflag')
data$id <- 1:nrow(data)


# mutate a column and then summarize
test <- data %>%
   mutate(oidflag= ifelse(mrjflag==1 | cocflag==1 | inhflag==1 |
                          halflag==1, 1, 0)) %>%
   group_by(mrjflag, cocflag, inhflag, halflag, oidflag) %>%
   summarise(count=n()) %>%
   arrange(mrjflag, cocflag, inhflag, halflag, oidflag)


#  This works, giving me the total for each column - This is not exactly what I want.
rbind(test, colSums(test))

# I only want the total for the very last column
rbind(test, c("Total", colSums(test, ...)))

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R dplyr solution vs. Base R solution for the slect column total

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Boris,

That gives me the total for each of the 6 columns of the data frame. I want the 
column sum just for the last column.

Thanks,

Pradip Muhuri



-Original Message-
From: Boris Steipe [mailto:boris.ste...@utoronto.ca] 
Sent: Sunday, November 30, 2014 12:50 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

try:

sum(test$count)


B.



Re: [R] R dplyr solution vs. Base R solution for the slect column total

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Boris,

Sorry for not being explicit when replying to your first email.   I wanted to 
say it does not work when row-binding.  I want the following output.  Thanks,  
Pradip


1      1  3
2      2  4
Total     7

### Below is the console ##
> test <- data.frame(first=c(1,2), second=c(3,4))
> test
  first second
1     1      3
2     2      4
>
> sum(test$second)
[1] 7
>
> rbind(test, sum(test$second))
  first second
1     1      3
2     2      4
3     7      7

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: Boris Steipe [mailto:boris.ste...@utoronto.ca] 
Sent: Sunday, November 30, 2014 5:51 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

No it doesn't ...
consider:

test <- data.frame(first=c(1,2), second=c(3,4))
test
  first second
1     1      3
2     2      4

sum(test$second)
[1] 7





Re: [R] R dplyr solution vs. Base R solution for the slect column total

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Duncan,

Thank you for sending your solution.  Below is another way.  

Pradip

> test <- data.frame(first=c(1,2), second=c(3,4))
> total <- c("", sum(test$second))
> rbind(test, Total=total)
      first second
1         1      3
2         2      4
Total            7
>
> rbind(test, c("Total", colSums(test[,2, drop=FALSE])))
  first second
1     1      3
2     2      4
3 Total      7

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Sunday, November 30, 2014 9:16 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'Boris Steipe'
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

On 30/11/2014, 8:45 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
 Hi Boris,
 
 Sorry for not being explicit when replying to your first email.   I wanted to 
 say it does not work when row-binding.  I want the following output.  Thanks, 
  Pradip
 
 
 1      1  3
 2      2  4
 Total     7

You are mixing up the computation of results with the presentation of them.  
That's the spreadsheet way of thinking, and it's okay for simple things like 
this, but gets really bogged down when the computations get hard.

In R you can do it, and it's not too hard:

test <- data.frame(first=c(1,2), second=c(3,4))
total <- c("", sum(test$second))
rbind(test, Total=total)

but this isn't a really sensible thing to do:  you can't work with that final 
result at all.  It makes more sense to leave it in the original form, and then 
think about how you want to present it, and write a function that displays the 
result, with nice formatting, etc.  That probably won't happen in the R 
console, you should be using Sweave or knitr or some other package for 
presentation of the results.
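
As a rough illustration of that separation, here is a minimal sketch of a
display function that prints a "Total" line for one column while leaving the
data frame itself numeric.  The function name print_with_total is
hypothetical; it is not from base R or any package mentioned in this thread.

```r
# Hypothetical helper: show a data frame with a "Total" line for one
# column, without altering the data frame that is used for computation.
print_with_total <- function(df, col) {
  shown <- format(df)                           # formatted copy for display only
  total_row <- as.list(rep("", ncol(df)))       # blank cell for every column ...
  names(total_row) <- names(df)
  total_row[[col]] <- format(sum(df[[col]]))    # ... except the requested one
  total_row <- as.data.frame(total_row, stringsAsFactors = FALSE)
  shown <- rbind(shown, total_row)
  rownames(shown) <- c(rownames(df), "Total")
  print(shown)
  invisible(df)                                 # hand back the untouched original
}

test <- data.frame(first = c(1, 2), second = c(3, 4))
res <- print_with_total(test, "second")

# The returned object is still the original two-row numeric data frame.
str(res)
```

The total exists only in the printed copy, so later computations on test are
not affected by the presentation step.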

Duncan Murdoch


 

Re: [R] R dplyr solution vs. Base R solution for the slect column total

2014-11-30 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Boris,

Excellent point.  Yes, I want to convert it into to the numeric type.  Your 
code has worked out well on the real data set.  The issue is resolved.

Thanks so much for your help!

Pradip



-Original Message-
From: Boris Steipe [mailto:boris.ste...@utoronto.ca] 
Sent: Sunday, November 30, 2014 9:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

What do you think should be in the empty cells? Zero? NA? Empty strings? There
can't just be nothing...
Here's an example with empty strings "" as the filler element - but do consider
carefully what Duncan wrote.


test <- data.frame(first=c(1,2), second=c(3,4))
typeof(test[1,1])  # double

# rbind() a vector that repeats the empty element one-less-than-ncol()
# times, and has the column sum as its last element.
test <- rbind(test, c(rep("", ncol(test)-1), sum(test$second)))
test

  first second
1     1      3
2     2      4
3            7

# but...!

typeof(test[1,1]) # character!
typeof(test[2,2]) # also character!

By adding characters to your columns, you cast all of your data into character
type!
If you want to *do* anything with the numbers, you'll need to cast them back to
numeric.
Or use 0 or NA as the filler element.

test <- rbind(test, c(rep(NA, ncol(test)-1), sum(test$second)))

But anyway ... as others have said, you may want to reconsider the logic of 
your approach.
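
To check the NA variant on a fresh copy of the data, here is a minimal
sketch that builds the total row as a one-row data frame and confirms the
column types afterwards.  The object names total and with_total are
illustrative only.

```r
test <- data.frame(first = c(1, 2), second = c(3, 4))

# Build the total row with the same columns as 'test':
# NA everywhere except the last column, which holds the column sum.
total <- test[1, , drop = FALSE]        # borrow the column layout
total[1, ] <- NA                        # blank out every cell
total[1, ncol(total)] <- sum(test[[ncol(test)]])

with_total <- rbind(test, total)
rownames(with_total) <- c(rownames(test), "Total")
with_total

# Both columns remain numeric, unlike the empty-string version.
sapply(with_total, typeof)
```

Because the filler is NA rather than "", sums and other arithmetic still work
on the result, provided na.rm = TRUE is used where needed.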


B.




[R] file.copy

2014-11-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Here is something (file.copy) trivial but does not seem to work.  I could not 
figure out what I am doing wrong.  

The R script below creates folders (fromFolder and toFolder) and finds the list 
of files (list.of.files) to be copied to the toFolder, which I have verified 
using the print ()  command. But, the issue is that the file.copy() command 
does not work.  

Both the R.script and the console are shown below.

Any help/hints will be appreciated.

Thanks,

Pradip Muhuri

#  R script
#
#file.copy.R

#identify the folders
fromFolder <- "H:/R/cis_study"
toFolder <- "F:/cis_study_backup"

# find the list of files to copy
list.of.files <- list.files(fromFolder, ".R$")

# print objects
print(c(fromFolder, toFolder, list.of.files))
options(warn=1)

# copy the files to the toFolder  - THIS DOES NOT WORK WHILE EVERYTHING PRIOR HAS WORKED
file.copy(list.of.files, toFolder)



#  Below is from console ###
> #file.copy.R
>
> #identify the folders
> fromFolder <- "H:/R/cis_study"
> toFolder <- "F:/cis_study_backup"
>
> # find the list of files to copy
> list.of.files <- list.files(fromFolder, ".R$")
>
> # print objects
> print(c(fromFolder, toFolder, list.of.files))
 [1] "H:/R/cis_study"          "F:/cis_study_backup"
 [3] "anl.in.scope_14.R"       "create.oid.data.frame.R"
 [5] "create_xd2012.R"         "file.copy.R"
 [7] "further.data.R"          "mrj.in.scope_111214.R"
 [9] "oid.in.scope_14.R"       "oid_cohort.R"
[11] "warning.max.R"           "xdate.R"
[13] "years.before.anl.init.R" "years.before.mrj.init.R"
[15] "years.before.oid.init.R"
> options(warn=1)
>
> # copy the files to the toFolder  - THIS DOES NOT WORK WHILE EVERYTHING PRIOR HAS WORKED
> file.copy(list.of.files, toFolder)
Warning in file.copy(list.of.files, toFolder) :
  problem copying .\anl.in.scope_14.R to F:\cis_study_backup\anl.in.scope_14.R: No such file or directory
(other similar warning messages are not shown)
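
The warning shows file.copy() looking for .\anl.in.scope_14.R, that is, in
the current working directory: list.files() returns bare file names unless
full.names = TRUE is given.  Below is a minimal sketch of that fix.  It uses
temporary folders and two dummy files as stand-ins (assumptions for
illustration, not the real H: and F: folders), and a stricter pattern
"\\.R$" than the ".R$" in the script.

```r
# Temporary stand-ins for fromFolder and toFolder (illustrative only).
fromFolder <- file.path(tempdir(), "from_demo")
toFolder   <- file.path(tempdir(), "to_demo")
dir.create(fromFolder, showWarnings = FALSE)
dir.create(toFolder, showWarnings = FALSE)
writeLines("x <- 1", file.path(fromFolder, "a.R"))
writeLines("y <- 2", file.path(fromFolder, "b.R"))

# By default only the file names come back, without the folder:
bare.names <- list.files(fromFolder, pattern = "\\.R$")
bare.names

# With full.names = TRUE each entry includes the folder, so file.copy()
# can find the files regardless of the working directory.
list.of.files <- list.files(fromFolder, pattern = "\\.R$", full.names = TRUE)
copied <- file.copy(list.of.files, toFolder, overwrite = TRUE)
copied
```

Prefixing the names with file.path(fromFolder, ...) before copying would be
an equivalent way to supply the full paths.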



Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


Re: [R] file.copy

2014-11-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Jeff,
Thank you so much for your help.

Below are the revised code (done with your hints), which has worked, and the
console output. I have also added overwrite=TRUE to file.copy().

Pradip

###
#file.copy.jn.way.R

#identify the folders
fromFolder <- "H:/R/cis_study"
toFolder <- "F:/cis_study_backup"

# find the list of files to copy
list.of.files <- list.files(fromFolder, ".R$")

# print objects
print(c(fromFolder, toFolder, list.of.files))
options(warn=1)

# copy the files to the toFolder (this now works)

file.copy(file.path(fromFolder,list.of.files), toFolder, overwrite=TRUE)

###  revised console
> #file.copy.jn.way.R
>
> #identify the folders
> fromFolder <- "H:/R/cis_study"
> toFolder <- "F:/cis_study_backup"
>
> # find the list of files to copy
> list.of.files <- list.files(fromFolder, ".R$")
>
> # print objects
> print(c(fromFolder, toFolder, list.of.files))
 [1] "H:/R/cis_study"          "F:/cis_study_backup"
 [3] "anl.in.scope_14.R"       "create.oid.data.frame.R"
 [5] "create_xd2012.R"         "file.copy.R"
 [7] "file.copy_Duncan_way.R"  "further.data.R"
 [9] "mrj.in.scope_111214.R"   "oid.in.scope_14.R"
[11] "oid_cohort.R"            "warning.max.R"
[13] "xdate.R"                 "years.before.anl.init.R"
[15] "years.before.mrj.init.R" "years.before.oid.init.R"
> options(warn=1)
>
> # copy the files to the toFolder (this now works)
>
> file.copy(file.path(fromFolder,list.of.files), toFolder, overwrite=TRUE)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

Re: [R] file.copy

2014-11-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Duncan,

Jeff's tweaks to my code have worked.  Now I am trying your way.  Below are the
R script and console.  The issue is that the object (list.of.files) has not
been created.  Any thoughts?

Thanks,

###  R script ##

#file.copy.dm.way.R

#identify the folders

fromFolder <- file.path("H:", "cis_study")

toFolder <- file.path("F:", "cis_study")

# find the list of files to be copied
list.of.files <- list.files(fromFolder, ".R$")

# print objects
print(fromFolder, list.of.files, toFolder)

# copy the files
file.copy(list.of.files, toFiles)

###  console ###
> #file.copy.dm.way.R
>
> #identify the folders
>
> fromFolder <- file.path("H:", "cis_study")
>
> toFolder <- file.path("F:", "cis_study")
>
> # find the list of files to be copied
> list.of.files <- list.files(fromFolder, ".R$")
>
> # print objects
> print(fromFolder, list.of.files, toFolder)
Error in print.default(fromFolder, list.of.files, toFolder) :
  invalid 'digits' argument
>
> # copy the files
> file.copy(list.of.files, toFiles)
logical(0)
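
A few observations on the console above, as a hedged sketch rather than a
diagnosis.  file.path("H:", "cis_study") builds "H:/cis_study", which lacks
the "R" component of the folder used earlier, so list.files() may simply have
found nothing.  print() displays one object, so the extra arguments were
taken as printing options.  file.copy() returns logical(0) for an empty
'from' vector.  The folder no_such_dir below is a made-up name for
illustration.

```r
# file.path() joins its arguments with "/"; every folder level must be given.
short.path <- file.path("H:", "cis_study")
full.path  <- file.path("H:", "R", "cis_study")
short.path
full.path

# list.files() on a folder that does not exist returns an empty vector.
missing.dir <- file.path(tempdir(), "no_such_dir")
list.of.files <- list.files(missing.dir, pattern = "\\.R$")
length(list.of.files)

# print() shows a single object; combine the pieces first, or use cat().
print(c(short.path, full.path, list.of.files))
cat(short.path, full.path, list.of.files, sep = "\n")

# With nothing to copy, file.copy() returns an empty logical vector.
res <- file.copy(list.of.files, tempdir())
res
```

Under this reading, list.of.files was created but was empty, which is why the
final call returned logical(0) without complaining about the toFiles name.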


Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your 
help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org] 
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of 
the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() 
function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04
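
One candidate for a vectorised alternative is base R's pmax(), which takes
the maximum element by element across several vectors and, with
na.rm = TRUE, yields NA only where every input is NA.  The sketch below is
not part of the solution above; it uses a three-row subset of the example
data, retyped by hand.

```r
# Three rows of the example data (rows 1, 2 and 3), entered directly.
data1 <- data.frame(
  id      = as.character(1:3),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13")),
  haldate = as.Date(c("2007-11-07", NA, NA)),
  stringsAsFactors = FALSE
)

# Row-wise latest date across the four columns; the result stays a Date.
data1$oiddate <- pmax(data1$mrjdate, data1$cocdate,
                      data1$inhdate, data1$haldate, na.rm = TRUE)
data1
```

No origin argument is needed here because the values never leave the Date
class, and the all-NA row produces a true NA without a warning.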



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute 
P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org






[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

2014-11-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

The range() with complete.cases() removes NA's for the date variables that are 
read from a data frame.  However, the issue is that the same function does not 
remove NA's for the other date variable that is created using the 
dplyr/mutate().  The console and the reproducible example are given below. Any 
advice how to resolve this issue would be appreciated.

Thanks,

Pradip Muhuri


#  cut and pasted from the R console

  id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA           "2011-11-04"


  reproducible code #

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column

data2 <- data1 %>%
  rowwise() %>%
  mutate(oiddate=as.Date(max(mrjdate, cocdate, inhdate, haldate,
                             na.rm=TRUE),
                         origin='1970-01-01'))

# print records

print (data2)

# range of dates

range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])





Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260




Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

2014-11-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Arun,

Thank you so much for your help.

Regards, 

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Monday, November 10, 2014 11:30 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates 
(dplyr/mutate)

Try

range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)]) 
#[1] "2006-09-01" "2011-11-04"



If you look at the `dput` output, the second element of oiddate is `-Inf`, not NA
dput(data2$oiddate)
structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date")
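
A minimal sketch of what the `dput` output implies, using only base R (the names m and d are illustrative and do not come from the thread): when every value in a row is NA, max(..., na.rm=TRUE) works on an empty set and returns -Inf, and a Date built from -Inf is not flagged by is.na(). That would explain why complete.cases() keeps it.

```r
# Illustration only: m and d are made-up names, not objects from the thread.
# With all inputs NA and na.rm = TRUE, max() sees an empty set and returns -Inf.
m <- suppressWarnings(max(NA_real_, NA_real_, na.rm = TRUE))

# Converting that value gives a Date whose underlying number is -Inf, not NA.
d <- as.Date(m, origin = "1970-01-01")

is.na(d)       # FALSE, so complete.cases() does not drop it
is.finite(d)   # FALSE, so is.finite() can be used to filter it out
unclass(d)     # -Inf
```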

   

A.K.


Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

2014-11-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Mark,

Thank you very much for further looking into this issue.  So, the ugly 
solution is better!  Would you like to bring to Hadley's attention that mutate 
does set the NA value for the new column?

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org] 
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates 
(dplyr/mutate)

Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note 
the output below using your data structures.

> ## It looks at first as if the second element of both columns is NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"

## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
> 
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"

Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])
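
As a possible alternative that was not proposed in the thread, base R's pmax() computes a parallel (row-wise) maximum and, per its documentation, returns NA when all the parallel elements are NA, even with na.rm = TRUE. A minimal sketch, assuming a cut-down three-row stand-in for data1 with the same column names:

```r
# Cut-down stand-in for data1 (three rows, three date columns); illustration only.
data1 <- data.frame(
  id      = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13")),
  stringsAsFactors = FALSE
)

# Pass every date column to pmax(); the all-NA row stays NA rather than -Inf.
data1$oiddate <- do.call(pmax, c(as.list(data1[-1]), list(na.rm = TRUE)))

data1$oiddate
is.na(data1$oiddate)   # FALSE TRUE FALSE
```

Because the columns are handed over with do.call(), the same line should work for any number of date columns, and the result keeps the Date class without an origin argument.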



Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

2014-11-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Bill and Mark,

I meant that mutate() does NOT set the NA value; sorry for the confusion.  Thank 
you for clarifying that this may not be mutate()'s problem.  This 
thread is now closed from my end.

Thanks,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, November 10, 2014 1:30 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Mark Sharp; r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates 
(dplyr/mutate)

> Would you like to bring to Hadley's attention that mutate does
> set the NA value for the new column?

This may not be mutate()'s problem.

The Date class is messed up with regard to NA's and Inf's.  E.g., what gets 
printed as NA does not correspond to what is.na() returns and its 
range() method does not appear to pass the finite=TRUE argument to 
range.default:
   > d <- as.Date(c("2014-10-31", c("2014-11-10")))
   > d1 <- range(d[0], finite=TRUE)
  Warning messages:
  1: In min.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to min; returning Inf
  2: In max.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to max; returning -Inf
   > d1
  [1] NA NA
   > is.na(d1)
  [1] FALSE FALSE
   > dput(d1)
  structure(c(Inf, -Inf), class = "Date")
   > range(c(d1, d), finite=TRUE)
  [1] NA NA
   > range(c(d1, d), finite=TRUE, na.rm=TRUE)
  [1] NA NA
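
One way to sidestep the mismatch shown above is to turn non-finite dates into real NA values before summarising. This is only a sketch: finite_or_na() is a hypothetical helper name, not part of base R or dplyr, and d1 is rebuilt here from the dput() output above rather than by calling range().

```r
# Hypothetical helper (not from base R or dplyr): replace Inf, -Inf and NaN
# dates with NA so that is.na() and complete.cases() see them as missing.
finite_or_na <- function(x) {
  x[!is.finite(x)] <- NA
  x
}

d  <- as.Date(c("2014-10-31", "2014-11-10"))
d1 <- structure(c(Inf, -Inf), class = "Date")   # same object as the dput() output

cleaned <- finite_or_na(c(d1, d))
is.na(cleaned)                 # TRUE TRUE FALSE FALSE
range(cleaned, na.rm = TRUE)   # the two real dates, 2014-10-31 and 2014-11-10
```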



Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-09 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Dan,

Thank you so much for sending me your code, which gives me the desired results. 
But I don't understand why I am getting the following warning message: "In 
FUN(newX[, i], ...) : no non-missing arguments, returning NA". Any thoughts?

Regards,

Pradip



> data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments, returning NA
> data2x
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
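
A likely explanation for the warning, offered as a sketch rather than a confirmed diagnosis: apply() first converts the data frame to a matrix, and a data frame of Date columns becomes a character matrix, so max() is called on character strings. For the row where all the dates are missing, nothing is left once na.rm = TRUE removes the NAs, and max() on character input then returns NA with this warning. The two-column data frame below is a cut-down stand-in for data1[,-1].

```r
# Cut-down stand-in for data1[,-1]; the second row has no dates at all.
dates <- data.frame(
  mrjdate = as.Date(c("2004-11-04", NA)),
  cocdate = as.Date(c("2008-07-18", NA))
)

# apply() works on as.matrix(dates), which holds character strings, not Dates.
m <- as.matrix(dates)
is.character(m)   # TRUE

# The all-NA row triggers the warning (suppressed here) and yields NA.
row_max <- suppressWarnings(apply(dates, 1, max, na.rm = TRUE))
is.character(row_max)   # TRUE: the new column is text, not Date

# Converting back restores the Date class; the NA stays NA.
oidflag <- as.Date(row_max)
oidflag
```

If this reading is right, the warning is harmless for the all-NA row, but the column produced by apply() is character until it is passed through as.Date().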


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Daniel Nordlund
Sent: Sunday, November 09, 2014 5:33 AM
To: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)



I am not familiar with the mutate() function from dplyr, but you can get your 
wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA


Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-09 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.
But,  I am getting the following warning message.  Any suggestions on how to 
avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#
> data1 %>% 
+   
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: by row

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>% 
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: by row

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
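
For context on why the original mutate()/ifelse() attempt put 2011-11-04 in every row: without rowwise(), max() is handed whole columns and returns a single value for the entire data set, which ifelse() then recycles. The sketch below reproduces this in base R with two short stand-in vectors (values taken from rows 1, 2 and 7 of the example data). It also shows that ifelse() drops the Date class, which is presumably why the original code needed as.Date(..., origin=) afterwards.

```r
# Stand-in columns (rows 1, 2 and 7 of the example data), base R only.
mrjdate <- as.Date(c("2004-11-04", NA, "2005-10-25"))
haldate <- as.Date(c("2007-11-07", NA, "2011-11-04"))

# max() over whole columns collapses to one value for the entire data set.
overall <- max(mrjdate, haldate, na.rm = TRUE)
length(overall)   # 1

# ifelse() recycles that single value into every row that is not all NA,
# and returns plain numbers (days since 1970-01-01), not Dates.
all_na  <- is.na(mrjdate) & is.na(haldate)
oidflag <- ifelse(all_na, NA, overall)
oidflag           # 15282 NA 15282

as.Date(oidflag, origin = "1970-01-01")
```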

A.K.



Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-09 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Arun and Dennis,

This is just an FYI.

You're right - In one row, there are all NA's in  the four  date columns.  I 
have tested below the TRUEness of the condition Arun has set.

is.logical(data1[rowSums(is.na(data1[,-1]))!=4,])
[1] FALSE

All these 3 approaches below provide the exact same results.

# Approach 1 (suggested by Arun): The code gives the expected results, but with 
# a warning message.
data1 %>% 
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))

# Approach 2 (suggested by Dan): This code no longer gives a warning 
# message, although it did earlier.
data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


# Approach 3 (suggested by Mark): This code does not give a warning message.
data2 <- data1
data2$oidflag <- as.Date(sapply(seq_along(data2$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")


##  ends here 

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Sunday, November 09, 2014 10:18 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)



Dear Pradip,

From the documentation of ?max: 


   The minimum and maximum of a numeric empty set are ‘+Inf’ and
‘-Inf’ 

One of the rows in your dataset is all `NAs.`  I am not sure you want to keep 
that row with all NAs.  You could remove it and run the code or keep it and run 
with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>% 
  rowwise() %>%
  mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, 
                              na.rm=TRUE), origin='1970-01-01'))


A.K.

[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,



The example data frame in the reproducible code below has 5 columns (1 column 
for id and 4 columns for dates) and 7 observations.  I would like to 
insert the most recent date from those 4 date columns into a new column 
(oidflag) using the mutate() function in the dplyr package.   I am getting 
the correct result (NA in the new column) if a given row has all NAs in the four 
columns.  However, the date value inserted into the new column is incorrect 
for 5 of the remaining 6 rows (those with a non-NA value in at least 1 of the 
four columns).



I would appreciate receiving your help toward resolving the issue.  Please see 
the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





##  from the console 

> print (data2)
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##  Reproducible code and data 
#



library(dplyr)
library(lubridate)
library(zoo)

# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )

# create a new column

data2 <- mutate(data1,
                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & 
                                is.na(inhdate) & is.na(haldate), NA,
                                max(mrjdate, cocdate, inhdate, 
                                    haldate, na.rm=TRUE )
                                )
                )

# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")

# print records

print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260






[R] Adding labels to ColSums

2014-10-28 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I was trying to add a label to the colSums of the integer variable 
corresponding to a factor.  Below are the warning message and the 
reproducible code.  How would I tweak the code to replace the NA with 
"Total" in the output?  Your advice toward resolving the issue would be greatly 
appreciated.

Thanks,

Pradip Muhuri



###  warning message - from the console

> rb.data <- rbind(s.data2, c("Total", colSums(s.data2[,2, drop=FALSE]))) # row bind with the column total

Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NA generated
> rb.data
Source: local data frame [7 x 2]

  years.before.initiated.cat anl.count
1                      [0,1]        89
2                      (1,2]        73
3                      (2,3]        72
4                      (3,4]        82
5                      (4,5]        82
6                      (5,6]        86
7                         NA       484
#  reproducible code
#
library(dplyr)

i.data2 <- data.frame(sample(1:6, size=484, replace=T)) # simulate data to create a data frame
colnames(i.data2) <- "years.before.initiated" # add a column name

m.data2 <- mutate(i.data2,  years.before.initiated.cat =
                  cut(years.before.initiated,
                      breaks=c(0,1,2,3,4,5,6),include.lowest=TRUE))
# create a new variable

g.data2 <- group_by(m.data2, years.before.initiated.cat) # group by years.before.initiated.cat
s.data2 <- summarise(g.data2, anl.count =n() ) # summarize to get the count

rb.data <- rbind(s.data2, c("Total", colSums(s.data2[,2, drop=FALSE]))) # row bind with the column total
rb.data
###
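
The warning arises because the category column is a factor with no "Total" level, so the appended value becomes NA. One possible way around it, sketched here in base R rather than dplyr, is to hold the categories as character before binding the total row. The seed and the names `years`, `year_cat`, `counts` and `total_row` are hypothetical choices for this sketch.

```r
set.seed(42)  # arbitrary seed so the simulated counts are repeatable
years <- sample(1:6, size = 484, replace = TRUE)
year_cat <- cut(years, breaks = 0:6, include.lowest = TRUE)

# Tabulate, keeping the category as character rather than factor.
counts <- as.data.frame(table(year_cat), stringsAsFactors = FALSE)
names(counts) <- c("years.before.initiated.cat", "anl.count")

# Build the total as a one-row data frame with matching column names.
total_row <- data.frame(years.before.initiated.cat = "Total",
                        anl.count = sum(counts$anl.count),
                        stringsAsFactors = FALSE)

rb.data <- rbind(counts, total_row)
print(rb.data)
```

Binding a data frame row rather than a bare c() vector also keeps anl.count numeric, because c("Total", ...) would coerce the count to character.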

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

[R] Error Reading from Connection

2014-09-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,



I am running R x64 3.0.3 under a Windows 8 environment. I have been getting the 
following error when running some of my old R applications. Below is a mock-up example.





Could someone please help me resolve the issue?



Thanks,



Pradip Muhuri







> setwd ("D:/")

> #load Rdata file

> load("heroin.rdata")
Error: error reading from connection

> str(heroin)
Error in str(heroin) : object 'heroin' not found


Re: [R] Error Reading from Connection

2014-09-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Thank you so much for your guidance. This time I am providing more information. 
The R script and the R console output are appended below.

The list.files() call below provides evidence of the existence of this file in the 
temp directory.  Please note that the heroin.rdata file was created from 
the SAS data set using the Stat Transfer utility software.

The file.access() call below did not return mode=4.  Does this mean that I don't 
have read access to the file?  Is that the reason I could not load the file?

I would appreciate help in resolving the issue.

Pradip Muhuri

*** R Script ***
setwd ("D:/temp")
list.files()
file.access("heroin.rdata", mode=4)
load("heroin.rdata")

*** R Console ***

> setwd ("D:/temp")
> list.files()
[1] "heroin.rdata"
> file.access("heroin.rdata", mode=4)
heroin.rdata 
           0 
> load("heroin.rdata")
Error: error reading from connection
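
For reference, file.access() returns 0 when the requested access is available and -1 when it is not, so the 0 in the console output above indicates that the file is readable. That leaves the file contents as a possible cause, for instance a file that is not in the format load() expects, though that is only a guess from the information given. The checks can be gathered in a small helper. This is a sketch only: `check_and_load` and `demo_path` are hypothetical names, and the demonstration file is written with save() so the example is self-contained.

```r
# Hypothetical helper: run the basic checks before calling load().
check_and_load <- function(path, envir = new.env()) {
  if (!file.exists(path)) stop("file not found: ", path)
  # file.access() returns 0 on success and -1 on failure; mode = 4 tests read permission.
  if (file.access(path, mode = 4) != 0) stop("no read permission: ", path)
  if (file.size(path) == 0) stop("file is empty: ", path)
  tryCatch(
    load(path, envir = envir),
    error = function(e) {
      stop("load() failed; the file may not be in save() format: ",
           conditionMessage(e))
    }
  )
  envir
}

# Self-contained demonstration with a file written by save().
demo_path <- tempfile(fileext = ".rdata")
heroin <- data.frame(id = 1:3)
save(heroin, file = demo_path)

loaded <- check_and_load(demo_path)
print(ls(loaded))
```

Loading into a separate environment shows exactly which objects the file contained without overwriting anything in the workspace.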


From: Jeff Newmiller [jdnew...@dcn.davis.ca.us]
Sent: Tuesday, September 23, 2014 9:20 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Error Reading from Connection

Insufficient information, and irrelevant information (the second error is a 
direct consequence of the first).

We have no way of knowing based on this input that your file is there. 
(?list.files). We also don't know if you have read access to that file 
(?file.access).

Since you posted in HTML and failed to provide the requested minimum 
information, you should probably (re-)read the Posting Guide mentioned at the 
bottom of this (and any other) message on this mailing list. You should 
probably also follow the advice given there to update your R software to the 
latest version so we don't go chasing any problems in R for your operating 
system that have already been solved.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.


Re: [R] Regression Tolerance Intervals - Dr. Young's Code

2013-06-09 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Uwe and Dennis - Thank you so much for your comments, tips and advice. The 
following reproducible code has worked and given me the desired results. 

Pradip


### Revised Code ###


setwd ("C:/RAPP")
require (tolerance)
set.seed (100); x <- runif (200,0,10); y <- 20+5*x + rnorm (100,0,20); 
data.frame (cbind (x,y))
out <- regtol.int (reg=lm(y~x), new.x=cbind (c(3,6,12)), side=2, alpha=.05, 
P=.90);
plottol(out, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )




From: Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: Sunday, June 09, 2013 11:54 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help [r-help@r-project.org]; mridulb...@aol.com
Subject: Re: [R] Regression Tolerance Intervals - Dr. Young's Code

On 08.06.2013 05:17, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hello,
>
> Below is a reproducible example to generate the output by using Dr. Young's R
> code on the above subject.  As commented below, the issue is that part of
> the code (regtol.int and plottol) does not seem to work.
>
> I would appreciate receiving your advice toward resolving the issue.
>
> Thanks and regards,
>
> Pradip Muhuri
>
>
> setwd ("E:/")
> require (tolerance)
>
> d1 <- "xlndur  ylnant
> 8.910797  0.33901690
> 9.001415  0.36464311
> 8.983936  0.53976194
> 8.948035  0.33901690
> 9.056784  0.39266961
> 9.018593  0.18617770
> 9.001415  0.53976194
> 8.983936 -0.11005034
> 8.966147  0.53102826
> 8.948035  0.59885086
> 6.90  NA"
>
> xd1 <- read.table(textConnection(d1), header=TRUE, as.is=TRUE)
> print (xd1); str (xd1)
>
> #This code works
> xout1 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1),  alpha=.05,
> P=0.99, side=2)
> print (xout1)
>
>
> #This code does not work
> xout2 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), new.xlndur =
> NULL,  alpha=.05, P=0.99, side=2)

Come on, start using your brain and replace new.xlndur by new.x?

> print (xout2)
> #This code does not work
> plottol(xout1, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )

So replace x and y appropriately?

Best,
Uwe Ligges



> #This code does not work
> plottol(xout2, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )



[R] Regression Tolerance Intervals - Dr. Young's Code

2013-06-07 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Below is a reproducible example to generate the output by using Dr. Young's R 
code on the above subject.  As commented below, the issue is that part of 
the code (regtol.int and plottol) does not seem to work.

I would appreciate receiving your advice toward resolving the issue.

Thanks and regards,

Pradip Muhuri


setwd ("E:/")
require (tolerance)

d1 <- "xlndur  ylnant
8.910797  0.33901690
9.001415  0.36464311
8.983936  0.53976194
8.948035  0.33901690
9.056784  0.39266961
9.018593  0.18617770
9.001415  0.53976194
8.983936 -0.11005034
8.966147  0.53102826
8.948035  0.59885086
6.90  NA"

xd1 <- read.table(textConnection(d1), header=TRUE, as.is=TRUE)
print (xd1); str (xd1)

#This code works
xout1 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1),  alpha=.05, 
P=0.99, side=2)
print (xout1)


#This code does not work
xout2 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), new.xlndur = 
NULL,  alpha=.05, P=0.99, side=2)
print (xout2)

#This code does not work
plottol(xout1, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )

#This code does not work
plottol(xout2, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )



[R] Applying a user-defined function

2013-01-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello List,

My goal is to apply a user-defined function to several columns of a data frame. 
When testing the code on the reproducible example below, I get the following 
error message.

> #now write a new function using the above cut ()/quantile function to apply
> # on different columns of the data frame
>
> CutQuintiles <- function(x) {
+   cut (test1$x,quantile (test1$x, (0:5/5)),include.lowest=TRUE)
+ }
>
> #apply the CutQuintile () on every odd-numbered columns of the test1 data
> # frame
> newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)
Error in cut.default(test1$x, quantile(test1$x, (0:5/5)), include.lowest = 
TRUE) :
  'x' must be numeric

I would appreciate receiving your advice.

Thanks,

Pradip

## The reproducible example begins here

test1 <- read.table (text="
State,ObtMj_P,ObtMj_SE,ExpPrevMed_P,ExpPrevMed_SE,ParMon_P,ParMon_SE
Alabama,49.60,1.37,80.00,0.91,12.10,0.68
Alaska,55.00,1.41,81.80,1.08,12.40,0.90
Arizona,52.50,1.56,79.60,1.20,15.80,1.08
Arkansas,50.50,1.22,78.00,0.78,12.80,0.72
California,51.10,0.65,80.50,0.53,13.00,0.41
Colorado,55.10,1.26,81.70,1.03,12.10,0.72
Connecticut,56.30,1.28,85.00,0.93,14.60,0.77
Delaware,53.60,1.30,79.50,1.04,14.70,0.97
District of Columbia,53.50,1.22,76.20,1.03,14.30,1.13
Florida,52.70,0.67,78.90,0.52,14.10,0.45
Georgia,52.50,1.15,79.30,1.02,15.90,0.98
Hawaii,49.40,1.33,83.80,1.12,16.00,1.06
Idaho,48.30,1.23,82.40,0.99,11.90,0.74
Illinois,52.70,0.63,81.00,0.46,13.60,0.40
Indiana,49.60,1.16,80.90,0.91,12.60,0.82
Iowa,46.30,1.37,82.10,1.01,13.60,0.87
Kansas,44.30,1.43,79.20,0.98,12.90,0.79
Kentucky,52.90,1.37,78.70,1.05,14.60,0.98
Louisiana,49.70,1.23,76.80,1.06,14.50,0.76
Maine,55.60,1.44,82.90,0.93,16.70,0.83
Maryland,53.90,1.46,83.60,0.95,14.00,0.80
Massachusetts,55.40,1.41,81.00,1.15,14.70,0.80
Michigan,52.40,0.62,80.50,0.47,15.00,0.43
Minnesota,51.50,1.20,84.40,0.87,14.40,0.86
Mississippi,43.20,1.14,76.60,0.91,12.30,0.78
Missouri,48.70,1.20,80.30,0.90,13.70,0.12
Montana,56.40,1.16,83.70,0.95,12.10,0.68
Nebraska,45.70,1.51,83.40,0.95,12.40,0.90
Nevada,54.20,1.17,80.60,1.07,15.80,1.08
New Hampshire,56.10,1.30,83.30,0.93,12.80,0.72
New Jersey,53.20,1.45,83.70,0.95,13.00,0.41
New Mexico,57.60,1.34,78.90,1.03,12.10,0.72
New York,53.70,0.67,82.60,0.48,14.60,0.77
North Carolina,52.20,1.26,81.90,0.84,14.70,0.97
North Dakota,48.60,1.34,84.20,0.88,14.30,1.13
Ohio,50.90,0.61,82.70,0.49,14.10,0.45
Oklahoma,47.20,1.42,78.80,1.33,15.90,0.98
Oregon,54.00,1.35,80.60,1.14,16.00,1.06
Pennsylvania,53.00,0.63,79.90,0.47,11.90,0.74
Rhode Island,57.20,1.20,79.50,1.02,13.60,0.40
South Carolina,50.50,1.21,79.50,0.95,12.60,0.82
South Dakota,43.40,1.30,81.70,1.05,13.60,0.87
Tennessee,48.90,1.35,78.40,1.35,12.90,0.79
Texas,48.70,0.62,79.00,0.48,14.60,0.98
Utah,42.00,1.49,85.00,0.93,14.50,0.76
Vermont,58.70,1.24,83.70,0.84,16.70,0.83
Virginia,51.80,1.18,82.00,1.04,14.00,0.80
Washington,53.50,1.39,84.10,0.96,14.70,0.80
West Virginia,52.80,1.07,79.80,0.93,15.00,0.43
Wisconsin,49.90,1.50,83.50,1.02,14.40,0.86
Wyoming,49.20,1.29,82.00,0.85,12.30,0.78
", sep=",", row.names='State',  header=TRUE, as.is=TRUE)


# Verify that the following call categorizes the obtmj_p values into one of 
# the 5 equal-sized groups - works fine.

cut (test1$obtmj_p,quantile (test1$obtmj_p, (0:5/5)),include.lowest=TRUE)


#now write a new function using the above cut ()/quantile function to apply on 
# different columns of the data frame

CutQuintiles <- function(x) {
  cut (test1$x,quantile (test1$x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data 
# frame
newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")
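
The error message follows from how `$` works: inside the function, test1$x looks for a column literally named "x" instead of using the argument x, and because no such column exists it returns NULL. A minimal sketch of the difference follows. The data frame `scores` and the function name `cut_quintiles` are hypothetical stand-ins for test1 and CutQuintiles.

```r
# Hypothetical stand-in for test1 with three numeric columns.
scores <- data.frame(
  a = c(1, 5, 3, 9, 7, 2, 8, 4, 6, 10),
  b = seq(10, 100, by = 10),
  c = (1:10)^2
)

# `$` does not evaluate its right-hand side, so this asks for a column named "x".
print(is.null(scores$x))

# Use the argument directly; sapply()/lapply() pass each column in as x.
cut_quintiles <- function(x) {
  cut(x, quantile(x, (0:5/5)), include.lowest = TRUE)
}

# lapply() keeps each result as a factor.
quintiles <- lapply(scores[c("a", "c")], cut_quintiles)
print(str(quintiles))
```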

##
Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov/



Re: [R] Applying a user-defined function

2013-01-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello List,

Last time, Arun's following solution worked to create 3 new columns (1,3,5).  
Now how would I tweak this function to create corresponding (additional) 
columns (7,8,9) of mode factor (levels = 1,2,3,4,5)?

Thanks for your continued support.

Pradip

### cut and paste from the reproducible example
CutQuintiles <- function( x) {
  cut (x,quantile (x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data 
# frame
test1$newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(test1$newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")
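
One possible way to get separate factor columns with levels 1 to 5 is to pass labels = 1:5 to cut() and attach the results with lapply() rather than sapply(). This is a sketch under assumptions: `state_stats`, its column names and `quintile_code` are hypothetical, and paste0() is used because paste() with its default separator would put a space before "_cat".

```r
# Hypothetical stand-in for test1: estimate, standard error, estimate.
state_stats <- data.frame(
  p1  = c(49.6, 55.0, 52.5, 50.5, 51.1, 55.1, 56.3, 53.6, 53.5, 52.7),
  se1 = c(1.37, 1.41, 1.56, 1.22, 0.65, 1.26, 1.28, 1.30, 1.22, 0.67),
  p2  = c(80.0, 81.8, 79.6, 78.0, 80.5, 81.7, 85.0, 79.5, 76.2, 78.9)
)

# Names of the odd-numbered columns.
odd_cols <- names(state_stats)[seq(1, ncol(state_stats), by = 2)]

# labels = 1:5 replaces the interval labels with the quintile number.
quintile_code <- function(x) {
  cut(x, quantile(x, (0:5/5)), include.lowest = TRUE, labels = 1:5)
}

new_cols <- lapply(state_stats[odd_cols], quintile_code)
names(new_cols) <- paste0(odd_cols, "_cat")

# Append the factors as ordinary columns at the end of the data frame.
state_stats[names(new_cols)] <- new_cols
print(str(state_stats))
```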




## Reproducible Example


test1 <- read.table (text="
State,ObtMj_P,ObtMj_SE,ExpPrevMed_P,ExpPrevMed_SE,ParMon_P,ParMon_SE
Alabama,49.60,1.37,80.00,0.91,12.10,0.68
Alaska,55.00,1.41,81.80,1.08,12.40,0.90
Arizona,52.50,1.56,79.60,1.20,15.80,1.08
Arkansas,50.50,1.22,78.00,0.78,12.80,0.72
California,51.10,0.65,80.50,0.53,13.00,0.41
Colorado,55.10,1.26,81.70,1.03,12.10,0.72
Connecticut,56.30,1.28,85.00,0.93,14.60,0.77
Delaware,53.60,1.30,79.50,1.04,14.70,0.97
District of Columbia,53.50,1.22,76.20,1.03,14.30,1.13
Florida,52.70,0.67,78.90,0.52,14.10,0.45
Georgia,52.50,1.15,79.30,1.02,15.90,0.98
Hawaii,49.40,1.33,83.80,1.12,16.00,1.06
Idaho,48.30,1.23,82.40,0.99,11.90,0.74
Illinois,52.70,0.63,81.00,0.46,13.60,0.40
Indiana,49.60,1.16,80.90,0.91,12.60,0.82
Iowa,46.30,1.37,82.10,1.01,13.60,0.87
Kansas,44.30,1.43,79.20,0.98,12.90,0.79
Kentucky,52.90,1.37,78.70,1.05,14.60,0.98
Louisiana,49.70,1.23,76.80,1.06,14.50,0.76
Maine,55.60,1.44,82.90,0.93,16.70,0.83
Maryland,53.90,1.46,83.60,0.95,14.00,0.80
Massachusetts,55.40,1.41,81.00,1.15,14.70,0.80
Michigan,52.40,0.62,80.50,0.47,15.00,0.43
Minnesota,51.50,1.20,84.40,0.87,14.40,0.86
Mississippi,43.20,1.14,76.60,0.91,12.30,0.78
Missouri,48.70,1.20,80.30,0.90,13.70,0.12
Montana,56.40,1.16,83.70,0.95,12.10,0.68
Nebraska,45.70,1.51,83.40,0.95,12.40,0.90
Nevada,54.20,1.17,80.60,1.07,15.80,1.08
New Hampshire,56.10,1.30,83.30,0.93,12.80,0.72
New Jersey,53.20,1.45,83.70,0.95,13.00,0.41
New Mexico,57.60,1.34,78.90,1.03,12.10,0.72
New York,53.70,0.67,82.60,0.48,14.60,0.77
North Carolina,52.20,1.26,81.90,0.84,14.70,0.97
North Dakota,48.60,1.34,84.20,0.88,14.30,1.13
Ohio,50.90,0.61,82.70,0.49,14.10,0.45
Oklahoma,47.20,1.42,78.80,1.33,15.90,0.98
Oregon,54.00,1.35,80.60,1.14,16.00,1.06
Pennsylvania,53.00,0.63,79.90,0.47,11.90,0.74
Rhode Island,57.20,1.20,79.50,1.02,13.60,0.40
South Carolina,50.50,1.21,79.50,0.95,12.60,0.82
South Dakota,43.40,1.30,81.70,1.05,13.60,0.87
Tennessee,48.90,1.35,78.40,1.35,12.90,0.79
Texas,48.70,0.62,79.00,0.48,14.60,0.98
Utah,42.00,1.49,85.00,0.93,14.50,0.76
Vermont,58.70,1.24,83.70,0.84,16.70,0.83
Virginia,51.80,1.18,82.00,1.04,14.00,0.80
Washington,53.50,1.39,84.10,0.96,14.70,0.80
West Virginia,52.80,1.07,79.80,0.93,15.00,0.43
Wisconsin,49.90,1.50,83.50,1.02,14.40,0.86
Wyoming,49.20,1.29,82.00,0.85,12.30,0.78
", sep=",", row.names='State',  header=TRUE, as.is=TRUE)

# change names () to lower case

names (test1) <- tolower (names (test1))

#Write a cut/quantile function to apply on different columns of the data frame

CutQuintiles <- function( x) {
  cut (x,quantile (x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data 
# frame
test1$newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(test1$newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")

dim (test1)
options (width=100)
test1





[R] cut ()

2012-12-31 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello List,


My goal is to create a 5 category variable (p1_st_data$ob_mrj_cat), based on 
the p1_st_data$obt_mrj_p variable, using the following code for 50 States and 
District of Columbia (N=51).

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile 
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

The issue is that, for Utah, I am getting an NA instead of (42,48.7] in the 
ob_mrj_cat column.

Is there a way to tweak the code (i.e., programmatically) to resolve the issue?

I would appreciate receiving your help.

Happy New Year and Best Wishes to R Expert-members, who have been so kind and 
helpful to beginner R users like me.

Thanks and regards,

Pradip Muhuri



##  console followed the reproducible example ###
> table(p1_st_data$ob_mrj_cat)

  (42,48.7] (48.7,50.9] (50.9,52.8] (52.8,54.2] (54.2,58.7]
         10          10          10          10          10

> p1_st_data [p1_st_data$state =="Utah",] [, 1:4]
   state obt_mrj_p obt_mrj_se ob_mrj_cat
45  Utah        42       1.49       <NA>    # I expected this to be (42,48.7] instead of NA.


### The Reproducible Example (data and code) is shown below:


#read estimates of risk factors for substances use (ages 12-17) by State 
# obtained from SUDAAN output
p1_st_data <- read.table (text="
Alabama,  49.60,   1.37
Alaska,  55.00,1.41
Arizona,  52.50, 1.56
Arkansas,50.50,1.22
California,51.10,0.65
Colorado,55.10,1.26
Connecticut,  56.30,1.28
Delaware,   53.60,1.30
District of Columbia,  53.50, 1.22
Florida,  52.70,   0.67
Georgia,   52.50,1.15
Hawaii, 49.40,1.33
Idaho,   48.30,1.23
Illinois,  52.70,0.63
Indiana,49.60,1.16
Iowa, 46.30,1.37
Kansas, 44.30,1.43
Kentucky,52.90,1.37
Louisiana,49.70,1.23
Maine,  55.60,1.44
Maryland,   53.90,1.46
Massachusetts,55.40,1.41
Michigan,52.40,0.62
Minnesota, 51.50,1.20
Mississippi, 43.20,1.14
Missouri, 48.70,1.20
Montana,56.40,1.16
Nebraska,   45.70,1.51
Nevada,   54.20,1.17
New Hampshire,  56.10,1.30
New Jersey,   53.20,1.45
New Mexico, 57.60,1.34
New York,   53.70,0.67
North Carolina, 52.20,1.26
North Dakota,   48.60,1.34
Ohio, 50.90,0.61
Oklahoma,  47.20,1.42
Oregon,   54.00,1.35
Pennsylvania,53.00,0.63
Rhode Island,57.20,1.20
South Carolina, 50.50,1.21
South Dakota,   43.40,1.30
Tennessee,48.90,1.35
Texas,   48.70,0.62
Utah, 42.00,1.49
Vermont,58.70,1.24
Virginia,51.80,1.18
Washington,  53.50,1.39
West Virginia,52.80,1.07
Wisconsin,  49.90,1.50
Wyoming,   49.20,1.29",
sep=",", col.names = c("state", "Obt_mrj_p", "Obt_mrj_se"),
colClasses = c("character", "numeric", "numeric")
)

#change the names to lower cases
names(p1_st_data) <- tolower (names(p1_st_data))

# create five equal-sized groups for the perceived ease of obtaining marijuana 
# variable
p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile 
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

p1_st_data
dim (p1_st_data)
table(p1_st_data$ob_mrj_cat)
p1_st_data [p1_st_data$state =="Utah",] [, 1:4]



Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov/



Re: [R] cut ()

2012-12-31 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear David,

Thank you so much for catching my careless mistake.  Sorry 
about that.

Happy New Year.

Pradip

From: David L Carlson [dcarl...@tamu.edu]
Sent: Monday, December 31, 2012 6:18 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'R help'
Subject: RE: [R] cut ()

A misplaced right parenthesis caused the problem:

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

Should be

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile
(p1_st_data$obt_mrj_p, (0:5/5)), include.lowest=TRUE)
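
The effect of the parenthesis can be reproduced on a small vector. With include.lowest=TRUE inside quantile(), the argument is absorbed by quantile() without complaint, so cut() falls back to its default and leaves the minimum outside the first interval. The vector `x` and the names `misplaced` and `corrected` are hypothetical, chosen only for this sketch.

```r
# Hypothetical values; 42.0 plays the role of Utah, the minimum.
x <- c(42.0, 43.2, 44.3, 48.7, 50.9, 52.8, 54.2, 55.6, 57.6, 58.7)

# include.lowest = TRUE goes to quantile(), which ignores it, so the lowest
# interval stays open on the left and the minimum becomes NA.
misplaced <- cut(x, quantile(x, (0:5/5), include.lowest = TRUE))

# include.lowest = TRUE goes to cut(), closing the lowest interval on the left.
corrected <- cut(x, quantile(x, (0:5/5)), include.lowest = TRUE)

print(data.frame(x, misplaced, corrected))
```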

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


Re: [R] cut ()

2012-12-31 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Neal,

Although David's solution (putting the right parenthesis, which I had missed) 
has resolved the issue, I would like to try yours as well.

Could you please clarify the six elements:  c(-1e-8, 0, 0, 0, 0, 1e8)?

Thanks and regards,

Pradip



From: Neal H. Walfield [n...@walfield.org]
Sent: Monday, December 31, 2012 5:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] cut ()

At Mon, 31 Dec 2012 22:25:25 +,
Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> The issue is that, for Utah, I am getting an NA instead of (42,48.7] in the 
> ob_mrj_cat column.

The problem is likely due to comparisons of floating point numbers.
Try moving your lower and upper bounds out a tiny bit.  When I add

  c(-1e-8, 0, 0, 0, 0, 1e8)

to the result of quantile, I don't get any NAs.

Neal
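
On the six elements: quantile() with probabilities 0:5/5 returns six break points (0%, 20%, 40%, 60%, 80%, 100%), and the offset vector has one entry per break. The first entry moves the lowest break slightly below the minimum, the four zeros leave the inner breaks alone, and the last entry moves the top break upward. A minimal sketch of that reading follows, with hypothetical values in `x`. The offsets are copied as written in the message, and whether 1e-8 was intended for the last one is not clear from the thread.

```r
# Hypothetical values; 42.0 is the minimum.
x <- c(42.0, 43.2, 44.3, 48.7, 50.9, 52.8, 54.2, 55.6, 57.6, 58.7)

breaks  <- quantile(x, (0:5/5))          # six break points
offsets <- c(-1e-8, 0, 0, 0, 0, 1e8)     # one offset per break, as quoted

# The minimum now lies strictly above the lowest break, so it is no longer NA.
widened <- cut(x, breaks + offsets)

print(table(widened, useNA = "ifany"))
```

With the upper offset as written, the label of the top interval changes to reflect the much larger upper bound, which is one reason the include.lowest=TRUE fix may be preferable.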



Re: [R] format.pval () and printCoefmat ()

2012-12-15 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Arun and David,

I am so grateful to you for all your help with the code.  Thanks and regards, 
Pradip


Arun - All this is very helpful.  In general, I can follow the code. I only 
have the following question:

What changes in the code would be required to show three decimal places for 
all numeric variables in the res data frame?

Thanks,

Pradip



### below is the display of the data from Lines1, Lines2, and res

> head (data.frame(Lines1))
                                                  Lines1
1   mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1      18.744      11.911    0.574   0.00    0.175
3 2      18.744      14.455    0.297   0.00    0.110
4 3      18.744      13.540    0.384   0.00    0.133
5 4      18.744       6.002    2.123   0.00    0.333
6 5      18.744       5.834    2.213   0.00    0.349
> head (data.frame(Lines2))
                                                  Lines2
1   mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1      18.744      11.911    0.574   0.00    0.175
3 2      18.744      14.455    0.297   0.00    0.110
4 3      18.744      13.540    0.384   0.00    0.133
5 4      18.744       6.002    2.123   0.00    0.333
6 5      18.744       5.834    2.213   0.00    0.349
> head (res)
  contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1              wh            2+hi        18.7       11.91    0.574      0    0.175
2              wh            2+rc        18.7       14.46    0.297      0    0.110
3              wh            aian        18.7       13.54    0.384      0    0.133
4              wh            asan        18.7        6.00    2.123      0    0.333
5              wh            blck        18.7        5.83    2.213      0    0.349
6              wh            csam        18.7        7.93    1.363      0    0.279







From: arun [smartpink...@yahoo.com]
Sent: Friday, December 14, 2012 10:12 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

May be this helps:
dat1<-read.table(text="
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922
",sep="",header=TRUE,stringsAsFactors=FALSE)

Lines1<-capture.output(printCoefmat(dat1[,-c(1:2)],has.Pvalue=TRUE,eps.Pvalue=0.001))
Lines2<-gsub("\\s+$","",gsub("\\.$","",Lines1[1:15]))
res<-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))
#or
# res<-cbind(dat1[,1:2],read.table(text=Lines2,header=TRUE))


> res
#   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean
#1               wh            2+hi        18.7       11.91    0.574 0.0000
#2               wh            2+rc        18.7       14.46    0.297 0.0000
#3               wh            aian        18.7       13.54    0.384 0.0001
-

--

# cohens_d
#1   0.1753
#2   0.1101
#3   0.1335
-
-

> str(res)
#'data.frame':    14 obs. of  7 variables:
# $ contrast_level1: chr  "wh" "wh" "wh" "wh" ...
# $ contrast_level2: chr  "2+hi" "2+rc" "aian" "asan" ...
# $ mean_level1    : num  18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 ...
# $ mean_level2    : num  11.91 14.46 13.54 6 5.83 ...
# $ rel_diff       : num  0.574 0.297 0.384 2.123 2.213 ...
# $ p_mean         : num  0e+00 0e+00 1e-04 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
# $ cohens_d       : num  0.175 0.11 0.134 0.333 0.349 ...


A.K.

- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: 'David Winsemius' dwinsem...@comcast.net
Cc: R help r-help@r-project.org
Sent

Re: [R] format.pval () and printCoefmat ()

2012-12-15 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Arun,

Thank you so much for further clarifications and help.

Pradip

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov


-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 15, 2012 11:04 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

If this is just a formatting issue, it is possible to do that with ?formatC() or 
?sprintf(), but it may change those variables from numeric to character.
One possibility from `res`:
res<-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))

varsNum<-sapply(res,is.numeric)
res[varsNum]<-lapply(res[varsNum],round,digits=3)
#Here, the numeric columns with digits<3 are not changed, but the ones with >3 
# were all changed to digits 3.

As I mentioned, sprintf() changes the number of digits
> as.data.frame(do.call(cbind,lapply(res[varsNum],function(x) 
sprintf("%.3f",x))))
#   mean_level1 mean_level2 rel_diff p_mean cohens_d
#1      18.700      11.910    0.574  0.000    0.175
#2      18.700      14.460    0.297  0.000    0.110
#3      18.700      13.540    0.384  0.000    0.134

A.K.
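
A minimal sketch of the sprintf() approach described above, run on a small made-up data frame. The names res_demo, vars_num and res_fmt and all values are assumptions for illustration, not the poster's data. It shows that every numeric column ends up with exactly three decimals, at the cost of becoming character.

```r
# Hypothetical stand-in for `res`; the values are illustrative only.
res_demo <- data.frame(
  contrast_level2 = c("2+hi", "asan"),
  mean_level1     = c(18.7, 18.7),
  p_mean          = c(1.64e-05, 2.20e-119),
  cohens_d        = c(0.1753, 0.3326),
  stringsAsFactors = FALSE
)

# Identify the numeric columns, then format each one to exactly 3 decimals.
vars_num <- sapply(res_demo, is.numeric)
res_fmt <- res_demo
res_fmt[vars_num] <- lapply(res_demo[vars_num], function(x) sprintf("%.3f", x))

str(res_fmt)  # the formatted columns are now character, not numeric
```

Because the formatted columns are character, it is probably safest to keep the numeric data frame for further computation and use the formatted copy only for display.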





- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org; David Winsemius dwinsem...@comcast.net
Sent: Saturday, December 15, 2012 10:12 AM
Subject: RE: [R] format.pval () and printCoefmat ()

Dear Arun and David,

I am so grateful to you for all your help with the code.  Thanks and regards, 
Pradip


Arun - All this  is very helpful.  In general, I can follow the code. I only 
have the following questions:

What changes in the code would be required to have 3 places after decimal for 
all numeric variables in the res data frame?

Thanks,

Pradip



### below is the display of the data from Lines1, Lines2, and res

> head(data.frame(Lines1))
                                               Lines1
1    mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1       18.744      11.911    0.574   0.00    0.175
3 2       18.744      14.455    0.297   0.00    0.110
4 3       18.744      13.540    0.384   0.00    0.133
5 4       18.744       6.002    2.123   0.00    0.333
6 5       18.744       5.834    2.213   0.00    0.349
> head(data.frame(Lines2))
                                               Lines2
1    mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1       18.744      11.911    0.574   0.00    0.175
3 2       18.744      14.455    0.297   0.00    0.110
4 3       18.744      13.540    0.384   0.00    0.133
5 4       18.744       6.002    2.123   0.00    0.333
6 5       18.744       5.834    2.213   0.00    0.349
> head(res)
  contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1              wh            2+hi        18.7       11.91    0.574      0    0.175
2              wh            2+rc        18.7       14.46    0.297      0    0.110
3              wh            aian        18.7       13.54    0.384      0    0.133
4              wh            asan        18.7        6.00    2.123      0    0.333
5              wh            blck        18.7        5.83    2.213      0    0.349
6              wh            csam        18.7        7.93    1.363      0    0.279







From: arun [smartpink...@yahoo.com]
Sent: Friday, December 14, 2012 10:12 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

Maybe this helps:
dat1 <- read.table(text="
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi

[R] format.pval () and printCoefmat ()

2012-12-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi List,

My goal is to force R not to print in scientific notation in the sixth column 
(p_mean, the p-value) of my data frame (not a matrix).

I have used the format.pval() and printCoefmat() functions on the data frame. 
The R script is appended below.

The issue is that using format.pval() and printCoefmat() on the data frame 
gives me the desired number format, but coerces the two character variables 
(contrast_level1 and contrast_level2) into NAs, because my object is a data 
frame, not a matrix. Please see the first output below.

Is there a way to avoid printing the NAs in the character fields when using 
format.pval() and printCoefmat() on the data frame?

I would appreciate receiving your help.

Thanks,

Pradip
setwd("F:/PR1/R_PR1")

load(file = "sigtests_overall_withid.rdata")

#format.pval(tt$p.value, eps=0.0001)

# keep only selected columns from the above data frame
keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1",
                "mean_level2", "rel_diff",
                "p_mean", "cohens_d")

# subset the data frame
y0410_1825_mf_alc <- subset(sigtests_overall_withid,
                            years=="0410" & age_group=="1825"
                            & gender_group=="all" & drug=="alc"
                            & contrast_level1=="wh",
                            select=keep_cols1)
# change the row.names
row.names(y0410_1825_mf_alc) = 1:dim(y0410_1825_mf_alc)[1]

# force
format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

# print the observations from the sub-data frame
options(width=120, digits=3)
#y0410_1825_mf_alc

printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

### When format.pval () and printCoefmat () used


   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1               NA              NA      18.744      11.911    0.574   0.00    0.175
2               NA              NA      18.744      14.455    0.297   0.00    0.110
3               NA              NA      18.744      13.540    0.384   0.00    0.133
4               NA              NA      18.744       6.002    2.123   0.00    0.333
5               NA              NA      18.744       5.834    2.213   0.00    0.349
6               NA              NA      18.744       7.933    1.363   0.00    0.279
7               NA              NA      18.744      10.849    0.728   0.00    0.203
8               NA              NA      18.744       7.130    1.629   0.00    0.298
9               NA              NA      18.744       9.720    0.928   0.00    0.242
10              NA              NA      18.744       9.600    0.952   0.00    0.242
11              NA              NA      18.744      16.135    0.162   0.17    0.067 .
12              NA              NA      18.744          NA       NA     NA       NA
13              NA              NA      18.744      10.465    0.791   0.00    0.213
14              NA              NA      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion




### When format.pval () and printCoefmat () not used

   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922




Re: [R] format.pval () and printCoefmat ()

2012-12-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi David,

Thank you so much for helping me with the code.

Your suggested code gives me the following results. Please see below. I don't 
understand why I am getting two  blocks of prints (5 columns, and then 7 
columns), with some columns repeated.

Regards,

Pradip
#

> cbind(  y0410_1825_mf_alc[ 1:2],
+ printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d
1       18.744      11.911    0.574   0.00    0.175
2       18.744      14.455    0.297   0.00    0.110
3       18.744      13.540    0.384   0.00    0.133
4       18.744       6.002    2.123   0.00    0.333
5       18.744       5.834    2.213   0.00    0.349
6       18.744       7.933    1.363   0.00    0.279
7       18.744      10.849    0.728   0.00    0.203
8       18.744       7.130    1.629   0.00    0.298
9       18.744       9.720    0.928   0.00    0.242
10      18.744       9.600    0.952   0.00    0.242
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA
13      18.744      10.465    0.791   0.00    0.213
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922
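
The two printed blocks above look consistent with printCoefmat() printing its formatted table as a side effect and then handing its argument back invisibly, so that cbind() prints the unformatted data frame afterwards. A minimal sketch of one alternative, assuming a small made-up data frame (dat_demo and its values are hypothetical): format only the p-value column with format.pval() and leave the character columns untouched.

```r
# Hypothetical stand-in for y0410_1825_mf_alc; the values are illustrative only.
dat_demo <- data.frame(
  contrast_level1 = c("wh", "wh", "wh"),
  contrast_level2 = c("2+hi", "asan", "nhpi"),
  rel_diff        = c(0.574, 2.123, 0.162),
  p_mean          = c(1.64e-05, 2.20e-119, 1.74e-01),
  stringsAsFactors = FALSE
)

# Replace the numeric p-values with formatted strings in a copy.
# Values below `eps` are reported as being below that threshold.
out <- dat_demo
out$p_mean <- format.pval(dat_demo$p_mean, digits = 3, eps = 0.0001)

print(out)  # character columns are preserved; p_mean is now character
```

Since format.pval() returns a character vector, the formatted copy is meant for display; the original numeric column stays available in dat_demo.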


-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()


On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi List,
 
 My goal is to force R not to print in scientific notation in the sixth column 
 (p_mean, the p-value) of my data frame (not a matrix).
 
 I have used the format.pval() and printCoefmat() functions on the data 
 frame. The R script is appended below.
 
 The issue is that using format.pval() and printCoefmat() on the data frame 
 gives me the desired number format, but coerces the two character variables 
 (contrast_level1 and contrast_level2) into NAs, because my object is a data 
 frame, not a matrix. Please see the first output below.
 
 Is there a way to avoid printing the NAs in the character fields

They are probably factor columns.

 when using the format.pval () and printCoefmat () on the data frame?
 
 I would appreciate receiving your help.
 
 Thanks,
 
 Pradip
 setwd("F:/PR1/R_PR1")
 
 load(file = "sigtests_overall_withid.rdata")
 
 #format.pval(tt$p.value, eps=0.0001)
 
 # keep only selected columns from the above data frame
 keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1",
                 "mean_level2", "rel_diff",
                 "p_mean", "cohens_d")
 
 # subset the data frame
 y0410_1825_mf_alc <- subset(sigtests_overall_withid,
                             years=="0410" & age_group=="1825"
                             & gender_group=="all" & drug=="alc"
                             & contrast_level1=="wh",
                             select=keep_cols1)
 # change the row.names
 row.names(y0410_1825_mf_alc) = 1:dim(y0410_1825_mf_alc)[1]
 
 # force
 format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably

[R] read.table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hi List,

I have spent more than 30 minutes, but failed to read in this file using the 
read.table() function. I could not figure out how to fix the following error.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :   
line 1 did not have 6 elements

Any help would be appreciated.

Thanks,

Pradip Muhuri


### below is the  reproducible example
xd1 <- "race             age   percent  sepercent  flag_var
 Mexican 12-17  5.7926   0.64195  any
Puerto Rican 12-17  5.1975   0.24929  any
   Cuban 12-17  3.7977   1.00487  any
C-S American 12-17  4.3665   0.55329  any
   Dominican 12-17  1.8149   0.46677  any
 Spanish (Spain) 12-17  6.1971   0.98386  any
  Multi Hisp Eth 12-17  6.7006   1.12464  any
NH White 12-17  4.8442   0.08660  any
NH Black 12-17  3.6943   0.16045  any
NH AM-AK 12-17  9.6325   1.06100  any
   NH HI-OPI 12-17  3.9189   1.08047  any
NH Asian 12-17  1.9115   0.28432  any
  NH Multiracial 12-17  6.4255   0.51434  any
  Mexican 18-25  8.9284   0.73022  any
 Puerto Rican 18-25  6.1364   0.28394  any
Cuban 18-25  8.6782   1.45543  any
 C-S American 18-25  5.9360   0.59899  any
Dominican 18-25  7.7642   1.64553  any
  Spanish (Spain) 18-25  9.2632   1.15652  any
   Multi Hisp Eth 18-25 11.3566   1.79282  any
 NH White 18-25  8.6484   0.11866  any
 NH Black 18-25  7.5972   0.24926  any
 NH AM-AK 18-25 13.5041   1.57275  any
NH HI-OPI 18-25  8.0227   1.41348  any
 NH Asian 18-25  3.2701   0.32414  any
   NH Multiracial 18-25 10.6489   0.85105  any
  Mexican   26+  3.2110   0.51683  any
 Puerto Rican   26+  1.6273   0.15033  any
Cuban   26+  1.4419   0.44118  any
 C-S American   26+  1.0187   0.26594  any
Dominican   26+  0.9554   0.50275  any
  Spanish (Spain)   26+  2.5976   0.86230  any
   Multi Hisp Eth   26+  1.1345   0.66375  any
 NH White   26+  1.5510   0.04156  any
 NH Black   26+  2.8763   0.15133  any
 NH AM-AK   26+  3.9674   0.76611  any
NH HI-OPI   26+  1.2919   0.66205  any
 NH Asian   26+  0.7207   0.13870  any
   NH Multiracial   26+  3.0668   0.52334  any
  Mexican 12-17  4.3152   0.53235  mrj
 Puerto Rican 12-17  3.7237   0.20969  mrj
Cuban 12-17  2.0616   0.67248  mrj
 C-S American 12-17  3.3282   0.47392  mrj
Dominican 12-17  1.3797   0.40435  mrj
  Spanish (Spain) 12-17  5.1810   0.93979  mrj
   Multi Hisp Eth 12-17  4.8915   0.94816  mrj
 NH White 12-17  3.6190   0.07379  mrj
 NH Black 12-17  2.8196   0.14042  mrj
 NH AM-AK 12-17  6.5091   0.85124  mrj
NH HI-OPI 12-17  3.6267   1.06724  mrj
 NH Asian 12-17  1.3162   0.23575  mrj
   NH Multiracial 12-17  5.0657   0.49614  mrj
  Mexican 18-25  7.3802   0.67992  mrj
 Puerto Rican 18-25  4.3260   0.24191  mrj
Cuban 18-25  6.1433   1.19242  mrj
 C-S American 18-25  3.9166   0.51272  mrj
Dominican 18-25  5.8000   1.24097  mrj
  Spanish (Spain) 18-25  6.8646   1.01387  mrj
   Multi Hisp Eth 18-25 10.1134   1.75013  mrj
 NH White 18-25  5.8656   0.10100  mrj
 NH Black 18-25  6.6869   0.23643  mrj
 NH AM-AK 18-25 11.2989   1.51687  mrj
NH HI-OPI 18-25  5.6302   1.14561  mrj
 NH Asian 18-25  2.3418   0.28309  mrj
   NH Multiracial 18-25  8.2696   0.77139  mrj
  Mexican   26+  1.1658   0.33967  mrj
 Puerto Rican   26+  0.6757   0.09329  mrj
Cuban   26+  0.6653   0.31239  mrj
 C-S American   26+  0.3177   0.17604  mrj
Dominican   26+  0.5616   0.39780  mrj
  Spanish (Spain)   26+  1.8078   0.82590  mrj
   Multi Hisp Eth   26+  0.8468   0.63529  mrj
 NH White   26+  0.6915   0.02791  mrj
 NH Black   26+  1.5675   0.12031  mrj
 NH AM-AK   26+  1.7273   0.37673  mrj
NH HI-OPI   26+  0.0356   0.03535  mrj
 NH Asian   26+  0.2687   0.07564  mrj
   NH Multiracial   26+  1.3419   0.30074  mrj
  Mexican 12-17  1.2074   0.36082  anl
 Puerto Rican 12-17  1.0772   0.11547  anl
Cuban 12-17  1.2569   0.67109  anl
 C-S American 12-17  0.6213   0.22726  anl
Dominican 12-17  0.1412   0.08552  anl
  Spanish (Spain) 12-17  0.9625   0.25453  anl
   Multi Hisp Eth 12-17  1.2863   0.43909  anl
 NH White 12-17  1.1490   0.04289  anl
 NH Black 12-17  0.5932   0.06220  anl
 NH AM-AK 12-17  1.9117   0.50122  anl
NH HI-OPI 12-17  0.3833   0.20240  anl
 NH Asian 12-17  0.4782

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results.

The only issue is that the column names do not seem to be correct.  I did not 
understand part of your comment, which says fortunes::fortune(14) applies 
although I read about the double colon operator- ns-dblcolon {base}.

Could you please provide a little more hint for me to resolve the issue?

Thanks and regards,

# new result 
> agerace <- read.delim(textConnection(xd1), sep="\t", header=TRUE, as.is=TRUE)
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
 raceage...percent..sepercent..flag_var
1  Mexican 12-17  5.7926   0.64195  any
2 Puerto Rican 12-17  5.1975   0.24929  any
3Cuban 12-17  3.7977   1.00487  any
4 C-S American 12-17  4.3665   0.55329  any
5Dominican 12-17  1.8149   0.46677  any
6  Spanish (Spain) 12-17  6.1971   0.98386  any




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi List,

 I have spent more than 30 minutes, but failed to read in this file using the 
 read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see 'See 
Also').  Have you read it?  fortunes::fortune(14) applies.

The issue is what the separator is.  You have specified whitespace, and 
that is not correct.  The original might have had tabs (see ?read.delim) 
but as pasted into this email only a human can disentangle this file.
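
A minimal sketch of how the separator problem can be diagnosed, assuming a small made-up sample in place of the posted data (txt and n_fields are hypothetical names). count.fields() from the utils package reports how many whitespace-separated fields each line has; a category such as "Puerto Rican" adds one extra field, which is why a whitespace separator cannot give a consistent column count.

```r
# Hypothetical three-line sample in the same layout as the posted data.
txt <- "race   age  percent
Mexican 12-17  5.79
Puerto Rican 12-17  5.20"

# Count whitespace-separated fields per line (sep = "" is the default).
n_fields <- count.fields(textConnection(txt))
print(n_fields)
```

Lines whose count differs from the header's count are the ones that need a different separator or some pre-processing.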

 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
   line 1 did not have 6 elements

 Any help would be appreciated.

 Thanks,

 Pradip Muhuri


 ### below is the  reproducible example
 xd1 <- "race             age   percent  sepercent  flag_var
   Mexican 12-17  5.7926   0.64195  any
  Puerto Rican 12-17  5.1975   0.24929  any
 Cuban 12-17  3.7977   1.00487  any
  C-S American 12-17  4.3665   0.55329  any
 Dominican 12-17  1.8149   0.46677  any
   Spanish (Spain) 12-17  6.1971   0.98386  any
Multi Hisp Eth 12-17  6.7006   1.12464  any
  NH White 12-17  4.8442   0.08660  any
  NH Black 12-17  3.6943   0.16045  any
  NH AM-AK 12-17  9.6325   1.06100  any
 NH HI-OPI 12-17  3.9189   1.08047  any
  NH Asian 12-17  1.9115   0.28432  any
NH Multiracial 12-17  6.4255   0.51434  any
Mexican 18-25  8.9284   0.73022  any
   Puerto Rican 18-25  6.1364   0.28394  any
  Cuban 18-25  8.6782   1.45543  any
   C-S American 18-25  5.9360   0.59899  any
  Dominican 18-25  7.7642   1.64553  any
Spanish (Spain) 18-25  9.2632   1.15652  any
 Multi Hisp Eth 18-25 11.3566   1.79282  any
   NH White 18-25  8.6484   0.11866  any
   NH Black 18-25  7.5972   0.24926  any
   NH AM-AK 18-25 13.5041   1.57275  any
  NH HI-OPI 18-25  8.0227   1.41348  any
   NH Asian 18-25  3.2701   0.32414  any
 NH Multiracial 18-25 10.6489   0.85105  any
Mexican   26+  3.2110   0.51683  any
   Puerto Rican   26+  1.6273   0.15033  any
  Cuban   26+  1.4419   0.44118  any
   C-S American   26+  1.0187   0.26594  any
  Dominican   26+  0.9554   0.50275  any
Spanish (Spain)   26+  2.5976   0.86230  any
 Multi Hisp Eth   26+  1.1345   0.66375  any
   NH White   26+  1.5510   0.04156  any
   NH Black   26+  2.8763   0.15133  any
   NH AM-AK   26+  3.9674   0.76611  any
  NH HI-OPI   26+  1.2919   0.66205  any
   NH Asian   26+  0.7207   0.13870  any
 NH Multiracial   26+  3.0668   0.52334  any
Mexican 12-17  4.3152   0.53235  mrj
   Puerto Rican 12-17  3.7237   0.20969  mrj
  Cuban 12-17  2.0616   0.67248  mrj
   C-S American 12-17  3.3282   0.47392  mrj
  Dominican 12-17  1.3797   0.40435  mrj
Spanish (Spain) 12-17  5.1810   0.93979  mrj
 Multi Hisp Eth 12-17  4.8915   0.94816  mrj
   NH White 12-17  3.6190

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Arun,

The issue is that the column names are incorrect.  I will also look into the 
comment by Prof Ripley.

Thanks for your continued support and help.

Pradip

> str(read.delim(textConnection(xd1), header=TRUE, sep="\t"))
'data.frame':   195 obs. of  1 variable:
 $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels " Cuban   26+  0.6653   0.31239  mrj",..: 27 148 13 140 108 193 169 100 85 67 ...
 names(agerace)
[1] raceage...percent..sepercent..flag_var
 head(agerace)
 raceage...percent..sepercent..flag_var
1  Mexican 12-17  5.7926   0.64195  any
2 Puerto Rican 12-17  5.1975   0.24929  any
3Cuban 12-17  3.7977   1.00487  any
4 C-S American 12-17  4.3665   0.55329  any
5Dominican 12-17  1.8149   0.46677  any
6  Spanish (Spain) 12-17  6.1971   0.98386  any



-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 5:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David L Carlson; R help
Subject: Re: [R] read. table()



Hi,

You can check the str().
I assume it will be like this:
> str(read.delim(textConnection(Lines), header=TRUE, sep="\t"))
#'data.frame':   195 obs. of  1 variable:
# $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "C-S American 12-17  0.2399   0.15804  coc",..: 50 170 20 5 35 185 65 155 110 80 ...

A.K.




- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: 'Prof Brian Ripley' rip...@stats.ox.ac.uk; r-help@r-project.org 
r-help@r-project.org
Cc:
Sent: Saturday, December 8, 2012 5:05 PM
Subject: Re: [R] read. table()

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results.

The only issue is that the column names do not seem to be correct.  I did not 
understand part of your comment, which says fortunes::fortune(14) applies 
although I read about the double colon operator- ns-dblcolon {base}.

Could you please provide a little more hint for me to resolve the issue?

Thanks and regards,

# new result 
> agerace <- read.delim(textConnection(xd1), sep="\t", header=TRUE, as.is=TRUE)
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
 raceage...percent..sepercent..flag_var
1  Mexican 12-17  5.7926   0.64195  any
2 Puerto Rican 12-17  5.1975   0.24929  any
3Cuban 12-17  3.7977   1.00487  any
4 C-S American 12-17  4.3665   0.55329  any
5Dominican 12-17  1.8149   0.46677  any
6  Spanish (Spain) 12-17  6.1971   0.98386  any




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi List,

 I have spent more than 30 minutes, but failed to read in this file using the 
 read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see 'See
Also').  Have you read it?  fortunes::fortune(14) applies.

The issue is what the separator is.  You have specified whitespace, and
that is not correct.  The original might have had tabs (see ?read.delim)
but as pasted into this email only a human can disentangle this file.

 Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
   line 1 did not have 6 elements

 Any help would be appreciated.

 Thanks,

 Pradip Muhuri


 ### below is the  reproducible example
 xd1 <- "race             age   percent  sepercent  flag_var
   Mexican 12-17  5.7926   0.64195  any
  Puerto Rican 12-17  5.1975   0.24929  any
 Cuban 12-17  3.7977   1.00487  any
  C-S American 12-17  4.3665   0.55329  any
 Dominican 12-17  1.8149   0.46677  any
   Spanish (Spain) 12-17  6.1971   0.98386

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear David and Arun,

Thank you very much for your time and efforts and for resolving the issue. 
From this exchange, I have learned something new about reading the data files 
into R.

Regards,

Pradip



-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 8:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: dcarl...@tamu.edu; R help
Subject: Re: [R] read. table()

Hi,

David's method is much better than mine.
Regarding the spaces in the race field, this should preserve them if you wish 
to try my method.
source("Muhuri.txt")
Lines1 <- readLines(textConnection(Lines))

Col1new <- gsub(" +$", "", gsub("\\s+(\\D+)[[:digit:]]+\\+.*", "\\1", gsub("\\s+(\\D+)[[:digit:]]+\\-.*", "\\1", Lines1[-1])))
 #changed

Col2 <- gsub("\\s+\\D+([[:digit:]]+\\+.*)", "\\1", gsub("\\s+\\D+([[:digit:]]+\\-.*)", "\\1", Lines1[-1]))

dat1 <- data.frame(Col1new, read.table(text=Col2, stringsAsFactors=FALSE, sep=""), stringsAsFactors=FALSE)
heading <- unlist(strsplit(Lines1[1], " "))
colnames(dat1) <- heading[heading != ""]


head(dat1)
# race   age percent sepercent flag_var
#1 Mexican 12-17  5.7926   0.64195  any
#2Puerto Rican 12-17  5.1975   0.24929  any
#3   Cuban 12-17  3.7977   1.00487  any
#4C-S American 12-17  4.3665   0.55329  any
#5   Dominican 12-17  1.8149   0.46677  any
#6 Spanish (Spain) 12-17  6.1971   0.98386  any
> str(dat1)
#'data.frame':   195 obs. of  5 variables:
# $ race     : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
# $ age      : chr  "12-17" "12-17" "12-17" "12-17" ...
# $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
# $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
# $ flag_var : chr  "any" "any" "any" "any" ...


A.K.



- Original Message -
From: David L Carlson dcarl...@tamu.edu
To: 'arun' smartpink...@yahoo.com; 'Muhuri, Pradip (SAMHSA/CBHSQ)' 
pradip.muh...@samhsa.hhs.gov
Cc: 'R help' r-help@r-project.org
Sent: Saturday, December 8, 2012 8:06 PM
Subject: RE: [R] read. table()

Arun's solution works but you lose your spaces in the race field. These
commands will preserve them. We need to make sure that your file has two or
more spaces between each field. The first gsub() command strips leading
space. The second inserts a space before the digit 1 (that is where all the
fields separated by a single space are). Then we convert two or more spaces
to a comma. Finally you can use read.table().

Starting with your vector xd1 from your first posting:
> raw2 <- readLines(con=textConnection(xd1))
> raw2 <- gsub("^ +", "", raw2)
> raw2 <- gsub(" 1", "  1", raw2)
> raw3 <- gsub("  +", ",", raw2)
> agerace <- read.table(text=raw3, header=TRUE, sep=",", as.is=TRUE)
> str(agerace)
'data.frame':   195 obs. of  5 variables:
$ race     : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
$ age      : chr  "12-17" "12-17" "12-17" "12-17" ...
$ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
$ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
$ flag_var : chr  "any" "any" "any" "any" ...
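
The steps described above can be tried on a few rows. This is a minimal sketch, assuming a four-row made-up excerpt in the same layout as the posted data (xd1_demo, raw and agerace_demo are hypothetical names): strip leading spaces, widen the single space before the age groups that start with 1, turn every run of two or more spaces into a comma, and read the result as comma-separated text.

```r
# Hypothetical excerpt: fields are separated by two or more spaces, except
# that race and the 12-17 / 18-25 age groups are separated by a single space.
xd1_demo <- "race             age  percent  sepercent  flag_var
         Mexican 12-17  5.7926   0.64195  any
    Puerto Rican 12-17  5.1975   0.24929  any
 Spanish (Spain) 18-25  9.2632   1.15652  any
        NH White   26+  1.5510   0.04156  any"

raw <- readLines(textConnection(xd1_demo))
raw <- gsub("^ +", "", raw)    # strip leading spaces
raw <- gsub(" 1", "  1", raw)  # make the gap before 12-17 / 18-25 two spaces
raw <- gsub("  +", ",", raw)   # two or more spaces become one comma

agerace_demo <- read.table(text = raw, header = TRUE, sep = ",", as.is = TRUE)
str(agerace_demo)
```

The single spaces inside values such as "Puerto Rican" survive because only runs of two or more spaces are converted to commas.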


-Original Message-
 From: arun [mailto:smartpink...@yahoo.com]
 Sent: Saturday, December 08, 2012 5:11 PM
 To: Muhuri, Pradip (SAMHSA/CBHSQ)
 Cc: R help; David L Carlson
 Subject: Re: [R] read. table()

 HI Pradip,

 Try this:
 source("Muhuri.txt")
 #Muhuri.txt
 Lines <- "race             age   percent  sepercent  flag_var
   Mexican 12-17  5.7926   0.64195  any
 -----
 
 
 Lines1 <- readLines(textConnection(Lines))

 Col1new <- gsub(" ", "", gsub("\\s+(\\D+)[[:digit:]]+\\+.*", "\\1", gsub("\\s+(\\D+)[[:digit:]]+\\-.*", "\\1", Lines1[-1])))
 Col2 <- gsub("\\s+\\D+([[:digit:]]+\\+.*)", "\\1", gsub("\\s+\\D+([[:digit:]]+\\-.*)", "\\1", Lines1[-1]))
 dat1 <- data.frame(Col1new, read.table(text=Col2, stringsAsFactors=FALSE, sep=""),
                    stringsAsFactors=FALSE)

 heading <- unlist(strsplit(Lines1[1], " "))
 colnames(dat1) <- heading[heading != ""]
  head(dat1,6)
 #race   age percent sepercent flag_var
 #1Mexican 12-17  5.7926   0.64195  any
 #2PuertoRican 12-17  5.1975   0.24929  any
 #3  Cuban 12-17  3.7977   1.00487  any
 #4C-SAmerican 12-17  4.3665   0.55329  any
 #5  Dominican 12-17  1.8149   0.46677  any
 #6 Spanish(Spain) 12-17  6.1971   0.98386  any



  str(dat1)
 'data.frame':195 obs. of  5 variables:
  $ race : chr  Mexican PuertoRican Cuban C-SAmerican ...
  $ age  : chr  12-17 12-17 12-17 12-17 ...
  $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
  $ sepercent: num  0.642 0.249 1.005 0.553 0.467

[R] subsetting - questions

2012-11-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

I have two very basic questions (console attached):

1) Why am I getting an error message for #5 and #7?
2) How to fix the code?

I would appreciate receiving your help.

Thanks,

Pradip Muhuri



## Reproducible Example  #

N <- 100
set.seed(13)
df <- data.frame(matrix(sample(c(1:10), N, replace=TRUE), ncol=5))

keep_var <- c("X1", "X2")
drop_var <- c("X3", "X4", "X5")


df[df$X1>=8,] [,1:2]                    #1
df[df$X1>=8,] [,-c(3,4,5)]              #2
df[df$X1>=8,] [,c(-3,-4,-5)]            #3
df[df$X1>=8,] [,c("X1", "X2")]          #4
df[df$X1>=8,] [,-c("X3", "X4", "X5")]   #5  DOES NOT WORK
df[df$X1>=8,] [,keep_var]               #6
df[df$X1>=8,] [, !drop_var]             #7  DOES NOT WORK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] subsetting - questions

2012-11-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Arun,

Thank you so much for your help.

Pradip

From: arun [smartpink...@yahoo.com]
Sent: Friday, November 23, 2012 10:15 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] subsetting - questions

Hi,
This should work:
df[df$X1>=8,][-which(names(df) %in% c("X3","X4","X5"))]
#   X1 X2
#1   8  2
#5  10  1
#8   8  5
#9   9  4
#12  9  5
#13  9 10
#19  9  8
df[df$X1>=8,][,!names(df) %in% drop_var]
#   X1 X2
#1   8  2
#5  10  1
#8   8  5
#9   9  4
#12  9  5
#13  9 10
#19  9  8
A.K.
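
A minimal sketch of why #5 and #7 fail and of three equivalent ways to drop columns by name. The data frame is called dat here rather than df, and the other object names are hypothetical. Unary minus and logical negation are not defined for character vectors, so the names have to be turned into a logical index, a set of names to keep, or passed to subset().

```r
set.seed(13)
dat <- data.frame(matrix(sample(1:10, 100, replace = TRUE), ncol = 5))
drop_var <- c("X3", "X4", "X5")

# Both operators raise an error on a character vector; capture that here.
err_minus <- tryCatch(-drop_var, error = function(e) "error")
err_not   <- tryCatch(!drop_var, error = function(e) "error")

rows <- dat$X1 >= 8

# 1. Logical index over the column names.
by_logical <- dat[rows, !names(dat) %in% drop_var]

# 2. Keep the names that are not in drop_var.
by_setdiff <- dat[rows, setdiff(names(dat), drop_var)]

# 3. subset() evaluates `select` against column positions, so minus works.
by_subset <- subset(dat, X1 >= 8, select = -c(X3, X4, X5))

print(by_logical)
```

All three results should contain only X1 and X2 for the rows where X1 is at least 8; which form to use is largely a matter of taste.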

- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Friday, November 23, 2012 9:55 PM
Subject: [R] subsetting - questions

Hello,

I have two very basic questions (console attached):

1) Why am I getting an error message for #5 and #7?
2) How to fix the code?

I would appreciate receiving your help.

Thanks,

Pradip Muhuri

## Reproducible Example  #

N <- 100
set.seed(13)
df <- data.frame(matrix(sample(c(1:10), N, replace=TRUE), ncol=5))

keep_var <- c("X1", "X2")
drop_var <- c("X3", "X4", "X5")

df[df$X1>=8,] [,1:2]                    #1
df[df$X1>=8,] [,-c(3,4,5)]              #2
df[df$X1>=8,] [,c(-3,-4,-5)]            #3
df[df$X1>=8,] [,c("X1", "X2")]          #4
df[df$X1>=8,] [,-c("X3", "X4", "X5")]   #5  DOES NOT WORK
df[df$X1>=8,] [,keep_var]               #6
df[df$X1>=8,] [, !drop_var]             #7  DOES NOT WORK


Re: [R] subsetting - questions

2012-11-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Jorge,

I could use subset(). But, I wanted to minimize coding.

Thanks,

Pradip

From: Jorge I Velez [jorgeivanve...@gmail.com]
Sent: Friday, November 23, 2012 10:02 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] subsetting - questions

Hi Pradip,

It is easier to use subset().  Check ?subset for some examples and pay special 
attention to the "select" parameter.  By the way, do not call your data "df", as 
it is already the name of a function.
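
A minimal sketch of the subset() approach, assuming the example data from the original post rebuilt under the hypothetical name dat (so that it does not shadow the df() function). Inside select=, column names can be given unquoted, and a minus sign drops them:

```r
# Rebuild the example data under a name that does not shadow stats::df().
# 'dat' is a hypothetical name used only for this illustration.
set.seed(13)
dat <- data.frame(matrix(sample(1:10, 100, replace = TRUE), ncol = 5))
drop_var <- c("X3", "X4", "X5")

# Keep rows with X1 >= 8 and select columns by (unquoted) name
res_keep <- subset(dat, X1 >= 8, select = c(X1, X2))

# Drop columns by name: a minus sign is allowed inside select=
res_drop <- subset(dat, X1 >= 8, select = -c(X3, X4, X5))

# Drop columns listed in a character vector, without subset()
res_vec <- dat[dat$X1 >= 8, setdiff(names(dat), drop_var)]
```

All three results should contain only the columns X1 and X2. The sampled values will not necessarily match the output shown elsewhere in this thread, since the default sampling method can differ between R versions.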

Best,
Jorge.-



Re: [R] subsetting - questions

2012-11-23 Thread Muhuri, Pradip (SAMHSA/CBHSQ)



Hello Peter,

1. -c("X3", "X4", "X5")
For the above variables, class is integer.

Arun has suggested the following:

df[df$X1>=8,][-which(names(df) %in% c("X3","X4","X5"))]

2. df[df$X1>=8,] [, !names(df) %in% drop_var]

I agree - Arun has also suggested the same.

Thanks and regards,

Pradip



From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Friday, November 23, 2012 10:47 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] subsetting - questions

On 2012-11-23 18:55, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hello,

 I have two very basic questions (console attached):

 1) Why am I getting an error message for #5 and #7?
 2) How to fix the code?

 I would appreciate receiving your help.

 Thanks,

 Pradip Muhuri



 ## Reproducible Example  #

 N <- 100
 set.seed(13)
 df <- data.frame(matrix(sample(c(1:10), N, replace=TRUE), ncol=5))

 keep_var <- c("X1", "X2")
 drop_var <- c("X3", "X4", "X5")


 df[df$X1>=8,] [,1:2]                   #1
 df[df$X1>=8,] [,-c(3,4,5)]             #2
 df[df$X1>=8,] [,c(-3,-4,-5)]           #3
 df[df$X1>=8,] [,c("X1", "X2")]         #4
 df[df$X1>=8,] [,-c("X3", "X4", "X5")]  #5  DOES NOT WORK
 df[df$X1>=8,] [,keep_var]              #6
 df[df$X1>=8,] [, !drop_var]            #7  DOES NOT WORK

To see what's wrong, just print the problematic part:

-c("X3", "X4", "X5")

You can't negate a character vector; you have to have a numeric vector.

And

!drop_var

doesn't work because you need something that evaluates to a logical
value if you want to ! it.

This will do it:

df[df$X1>=8,] [, !names(df) %in% drop_var]

Or use the subset() function, as Jorge suggests.
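
The two failures and the fix can be reproduced in a few lines. This is only a sketch: dat, neg_err, not_err and keep_cols are hypothetical names, and tryCatch() is used so that the errors are captured as text instead of stopping the script.

```r
set.seed(13)
dat <- data.frame(matrix(sample(1:10, 100, replace = TRUE), ncol = 5))
drop_var <- c("X3", "X4", "X5")

# Unary minus and ! are not defined for character vectors; both raise an error.
# tryCatch() stores the error message instead of halting.
neg_err <- tryCatch(-drop_var, error = function(e) conditionMessage(e))
not_err <- tryCatch(!drop_var, error = function(e) conditionMessage(e))

# names(dat) %in% drop_var is a logical vector, so it can be negated.
# %in% binds more tightly than !, so this reads as !(names(dat) %in% drop_var).
keep_cols <- !names(dat) %in% drop_var
res <- dat[dat$X1 >= 8, keep_cols]
```

Printing neg_err and not_err shows the messages R gives for case #5 and case #7, and names(res) should be X1 and X2.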

Peter Ehlers


[R] Data Extraction

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I would appreciate if someone could help me resolve the following:

1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work

2. Is this message harmful?  "The following object(s) are masked from 'df1 
(position 3)':
X1, X2, X3, X4, X5"

Thanks,

Pradip Muhuri


#Reproducible Example
set.seed(5)
df1 <- data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
attach (df1)
#delete rows if any of them NA for X1
df1[!is.na( X1),][,1:5] # This works

#delete rows if any of them NA for X1, X2, X3, X4 or X5
df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work


Re: [R] Data Extraction

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Petr,

You have shown a solution that is the simplest.

Thanks and regards,

Pradip Muhuri
Beginner useR


From: PIKAL Petr [petr.pi...@precheza.cz]
Sent: Thursday, November 22, 2012 9:33 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: RE: Data Extraction

Hi

do you want this?

 df1[complete.cases(df1),]
   X1 X2 X3 X4 X5
2   8  8  3  2 10
6   8  6  7 10  1
11  4  5  5 10  8
12  6  1  7  8  4
17  5  7  3  1  3
18 10  7  3  8  7
19  7  5  3  5  6
20 10  5  2  4  6

Regards
Petr


Re: [R] Data Extraction

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Bert,

Your solution is similar to Petr's.

Thanks and regards,

Pradip Muhuri
BeginneR UseR


From: Bert Gunter [gunter.ber...@gene.com]
Sent: Thursday, November 22, 2012 10:20 AM
To: Berend Hasselman
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Data Extraction

Unnecessarily complicated. ?na.omit (linked from ?complete.cases)

df <- na.omit(df)

-- Bert





--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Data Extraction

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Berend,

You have compared all 3 ways.  ... very nicely evaluated. 

Thanks and regards,

Pradip Muhuri

Beginner UseR


From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 9:49 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 15:11, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hello,

 I would appreciate if someone could help me resolve the following:

 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work

 2. Is these message harmful?  The following object(s) are masked from 'df1 
 (position 3)':
X1, X2, X3, X4, X5

 Thanks,

 Pradip Muhuri


 #Reproducible Example
 set.seed(5)
 df1 <- data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
 attach (df1)
 #delete rows if any of them NA for X1
 df1[!is.na( X1),][,1:5] # This works

 #delete rows if any of them NA for X1, X2, X3, X4 or X5
 df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work

Yet another way of doing this is

df1[!is.na(rowSums(df1)),][1:5]

But Petr's solution appears to be quickest.
See this:

 N <- 10
 set.seed(13)
 df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
 library(rbenchmark)

 f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
 f2 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
 f3 <- function(df) {df[complete.cases(df),][1:ncol(df)]}

 benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), 
 columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.675   13.172          100
2 d2 <- f2(df)   0.401    1.437          100
3 d3 <- f3(df)   0.279    1.000          100

 identical(d1,d2)
[1] TRUE
 identical(d1,d3)
[1] TRUE


Berend


Re: [R] Data Extraction

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Sarah,

I am glad you have precisely caught where I made the mistake.  Thank you so 
much.

regards,

Pradip Muhuri


From: Sarah Goslee [sarah.gos...@gmail.com]
Sent: Thursday, November 22, 2012 9:21 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

Hi,

is.na( X1 | X2 | X3 | X4 | X5)
isn't a valid construct.

You'd need
!(is.na(X1) | is.na(X2) etc )

Or more elegantly
df1[apply(df1, 1, function(x)all(!is.na(x))), ]
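
To see why the first form misbehaves, a small sketch on a hypothetical four-row data frame may help (the values are made up for illustration). The expression runs, but is.na() is applied to the result of the | operation instead of to each column, and NA | TRUE evaluates to TRUE, so a row with a single NA is not flagged:

```r
# Hypothetical toy data: rows 1, 2 and 4 each contain at least one NA
df1 <- data.frame(X1 = c(1, NA, 3, NA), X2 = c(NA, 2, 3, NA))

# is.na() sees only the combined result of X1 | X2.
# Nonzero numbers count as TRUE, and NA | TRUE is TRUE,
# so only the row where both values are NA is flagged.
flag_or <- is.na(df1$X1 | df1$X2)

# Testing each column separately flags every row that contains an NA
flag_each <- is.na(df1$X1) | is.na(df1$X2)

# Keep the rows with no NA, and compare with complete.cases()
complete_rows <- df1[!flag_each, ]
same_as_cc <- identical(complete_rows, df1[complete.cases(df1), ])
```

Here flag_or marks only row 4, while flag_each marks rows 1, 2 and 4, which is what the row filter needs.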

Sarah



--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org


Re: [R] Data Extraction - benchmark()

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Berend,

I see you are one of the contributors to the rbenchmark package. 

I am sorry to bother you again.  I have tried to run your code 
(slightly tweaked) involving the benchmark function, and I am getting the 
following error messages. What am I doing wrong?


Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df),  : 
  could not find function "s1"

 
 identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), 
 identical (d1,d6)
Error: unexpected ',' in "identical (d1,d2),"

 sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C   LC_TIME=English_United States.1252   
 

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] rbenchmark_1.0.0

loaded via a namespace (and not attached):
[1] tools_2.15.1



I would appreciate receiving your help if your time permits ..


Thanks and regards,

Pradip Muhuri

#  Berend's code extended
N <- 10
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N, replace=TRUE),ncol=50))
s1 <- df[complete.cases(df),]
s2 <- na.omit(df)
s3 <- df[apply(df, 1, function(x)all(!is.na(x))), ]
s4 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
s5 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
s6 <- function(df) {df[complete.cases(df),][1:ncol(df)]}

require(rbenchmark)
 
benchmark( d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), d5 <- 
s5(df), d6 <- s6(df),
columns=c("test","elapsed", "relative", "replications") )

identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), 
identical (d1,d6)





From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 11:03 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi Berend,

 You have compared all 3 ways.  ... very nicely evaluated.


Bert's solution is indeed nice and simple. But Petr's solution is still the 
quickest:

 N <- 10
 set.seed(13)
 df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
 library(rbenchmark)

 f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
 f2 <- function(df) {df[!is.na(rowSums(df)),]}
 f3 <- function(df) {df[complete.cases(df),]}
 f4 <- function(df) {data.frame(na.omit(df))}
 benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), 
 columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100

 identical(d1,d2)
[1] TRUE
 identical(d1,d3)
[1] TRUE
 identical(d1,d4)
[1] TRUE

Berend


Re: [R] Data Extraction - benchmark()

2012-11-22 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Berend,

Thank you very much for pointing out the mistake and for your patience. I have 
corrected the script, which now works fine.

regards,

Pradip Muhuri



From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 12:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction - benchmark()

On 22-11-2012, at 18:20, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi Berend,

 I see you are one of the contributors to the rbecnhmark package.

 I am sorry that I am bothering you again.  I have tried to run your  code 
 (slightly tweaked)  involving the benchmark function, and I am getting the 
 following error message. What am I doing wrong?


 Error in benchmark(d1 - s1(df), d2 - s2(df), d3 - s3(df), d4 - s4(df),  :
  could not find function s1



Because you haven't defined a function s1 (or s2 or s3, for that matter).
You did s1 <- df[complete.cases(df),], which stores a data frame in s1, not a function.
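
A corrected setup might look like the sketch below. It is an illustration, not the script that was eventually run: dat and the function bodies are assumptions based on the code earlier in the thread, and the data size is arbitrary. Each candidate is wrapped in a function so that s1(dat) is a valid call, and each identical() check is its own statement, because a comma cannot join calls at top level.

```r
set.seed(13)
# 'dat' is a hypothetical name; 5000 values in 5 columns gives 1000 rows
dat <- data.frame(matrix(sample(c(1:10, NA), 5000, replace = TRUE), ncol = 5))

# Wrap each approach in a function so that it can be called as s1(dat)
s1 <- function(d) d[complete.cases(d), ]
s2 <- function(d) d[!is.na(rowSums(d)), ]
s3 <- function(d) d[apply(d, 1, function(x) all(!is.na(x))), ]

d1 <- s1(dat)
d2 <- s2(dat)
d3 <- s3(dat)

# One identical() call per statement
same12 <- identical(d1, d2)
same13 <- identical(d1, d3)

# Timing is attempted only if the rbenchmark package is installed
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  print(rbenchmark::benchmark(s1(dat), s2(dat), s3(dat),
        columns = c("test", "elapsed", "relative", "replications"),
        replications = 10))
}
```

na.omit() is left out of the comparison here because it attaches an extra attribute to its result, which is why it was wrapped in data.frame() in the earlier benchmark.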

Berend



Re: [R] github

2012-11-20 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Thanks, Michael.

Pradip

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
 
The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov


-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] 
Sent: Tuesday, November 20, 2012 8:41 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] github

On Tue, Nov 20, 2012 at 2:07 AM, Muhuri, Pradip (SAMHSA/CBHSQ)
<pradip.muh...@samhsa.hhs.gov> wrote:

 Hello,

 I would like to learn how to set up Github/repository and upload/update files 
 and am looking for Github for Dummies.  Any help will be appreciated.


I believe Hadley has done some github integration work:
https://github.com/hadley/devtools

Michael


Re: [R] kinitr

2012-11-20 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Michael,

I really appreciate that you sent me the link to Jeromy Anglim's blog. This is 
exactly the kind of resource on R Markdown and knitr that I was looking for. 
It will be of immense help.

Thank you so much. 

Pradip



-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] 
Sent: Tuesday, November 20, 2012 8:36 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] kinitr

On Tue, Nov 20, 2012 at 1:57 AM, Muhuri, Pradip (SAMHSA/CBHSQ)
<pradip.muh...@samhsa.hhs.gov> wrote:
 Hello,

 I am an Intro-level R and ggplot2 user and looking for resources to self 
 teach dynamic report generation in R using knitr. Any advice would be highly 
 appreciated.

http://jeromyanglim.blogspot.co.uk/2012/05/getting-started-with-r-markdown-knitr.html

Michael Weylandt


Re: [R] kinitr

2012-11-20 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Mark-

Thank you for your help.

Pradip


From: Mark Lamias [mailto:mlam...@yahoo.com]
Sent: Tuesday, November 20, 2012 4:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'R. Michael Weylandt'
Cc: r-help@r-project.org
Subject: Re: [R] kinitr

This is how I learned everything about knitr:

http://yihui.name/knitr/

Yihui is great and his site gives  you pretty much all the information you need 
to get you started.



[R] kinitr

2012-11-19 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I am an intro-level R and ggplot2 user looking for resources to teach myself 
dynamic report generation in R using knitr. Any advice would be highly 
appreciated.

Thanks,

Pradip


[R] github

2012-11-19 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

I would like to learn how to set up a GitHub repository and upload/update files, 
and am looking for a "GitHub for Dummies".  Any help will be appreciated.

Thanks,

Pradip


Re: [R] Saving R Graph to a file

2012-11-04 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

#Example 1: The following code to save svyboxplots works for me

pdf("boxplots_dthage.pdf", width = 1020) # 4 boxplots in 2 columns and 2 rows
par(mfrow=c(2,2),  oma=c(0,0,0,0))
# svyboxplot commands not shown
dev.off()


#Example 2: The following code to save a ggplot graph works for me:
# ggplot () not shown
print (p)
ggsave(file='Xfacet_abodill_age3.pdf', width=12, height=8) 


Thanks,

Pradip Muhuri

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Robert Baer [rb...@atsu.edu]
Sent: Sunday, November 04, 2012 5:32 AM
To: frespider
Cc: r-help@r-project.org
Subject: Re: [R] Saving R Graph to a file

Some hints:
For pdf(), height and width are in inches, not pixels.  dev.off() is
necessary after drawing the image for pdf(). For pdf() and postscript(),
the name of the file argument (file="c:/figure.xxx") is "file", not
"filename".

hist(CO2[,5]) is more interesting

And yes,
?pdf
?postscript
?png
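
Putting these hints together, a sketch of the three devices could look as follows. The output folder (tempdir()) and file names are assumptions for illustration, and the postscript() settings shown are one common way to get a single-figure file for inclusion in LaTeX, not the only one. Note that the function that closes a device is dev.off().

```r
out_dir  <- tempdir()  # hypothetical output location
png_file <- file.path(out_dir, "figure.png")
pdf_file <- file.path(out_dir, "figure.pdf")
ps_file  <- file.path(out_dir, "figure.ps")

# png(): size is in pixels by default, and the argument is 'filename'.
# The device is opened only if this R build supports it.
if (capabilities("png")) {
  png(filename = png_file, width = 300, height = 295, bg = "white")
  hist(CO2[, 5], main = "CO2 uptake")
  dev.off()
}

# pdf(): size is in inches, and the argument is 'file'
pdf(file = pdf_file, width = 5, height = 5)
hist(CO2[, 5], main = "CO2 uptake")
dev.off()

# postscript(): size is also in inches; these settings write one figure
# per file at the requested size instead of a full rotated page
postscript(file = ps_file, width = 5, height = 5,
           horizontal = FALSE, onefile = FALSE, paper = "special")
hist(CO2[, 5], main = "CO2 uptake")
dev.off()
```

To produce one file per column of a wide matrix, the same open, plot, dev.off() sequence can be placed in a loop that builds a different file name on each pass.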

On 11/3/2012 11:16 PM, frespider wrote:
 Hi

 I am not sure why I can't get my plot saved to a file as .ps, I searched
 online and I found that I have to use something is called postscript,png or
 pdf function which I did but still not working.

 Actually what I have is a matrix with almost 300-400 columns. I need to
 create a histogram and boxplot for some columns as .ps file (with reasonable
 size if i can adjust that would be nice also) so I can import them in my
 latex code to display a good chart on my report.  And I found out R display
 a certain limit of device.
 Can you please help me code this?

 This an example I create
 data(CO2)
 png(filename="C:/R/figure.png", height=295, width=300,  bg="white")
 hist(CO2[,4])
 device.off()
 pdf(filename="C:/R/figure.pdf", height=295, width=300,  bg="white")
 hist(CO2[,4])
 postscript(filename="C:/R/figure.pdf", height=295, width=300,  bg="white")
 hist(CO2[,4])


 Thanks







--
__
Robert W Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
Kirksville, MO 63501 US


[R] Logical vector-based extraction

2012-11-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

Most of the program works, except that the following logical variable 
does not get created, although the second logical variable-based extraction 
works.

 I don't understand what I am doing wrong here.

state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]


I would appreciate receiving your help.

Thanks,

Pradip Muhuri




# Below is the code that includes the reproducible example. 

df <- data.frame (state.name =
  c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut",
    "Delaware","DC","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana",
    "Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan",
    "Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
    "New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma",
    "Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee","Texas",
    "Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"),

   p_fatal = sample(200:500,51,replace=TRUE),

   t_safety_score = sample(1:10,51,replace=TRUE)
  )

options (width=120)



# The following logical variable does not get created - Don't understand what I am doing wrong
state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]

# The following works
state_sslt5 <- df$t_safety_score < 5
df[state_sslt5,c("state.name", "t_safety_score")]
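
One possible explanation, assuming the comparison in the first case is p_fatal < 200 as the variable name suggests: p_fatal is drawn from 200:500, so no value is ever below 200. The logical vector is created, but it is FALSE everywhere, and the extraction returns a data frame with zero rows. A reduced sketch with a hypothetical three-state data frame named dat:

```r
set.seed(1)
# Reduced, hypothetical version of the example data
dat <- data.frame(state.name = c("Alabama", "Alaska", "Arizona"),
                  p_fatal = sample(200:500, 3, replace = TRUE),
                  t_safety_score = sample(1:10, 3, replace = TRUE))

# sample(200:500, ...) never returns a value below 200,
# so this vector exists but is FALSE everywhere and no rows are selected
state_pflt200 <- dat$p_fatal < 200
none <- dat[state_pflt200, c("state.name", "p_fatal")]

# A threshold inside the sampled range, such as 350, can match rows
state_pflt350 <- dat$p_fatal < 350
some <- dat[state_pflt350, c("state.name", "p_fatal")]
```

Checking table(state_pflt200) or range(dat$p_fatal) before subsetting is a quick way to confirm whether any row can satisfy the condition.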


Re: [R] How generate random numbers from given vector???

2012-10-25 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

Another option is to use the sample() function.

test2 <- matrix (rep(sample(number1, size = 5), times=3), nrow=3)


Pradip Muhuri


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Rui Barradas [ruipbarra...@sapo.pt]
Sent: Thursday, October 25, 2012 7:19 PM
To: Rlotus
Cc: r-help@r-project.org
Subject: Re: [R] How generate random numbers from given vector???

Hello,

You don't need the loop, the sample() argument 'size' is there for that.
See ?sample.

number <- c(0,1,3,4,5,6,8)
rsidp <- function(n) sample(number, n, replace = TRUE)
rsidp(5)

Hope this helps,

Rui Barradas
On 25-10-2012 20:24, Rlotus wrote:
 I wanna generate random numbers from a vector...

 for example number<-c(0,1,3,4,5,6,8)
   so

 rsidp<-function(x){
   i=0
   for (i in seq(1:x))

   {y<-sample(number,x, replace=T)}
   return(y)
 }
   so all random numbers have to be from vector number;
 so if I type rsidp(5). it has to give me 5 random numbers except 2,7,9
 (because they are not in the vector numbers). help me plz with it (((
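
Rui's point about 'size' can be checked directly. In the sketch below, rsidp_loop is a hypothetical name for the looped version from the question; each pass overwrites y, so the loop only repeats work that a single sample() call already does. The seed is an assumption added for repeatability.

```r
number <- c(0, 1, 3, 4, 5, 6, 8)

# Loop version from the question: every pass overwrites y, so only the
# last draw survives and the loop adds nothing.
rsidp_loop <- function(x) {
  for (i in seq_len(x)) {
    y <- sample(number, x, replace = TRUE)
  }
  y
}

# Single call: 'size' already controls how many values are drawn.
rsidp <- function(n) sample(number, size = n, replace = TRUE)

set.seed(42)  # hypothetical seed
draws <- rsidp(5)
all(draws %in% number)   # TRUE: 2, 7 and 9 can never appear
```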








Re: [R] svyboxplot - library (survey)

2012-10-18 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello Dr. Lumley,

Thank you for your advice/suggestions.

I have rescaled the weight (i.e., original weight divided by total weighted 
count averaged across 8 surveys - NHIS). As can be seen below (R console), the 
new weight sums to 1.

I have used the freq=TRUE argument in the svyhist() function along with a new 
svydesign object which includes the rescaled weight.  There are two issues:

1) I am getting a warning message: In plot.histogram(h, ..., freq = freq, 
xlab = xlab, main = main) : the AREAS in the plot are wrong -- rather use 
freq=FALSE.

2) The scale of two graphs looks different (please see the attachment).

Any thoughts on how to resolve these issues?

Regards,

Pradip Muhuri

## R console is appended below ##
> options (width=120)
> sum (tor$new_wt)
[1] 1
>
> # object with survey design variables and data with new_wt (rescaled) that
> # sums to 1
> xnhis <- svydesign (id=~psu, strat=~stratum, weights=~new_wt, data=tor,
+                     nest=TRUE)
>
> MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85, 95)
>
> par(mfrow=c(2,2))
> # Chart 1
>
> options( survey.lonely.psu = "adjust" )
> svyhist (~age_p,
+  subset (xnhis, xspd2=='SPD'), breaks=MyBreaks,
+   #ylim = c(0,0.040),
+  main=" ", freq=TRUE,
+  col="red",
+  xlab="Age at Interview (SPD Category)"
+  )
Warning message:
In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) :
  the AREAS in the plot are wrong -- rather use freq=FALSE
> #lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)
>
> #Chart 2
>
> options( survey.lonely.psu = "adjust" )
> svyhist (~age_p,
+  subset (xnhis, xspd2=='No SPD'), breaks=MyBreaks,
+  #ylim = c(0,0.040),
+  main=" ", freq=TRUE,
+  col="yellow", xlab="Age at Interview (No SPD Category)"
+  )
Warning message:
In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) :
  the AREAS in the plot are wrong -- rather use freq=FALSE
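
One plausible source of this warning, offered as a sketch rather than a diagnosis of the NHIS data: MyBreaks has bins of unequal width (18-25 spans 7 years, the others 10), and base R's histogram plotting warns whenever freq=TRUE is combined with non-equidistant breaks, because bar areas are then no longer proportional to the counts. The example uses base hist() on simulated ages instead of svyhist(); the seed, the data, and EqualBreaks are assumptions.

```r
set.seed(123)                            # hypothetical seed
age <- runif(500, min = 18, max = 95)    # simulated ages, not NHIS data
MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85, 95)

pdf(NULL)  # null device, so no file is written

# Unequal bin widths with freq = TRUE: the warning is raised
msg <- tryCatch(
  { hist(age, breaks = MyBreaks, freq = TRUE); "no warning" },
  warning = function(w) conditionMessage(w)
)
msg

# Equal-width bins with freq = TRUE: no warning
EqualBreaks <- seq(15, 95, by = 10)
msg_equal <- tryCatch(
  { hist(age, breaks = EqualBreaks, freq = TRUE); "no warning" },
  warning = function(w) conditionMessage(w)
)
msg_equal

dev.off()
```

If equal-width bins are acceptable for the analysis, using them is one way to keep freq=TRUE without the warning.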




Pradip K. Muhuri
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
 
The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov


-Original Message-
From: Thomas Lumley [mailto:tlum...@uw.edu] 
Sent: Wednesday, October 17, 2012 11:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Anthony Damico; R help
Subject: Re: [R] svyboxplot - library (survey)

On Thu, Oct 18, 2012 at 2:04 PM, Muhuri, Pradip (SAMHSA/CBHSQ)
<pradip.muh...@samhsa.hhs.gov> wrote:
> Hello,
>
> I understand that svyhist() provides density histograms with density values 
> on the y-axis (R code shown below).  Is there a way one can have 
> relative frequency histograms with relative frequencies on the y-axis?

You get frequencies just by asking for them with freq: compare
   svyhist(~enroll, dstrat, main="Survey weighted", col="purple", freq=TRUE)
   svyhist(~enroll, dstrat, main="Survey weighted", col="purple")

If you mean that you want the heights of the bars to sum to 1, the
simplest way I know of is to rescale the weights to sum to 1 and use
freq=TRUE

   -thomas



Re: [R] svyboxplot - library (survey)

2012-10-18 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Dr. Lumley,


Further thoughts:  To get the histogram of age with proportions (relative 
frequencies) on y-axis, I probably need to rescale the weight for each subgroup 
separately so that the  rescaled weight would sum to 1 for the respective 
subgroup.  Am I correct?

Thanks,

Pradip Muhuri
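
The arithmetic behind the per-subgroup rescaling can be sketched in base R. This is only an illustration with made-up weights and groups (wt, grp, age, and the seed are hypothetical); it does not use svyhist() and ignores the survey design. With a single overall rescaling, each subgroup's bar heights would sum to that subgroup's share of the total weight instead of 1.

```r
set.seed(7)  # hypothetical seed
n   <- 200
grp <- sample(c("SPD", "No SPD"), n, replace = TRUE, prob = c(0.2, 0.8))
age <- runif(n, 18, 95)
wt  <- runif(n, 100, 3000)          # made-up sampling weights

# Rescale within each subgroup so the weights sum to 1 per subgroup
new_wt <- ave(wt, grp, FUN = function(w) w / sum(w))
tapply(new_wt, grp, sum)            # each subgroup total is 1

# Weighted relative frequency per age bin, separately by subgroup
bins <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85, 95),
            include.lowest = TRUE)
rel_freq <- tapply(new_wt, list(bins, grp), sum)
rel_freq[is.na(rel_freq)] <- 0      # empty bins contribute nothing
colSums(rel_freq)                   # bar heights sum to 1 in each subgroup
```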




Re: [R] svyboxplot - library (survey)

2012-10-17 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I understand that svyhist() provides density histograms with density values 
on the y-axis (R code shown below).  Is there a way one can have 
relative frequency histograms with relative frequencies on the y-axis?

Any advice/help would be appreciated.

Thanks,

Pradip Muhuri





## svyhist - Density Histogram

options( survey.lonely.psu = "adjust" )
svyhist (~age_p,
 subset (nhis, xspd2=='SPD'), breaks=MyBreaks,
  ylim = c(0,0.040),
 main=" ",
 col="red",
 xlab="Age at Interview (SPD Category)"
 )
lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)



From: Anthony Damico [ajdam...@gmail.com]
Sent: Monday, October 01, 2012 10:07 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] svyboxplot - library (survey)

using a slight modification of the example shown in ?svyboxplot


# load survey library
library(survey)

# load example data
data(api)

# create an example svydesign
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
                    fpc = ~fpc)

# set the plot window to display 1 plot x 2 plots
par(mfrow=c(1,2))

# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)

# done



# alternative: not as nice

# set the plot window to display 2 plots x 1 plot
par(mfrow=c(2,1))

# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)

# done







On Mon, Oct 1, 2012 at 9:50 AM, Muhuri, Pradip (SAMHSA/CBHSQ) 
<pradip.muh...@samhsa.hhs.gov> wrote:
Hello,

I have used the survey package for boxplots using the following code.

Could anyone please tell me why I am getting only 1 boxplot instead of 2 
boxplots (1-SPD, 2-No SPD)?

What changes in the following code would be required to get 2 boxplots in the 
same plot frame?

Thanks,

Pradip

###
nhis <- svydesign (id=~psu, strat=~stratum, weights=~wt8,
                   data=tor, nest=TRUE)

svyboxplot (dthage~xspd2, subset (nhis, mortstat==1), col="gray80",
            varwidth=TRUE, ylab="Age at Death",
            xlab="SPD Status: 1-SPD, 2=No SPD")



[R] svyhist and svyboxplot

2012-10-13 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,


The following code is expected to produce 4 charts, but I only get charts 1, 2, 
and 4, not chart #3.

For chart #3, I am getting the following error message: Error in 
tapply(1:NROW(x), list(factor(strata)), function(index) { :   arguments must 
have same length

I would appreciate it if someone could help me resolve the issue.

Thanks,

Pradip


# BELOW IS THE REPRODUCIBLE EXAMPLE
setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)

xd1 <-
xsmoke age_p psu stratum   wt8
13601   322   2  20  356.5600
32966   338   2  45  434.3562
63493   132   1  87  699.9987
238175  346   1 338  982.8075
174162  340   1 240  273.6313
220206  333   2 308 1477.1688
118133  368   1 159  716.3012
142859  223   1 194 1100.9475
115253  235   2 155  444.3750
61675   331   1  85  769.5963
189813  337   1 263  328.5600
226274  147   2 318  605.8700
41969   371   2  58  597.0150
167667  340   2 230 1030.4637
225103  337   2 316  349.6825
49894   370   2  68  517.7862
98075   346   2 130 1428.7225
180771  350   1 250  652.4188
137057  342   1 186  590.2100
77705   223   1 105 1687.2450
89106   348   1 118  407.6513
208178  350   1 290  556.5000
100403  352   2 133 1481.8200
221571  127   2 310  833.5338
10823   272   1  16 1807.6425
108431  371   2 145  945.6263
68708   146   1  94 1989.3775
23874   323   2  33 1707.8775
150634  319   2 206  761.1500
231232  342   2 326 1487.4113
184654  242   2 255 1715.2375
215312  357   1 300  483.5663
40713   257   2  56 2042.2762
130309  323   1 177  948.5625
25515   255   1  35 2719.7525
235612  283   2 333  603.3537
13755   236   2  20  265.1938
2441333   1   4 1062.1200
157327  377   1 215 2010.6600
66502   320   2  91 1122.9725
230778  155   2 325 1207.3025
74805   354   1 101 1028.5150
166556  150   1 229 1546.9450
91914   168   1 121  428.5350
89651   359   2 118  143.5437
149329  344   2 204 1064.7725
212700  259   2 295 1050.1163
454 179   1   1  275.5700
125639  127   1 170  785.1037
55442   347   1  76  950.3312
145132  377   1 197 1269.2287
123069  324   1 167  216.1937
188301  155   2 260  426.6313
852 266   2   1 1443.4887
3582381   1   6  790.8412
235423  144   2 333  659.4238
42175   240   1  59 1089.6762
57033   343   1  78  226.8750
177273  285   1 244  392.7200
218558  340   2 305 1680.2700
27784   245   1  39  280.0550
81823   343   1 110  965.0438
76344   326   1 103 1095.6012
114916  356   2 154  436.8838
35563   378   1  49  333.2875
192279  330   2 267  722.0312
61315   148   2  84 1426.5725
219903  343   1 308  791.5738
42612   325   1  60  658.1387
178488  333   2 246  675.1912
9031127   2  14  989.4863
145092  264   1 197  960.1912
71885   353   2  97  595.4050
38137   275   1  53 1004.0912
140149  121   1 190 1870.9350
162052  325   1 223  892.7775
89527   239   2 118  518.1050
59650   326   2  82  432.7837
24709   284   1  34  453.9013
18933   385   1  27  582.3288
24904   335   2  34 1027.5287
213668  339   1 298 3174.1925
110509  330   1 149  469.8188
72462   363   1  98  386.2163
152596  319   1 209 1328.2188
17014   462   1  24  294.9250
33467   250   1  46 1601.4575
5241333   1   9 1651.0988
215094  323   1 300  427.6313
5   121   1 118 1092.2613
204868  260   2 285  781.2325
157415  231   2 215 1323.5750
71081   244   2  96 1059.2088
25420   338   1  35  530.7413
144226  127   1 196 1126.3112
47888   346   2  66  965.4050
216179  329   2 301 1237.6463
29172   368   1  41 1025.9738
168786  147   1 232  680.6213
94035   223   2 124  330.4563
170542  125   2 234  757.2287
160331  233   2 220  636.3900
124163  380   2 167  287.6988
71442   237   1  97  442.2300
80191   274   2 107  871.0338
199309  329   2 277  485.2337
91293   335   2 120

Re: [R] svyhist and svyboxplot

2012-10-13 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Anthony,

I now can't afford to forget that R is case-sensitive!

Thank you so much!

Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Saturday, October 13, 2012 10:10 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Thomas Lumley; R help
Subject: Re: svyhist and svyboxplot

R is case sensitive

either change
subset (nhis, xsmoke=='Never SMK')
to
subset (nhis, xsmoke=='Never Smk')

or change
labels=c('Current SMK','Former SMK', 'Never Smk')
to
labels=c('Current SMK','Former SMK', 'Never SMK')

but not both  :)
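
Anthony's point can be reproduced with a toy factor. In this sketch the data frame d and the codes 1-3 are made up; only the labels come from the thread. A comparison against a label with the wrong case matches nothing, which gives an empty subset.

```r
# Hypothetical smoking codes; labels copied from the thread
xsmoke <- factor(c(1, 2, 3, 3, 1),
                 levels = 1:3,
                 labels = c("Current SMK", "Former SMK", "Never Smk"))
d <- data.frame(xsmoke = xsmoke, age_p = c(22, 38, 32, 46, 40))

# Wrong case: matches no level, so the subset is empty
n_wrong <- nrow(subset(d, xsmoke == "Never SMK"))
n_wrong   # 0

# Exact label: the two "Never Smk" rows are returned
n_right <- nrow(subset(d, xsmoke == "Never Smk"))
n_right   # 2

levels(d$xsmoke)  # printing the levels first helps catch such mismatches
```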



[R] svyplot

2012-10-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Using the svyplot () function, I have plotted four graphs that are saved in  
four different .png files.

I am looking for examples of how to redraw the same four graphs within grid 
viewports so that they stay together on a page. The goal is to create one .png 
file that will include all four graphs (2 rows, 2 columns).

Any help would be appreciated.

Thanks,

Pradip


Re: [R] svyplot

2012-10-10 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Anthony,

You have been so so helpful!  The par() example code has worked well.

Thanks, 

Pradip
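
For readers without access to the linked message, here is a minimal base-graphics sketch of the par() approach: open one png() device, set a 2 x 2 layout, draw four plots, then close the device. Plain plot(), hist(), and boxplot() calls on simulated data stand in for the four svyplot() calls, and the file name and seed are hypothetical.

```r
out_file <- file.path(tempdir(), "four_plots.png")  # hypothetical file name

set.seed(10)  # hypothetical seed
x <- rnorm(100)
y <- x + rnorm(100)

png(out_file, width = 800, height = 800)
par(mfrow = c(2, 2))          # 2 rows x 2 columns, filled row by row
plot(x, y, main = "Plot 1")   # stand-ins for the four svyplot() calls
plot(y, x, main = "Plot 2")
hist(x, main = "Plot 3")
boxplot(y, main = "Plot 4")
dev.off()                     # closing the device writes the file

file.exists(out_file)
```

Because par(mfrow=...) applies to the currently open device, it has to be called after png() and before the first plot.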

From: Anthony Damico [ajdam...@gmail.com]
Sent: Wednesday, October 10, 2012 5:25 PM
To: Muhuri, Pradip (SAMH/CBHSQ)
Cc: R help
Subject: Re: [R] svyplot

https://stat.ethz.ch/pipermail/r-help/2012-October/324944.html


Re: [R] svyhist

2012-10-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Thomas,

Sorry about my repeated typo in the lines() function, which distorted the line 
so that it did not match the histogram in the earlier graph.  The revised code 
gives me graphs that look a lot better (please see the attachment).  Thank you 
for catching that mistake and also for providing clarification regarding the 
kernel density estimator.

Pradip Muhuri

From: Thomas Lumley [tlum...@uw.edu]
Sent: Monday, October 08, 2012 8:40 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Anthony Damico; R help
Subject: Re: [R] svyhist

The line isn't a theoretical distribution, it's a kernel density
estimator and so should match your histogram.

It looks as though you use exactly the same call
   lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
for each plot, which gives a smooth curve estimating the age at death
for people with No SPD.

To get, eg, age at interview for the SPD group use something like:
 lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)


   -thomas
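
Thomas's point, that the overlaid curve is a kernel density estimate of whichever variable is passed to it, can be illustrated without survey weights. The sketch below uses base hist() and density() on simulated data as unweighted stand-ins for svyhist() and svysmooth(). The variable names mirror the thread, but the values and the seed are invented.

```r
set.seed(2012)  # hypothetical seed
age_p  <- runif(1000, 18, 85)     # simulated age at interview
dthage <- rnorm(1000, 75, 8)      # simulated age at death
MyBreaks <- c(18, 35, 45, 55, 65, 75, 85, 95)

pdf(NULL)  # null device, so no file is written
hist(age_p, breaks = MyBreaks, freq = FALSE, col = "grey80",
     main = " ", xlab = "Age at interview")

# Mismatched overlay: the curve describes a different variable
lines(density(dthage, bw = 5), lwd = 2, lty = 2)

# Matching overlay: same variable as the histogram
dens_match <- density(age_p, bw = 5)
lines(dens_match, lwd = 2)
dev.off()
```

Only the solid curve tracks the bars, because it is estimated from the same variable as the histogram.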


Re: [R] svyhist

2012-10-06 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Anthony,

The ylim argument has been added to the code (please see below), and I got 4 
plots that have the same y-dimension. 

Each plot displays 2 distributions - one as a histogram from the data and 
another as a line (i.e., an idealized theoretical normal distribution?).  

My question is: is there a way to change the distribution in the lines() 
function and try another theoretical distribution to approximate the observed 
distribution?


Thanks,

Pradip Muhuri




MyBreaks <- c(18,35,45,55,65,75,85,95)

png("svyhist_no_spd_age_at_inteview.png")
options( survey.lonely.psu = "adjust" )
svyhist (~age_p,
 subset (nhis, xspd2=='No SPD'), breaks=MyBreaks,
 ylim = c(0,0.035),
 main=" ",
 col="grey80", xlab="Age at Interview among those Who had no SPD"
 )

lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

dev.off ()


From: Anthony Damico [ajdam...@gmail.com]
Sent: Saturday, October 06, 2012 6:56 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David Winsemius; R help
Subject: Re: [R] svyhist

?ylim says numeric vectors of length 2  - so just the beginning and end.

?svyhist doesn't specifically mention the ylim parameter, meaning you should 
look for a ... in the arguments list and click through to the page for ?hist

?hist has an example that shows the ylim parameter only containing the 
beginning and end values.

try using

ylim = c( 0 , 0.030 )

if you're looking to set the tick marks, look at ?axis   ;)
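
A small base-graphics sketch of both suggestions: ylim takes just the lower and upper limit, and axis() places the tick marks. It uses hist() on simulated ages as a stand-in for svyhist(); the data, the seed, and the tick spacing of 0.005 are assumptions.

```r
set.seed(5)  # hypothetical seed
age <- runif(300, 18, 95)   # simulated ages, stand-in for the survey data
MyBreaks <- c(18, 35, 45, 55, 65, 75, 85, 95)

pdf(NULL)  # null device, so no file is written
h <- hist(age, breaks = MyBreaks, freq = FALSE,
          ylim = c(0, 0.030),       # only the lower and upper limits
          yaxt = "n",               # suppress the default y-axis
          col = "grey80", main = " ", xlab = "Age")
axis(side = 2, at = seq(0, 0.030, by = 0.005), las = 1)  # custom tick marks
dev.off()
```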


On Fri, Oct 5, 2012 at 11:18 PM, Muhuri, Pradip (SAMHSA/CBHSQ) 
<pradip.muh...@samhsa.hhs.gov> wrote:
Dear Anthony and David,

Sorry- the earlier-sent plots were mislabeled, which I have corrected and 
attached.  But, the y-lim issue is yet to be resolved.

Thanks,

Pradip Muhuri



From: Anthony Damico [ajdam...@gmail.com]
Sent: Friday, October 05, 2012 7:29 PM
To: David Winsemius
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); R help
Subject: Re: [R] svyhist

this worked for me -- and doesn't require removing the PSUs from the design  :)

options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, main=" ",
col="grey80",
xlab="Age at Death Distribution"
)
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)


Dr. Lumley has written quite a bit about single-PSU strata here: 
http://faculty.washington.edu/tlumley/survey/exmample-lonely.html
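
As a rough illustration of what the survey.lonely.psu option changes, here is a sketch on a toy design in which one stratum has a single PSU. The data frame and the lonely_demo list are invented for this example (they are not the NHIS file), and the block does nothing if the survey package is not installed.

```r
# Sketch only: toy data, not the file used in the thread
lonely_demo <- list(fails_by_default = NA, adjusted_mean = NA)

if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)

  # Stratum 3 contains a single PSU, which is what triggers the error
  toy <- data.frame(
    dthage  = c(56, 86, 81, 47, 71),
    psu     = c(1, 2, 1, 2, 1),
    stratum = c(1, 1, 2, 2, 3),
    wt8     = c(1500, 1800, 500, 500, 250)
  )
  des <- svydesign(id = ~psu, strata = ~stratum, weights = ~wt8,
                   data = toy, nest = TRUE)

  # With "fail", variance estimation stops on the single-PSU stratum
  old <- options(survey.lonely.psu = "fail")
  res <- try(svymean(~dthage, des), silent = TRUE)
  lonely_demo$fails_by_default <- inherits(res, "try-error")

  # With "adjust", the same call returns an estimate
  options(survey.lonely.psu = "adjust")
  lonely_demo$adjusted_mean <- unname(coef(svymean(~dthage, des)))

  options(old)  # restore the previous setting
}
```

The option affects only the variance calculation; the point estimate is still the weighted mean of the toy values.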



On Fri, Oct 5, 2012 at 7:16 PM, David Winsemius 
<dwinsem...@comcast.net> wrote:

On Oct 5, 2012, at 3:33 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> I was trying to draw histograms of age at death and got the following 2 
> error messages:
>
>
> 1)  Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
>
>   arguments must have same length

This is the top of the output of str applied to the data argument you offered 
to svyhist:


> str(subset (nhis, xspd2==2) )
List of 9
 $ cluster   :'data.frame': 0 obs. of  1 variable:
  ..$ psu: Factor w/ 47 levels "109.1","115.2",..:
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 2 ~psu
  .. .. ..- attr(*, "variables")= language list(psu)
  .. .. ..- attr(*, "factors")= int [1, 1] 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr "psu"
  .. .. .. .. ..$ : chr "psu"

At least one problem seems pretty clear. No data. That can be corrected by 
wrapping as.numeric() around the factor on which you are subsetting in two 
places.
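
The "no data" point can be reproduced without the survey package. In this minimal sketch the vector is made up, but the factor() call mirrors the one in the posted script: once xspd2 carries the labels 'SPD' and 'No SPD', comparing it with the number 2 matches nothing, while as.numeric() (or comparing with the label) recovers the intended rows.

```r
# Hypothetical raw codes: 1 = SPD, 2 = No SPD
xspd2_raw <- c(2, 2, 1, 2, 1)

xspd2 <- factor(xspd2_raw, levels = c(1, 2),
                labels = c("SPD", "No SPD"), ordered = TRUE)

# The factor is compared through its labels, and no label equals "2"
n_raw_code <- sum(xspd2 == 2)

# Two ways to select the 'No SPD' rows
n_as_numeric <- sum(as.numeric(xspd2) == 2)   # compare the underlying level codes
n_by_label   <- sum(xspd2 == "No SPD")        # compare the label itself

c(raw_code = n_raw_code, as_numeric = n_as_numeric, by_label = n_by_label)
```

Comparing with the label, as in the later replies in this thread, avoids depending on the order of the levels.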

Another problem may arise when you restrict to one class only, namely that 
there won't be any design to work with. The clusters (there would be only one) 
no longer have any multiplicity, and svyhist apparently isn't built to 
handle that situation, at least with that design argument.

Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1],  :
  Stratum (2) has only one PSU at stage 1

Taking the 'stratum' argument out of the svydesign() call allows it to proceed, 
but I do not know whether that invalidates the analysis.
--
David.




[R] svyhist

2012-10-05 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I was trying to draw histograms of age at death and got the following 2 
error messages:


1)  Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :

  arguments must have same length



2)  Error in findInterval(mm[, i], gx) : 'vec' contains NAs

In addition: Warning messages:

1: In min(x) : no non-missing arguments to min; returning Inf

2: In max(x) : no non-missing arguments to max; returning -Inf



I would appreciate it if someone could help me resolve these issues.



Below is a reproducible example.

Thanks,

Pradip Muhuri



setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
xd1 <-
"dthage ypll_75 xspd2 psu stratum wt8
   56  19 2   2  33 1512.7287
   86   0 2   2 129 1830.6400
   81   0 2   1  67  536.1400
   47  28 2   1  17  519.8350
   71   4 1   1 225  254.4087
   72   3 1   1 238  424.4787
   75   0 2   2 115  407.0987
   83   0 2   2  46  622.5137
   79  -4 2   1 300  509.1212
   78  -3 2   1 133  517.3325
   71   4 2   2 328 1179.3063
   64  11 2   1   2  301.5250
   78  -3 2   1  62  253.9025
   65  10 2   2 260  932.6575
   75   0 2   1 247  145.5900
   63  12 2   2 156  247.0650
   71   4 2   1 146  829.4787
   76  -1 2   2 234  432.5437
   76   0 2   1 109  859.6888
   68   7 2   1 228 1236.2975
   64  11 2   2 167  347.5788
   62  13 2   2 312  354.0500
   77   0 2   2 275  882.1938
   78  -3 2   1  28  481.5975
   81   0 2   1 180 1285.5425
   79   0 2   2 205  576.
   70   5 2   1 173  128.3725
   75   0 2   2 189  359.3863
   78   0 2   1 332  512.8062
   74   1 2   2  14  449.0800
   77   0 2   1 242  283.0013
   92   0 2   1 152  915.3200
   69   6 2   2 217  672.7663
   53  22 2   1 290 1430.8812
   81   0 2   2  90  699.1075
   67   8 2   2 316  607.6500
   85   0 2   1 171  312.9850
   93   0 2   2 119  936.1275
   82   0 2   1 118  186.4450
   71   4 2   2 329  729.1213
   43  32 2   1 215  887.6313
   74   1 2   1 180  569.9338
   89   0 2   1 324 1054.0887
   81   0 2   2  47  532.0987
   70   5 2   1  53  450.8750
   75   0 1   1  38  557.9750
   56  19 2   1  17  512.6363
   90   0 2   2  29  569.7888
   70   5 2   1 251  554.2138
   56  19 2   2  14 1114.1762"
tor <- read.table (textConnection(xd1), header=TRUE, sep='', as.is=TRUE)


# Grouping variable (xspd2) to be a factor
tor <- within(tor, {
 xspd2 <- factor(xspd2,levels=c (1,2),
 labels=c('SPD', 'No SPD'), ordered=TRUE)
   }
  )
# object with survey design variables and data
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)

MyBreaks <- c(18,35,45,55,65,75,85,95)

png("svyhist_age_at_death.png")

svyhist (~dthage,
subset (nhis, xspd2==2), breaks=MyBreaks, main=" ",
col="grey80",
xlab="Age at Death Distribution"
)
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2==2)), lwd=2)

dev.off ()






Pradip K. Muhuri
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov/


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] svyhist

2012-10-05 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Anthony and David,

Thank you so much for your comments and suggestions! 

The sample data set embedded in the R script I sent earlier was intended only 
as the reproducible example.

I have now used Anthony's revised code on the entire analytic file.  The 
code has worked fine.  Thanks, again.

Attached are the 2 .png files.

The only problem I see is that the y-axis range in these 2 plots is not exactly 
the same.  I have tried this: ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030), 
which did not work. Any thoughts?


Pradip Muhuri

###  Revised Code ###

setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
library (Hmisc)
load("tor.rdata")
#contents (tor)


# Grouping variable (xspd2) to be a factor 
tor <- within(tor, {
 xspd2 <- factor(xspd2,levels=c (1,2),
 labels=c('SPD', 'No SPD'), ordered=TRUE)
   } 
  )
# object with survey design variables and data 
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)

MyBreaks <- c(18,35,45,55,65,75,85,95)

png("svyhist_no_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
 subset (nhis, mortstat==1 & xspd2=='No SPD'), breaks=MyBreaks,
 main=" ",
 col="grey80", xlab="Age at Death among those Who had SPD"
 )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)


dev.off ()


png("svyhist_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage,
 subset (nhis, mortstat==1 & xspd2=='SPD'), breaks=MyBreaks,
  #ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030),
 main=" ",
 col="grey80",
 xlab="Age at Death among those Who had no SPD Distribution"
 )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)


dev.off ()

##






Re: [R] svyhist

2012-10-05 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Anthony and David,

Sorry, the plots I sent earlier were mislabeled; I have corrected and 
attached them.  But the ylim issue is yet to be resolved. 

Thanks,

Pradip Muhuri




[R] svyby and make.formula

2012-10-02 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello,

Although my R code for the svymean() and svyquantile() functions works fine, 
I am stuck with the svyby() and make.formula() functions.  I got the 
following error messages:

-  Error: object of type 'closure' is not subsettable  # svyby()
-  Error in xx[[1]] : subscript out of bounds          # make.formula()

A reproducible example is appended below.

I would appreciate if someone could help me.

Thank you in advance.

Pradip Muhuri


Below is a reproducible example ##

setwd ("E:/RDATA")

library (survey)

xd1 <-
"dthage ypll_ler ypll_75 xspd2 psu stratum       wt8 mortstat
    NA       NA      NA     2   1       1 1683.7387        0
    NA       NA      NA     2   1       1  640.8950        0
    NA       NA      NA     2   1       1  714.0662        0
    NA       NA      NA     2   1       1  714.0662        0
    NA       NA      NA     2   1       1  530.5263        0
    NA       NA      NA     2   1       1 2205.2863        0
    NA       NA      NA     2   1     339 1683.7387        0
    NA       NA      NA     2   1     339  640.8950        0
    NA       NA      NA     2   1     339  714.0662        0
    NA       NA      NA     2   1     339  714.0662        0
    NA       NA      NA     2   1     339  530.5263        0
    NA       NA      NA     2   1     339 2205.2863        0
    78 8.817926       0     2   2       1  592.3100        1
    80 9.291881       0     2   2       1 1014.7387        1
    87 5.001076       0     2   2       1  853.4763        1
    87 5.001076       0     2   2       1  505.1475        1
    88 5.510514       0     2   2       1 1429.5963        1
    78 8.817926       0     2   2     339  592.3100        1
    80 9.291881       0     2   2     339 1014.7387        1
    87 5.001076       0     2   2     339  853.4763        1
    87 5.001076       0     2   2     339  505.1475        1
    88 5.510514       0     2   2     339 1429.5963        1
    78 8.817926       0     2   2     339  592.3100        1
    80 9.291881       0     2   2     339 1014.7387        1
    87 5.001076       0     2   2     339  853.4763        1
    87 5.001076       0     2   2     339  505.1475        1
    88 5.510514       0     2   2     339 1429.5963        1"
newdata <- read.table (textConnection(xd1), header=TRUE, as.is=TRUE)
dim  (newdata)


# make the grouping variable (xspd2) a factor
newdata$xspd2 <- factor(newdata$xspd2,levels=c (1,2),labels=c('SPD', 'No SPD'), 
ordered=TRUE)

nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=newdata, 
nest=TRUE)


# mean age at death - nationwide

svymean( ~dthage, data=nhis ,  subset (nhis, mortstat==1)) 

# mean by SPD status
svyby(~dthage, ~xspd2 ,  design=nhis, svymean )

#percentile
svyquantile(~dthage,  data = nhis ,  subset (nhis, mortstat==1), c( 0 , .25 , 
.5 , .75 , 1 )  )

# percentile by SPD status
svyby(~dthage, ~xspd2, desin=nhis, svyquantile,  c( 0 , .25 , .5 , .75 , 1 ),  
keep.var = F)

# mean for each of the 3 variables

vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)








#


Re: [R] svyboxplot - library (survey)

2012-10-02 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Thomas,

Thank you so much for your help.

Pradip

From: Thomas Lumley [tlum...@uw.edu]
Sent: Monday, October 01, 2012 6:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Anthony Damico; R help
Subject: Re: [R] svyboxplot - library (survey)

The documentation says "The grouping variable in svyboxplot, if
present, must be a factor"

  -thomas
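
A minimal sketch of the conversion this implies, assuming the grouping variable is stored as numeric codes (1 = SPD, 2 = No SPD) as in the question. The tor_demo data frame is invented for illustration; the conversion would be done on the real data before calling svydesign().

```r
# Hypothetical data with a numerically coded grouping variable
tor_demo <- data.frame(
  dthage = c(71, 72, 56, 86, 81),
  xspd2  = c(1, 1, 2, 2, 2)
)

was_factor <- is.factor(tor_demo$xspd2)   # numeric codes are not a factor

# Convert the codes to a labelled factor
tor_demo$xspd2 <- factor(tor_demo$xspd2, levels = c(1, 2),
                         labels = c("SPD", "No SPD"))

is_factor_now <- is.factor(tor_demo$xspd2)
group_counts  <- table(tor_demo$xspd2)
group_counts
```

A design built from the converted data frame then carries xspd2 as a factor, which is what the quoted documentation asks for.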


--
Thomas Lumley
Professor of Biostatistics
University of Auckland


Re: [R] svyby and make.formula

2012-10-02 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Anthony,

Thank you very much for helping me resolve the issues.  I now have all the 
results I intended to generate.

Pradip Muhuri


From: Anthony Damico [ajdam...@gmail.com]
Sent: Tuesday, October 02, 2012 9:50 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: svyby and make.formula

please double-check that you've got all of your parameters correct by typing 
?svymean ?svyby  and ?make.formula before you send questions to r-help  :)


# you spelled design wrong and probably need to throw out your NA values.  try 
this

# percentile by SPD status
svyby(~dthage, ~xspd2, design=nhis, svyquantile,  c( 0 , .25 , .5 , .75 , 1 ),  
keep.var = F, na.rm = TRUE)



# mean for each of the 3 variables

# this returns a logical vector, but make.formula requires a character vector
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)


# create a character vector instead

# note you also spelled the third variable wrong -- it will break unless you 
# correct that
vars <- c("dthage", "ypll_ler", "ypll_75")

# this statement has two survey design parameters, which won't work.  which one 
# do you want to use?
svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)

# pick one
svymean(make.formula(vars),nhis, na.rm=TRUE)
svymean(make.formula(vars),subset(nhis, mortstat==1), na.rm=TRUE)
# all of the variables in vars are NA whenever mortstat isn't 1, so they give 
the same results
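
The logical-versus-character distinction above can be checked on a plain data frame. This is only a sketch: dat is a made-up stand-in for the data columns, and base R's reformulate() is used here in place of survey's make.formula() so the block runs without the survey package.

```r
# Hypothetical stand-in for the data columns discussed above
dat <- data.frame(dthage = c(78, 80), ypll_ler = c(8.8, 9.3),
                  ypll_75 = c(0, 0), mortstat = c(1, 1))

wanted <- c("dthage", "ypll_ler", "ypll_75")

# %in% answers "is each column name wanted?", so the result is logical
mask <- names(dat) %in% wanted

# Indexing the names with the mask gives the character vector of names
vars <- names(dat)[mask]

# Build a one-sided formula from the character vector (base R substitute
# for make.formula())
f <- reformulate(vars)
f
```

If the names are already known, writing the character vector directly, as in the reply above, is simpler than filtering names().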




[R] svyboxplot - library (survey)

2012-10-01 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

I have used the survey package to draw boxplots with the following code. 

Could anyone please tell me why I am getting only 1 boxplot instead of 2 
boxplots (1-SPD, 2-No SPD)?  

What changes to the following code would be required to get 2 boxplots in the 
same plot frame?

Thanks,

Pradip

###
nhis <- svydesign (id=~psu, strat=~stratum, weights=~wt8, 
data=tor, nest=TRUE)

svyboxplot (dthage~xspd2, subset (nhis, mortstat==1), col="gray80",
 varwidth=TRUE, ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD") 



Re: [R] svyboxplot - library (survey)

2012-10-01 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Anthony,

Yes, I can follow the example code you have given.  But do you know from the 
code shown below (following Thomas Lumley's Complex Surveys) why I am getting 
the boxplot of dthage for just xspd2=1, not xspd2=2?

My intent is to make this code work so that I can generate similar plots for 
other continuous variables.

Any help will be appreciated.

Thanks,

Pradip




nhis <- svydesign (id=~psu, strat=~stratum, weights=~wt8,
data=tor, nest=TRUE)

svyboxplot (dthage~xspd2, subset (nhis, mortstat==1), col="gray80",
 varwidth=TRUE, ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD")




From: Anthony Damico [mailto:ajdam...@gmail.com]
Sent: Monday, October 01, 2012 10:07 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] svyboxplot - library (survey)

using a slight modification of the example shown in ?svyboxplot


# load survey library
library(survey)

# load example data
data(api)

# create an example svydesign
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
fpc = ~fpc)

# set the plot window to display 1 plot x 2 plots
par(mfrow=c(1,2))

# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)

# done



# alternative: not as nice

# set the plot window to display 2 plots x 1 plot
par(mfrow=c(2,1))

# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)

# done






On Mon, Oct 1, 2012 at 9:50 AM, Muhuri, Pradip (SAMHSA/CBHSQ) 
pradip.muh...@samhsa.hhs.govmailto:pradip.muh...@samhsa.hhs.gov wrote:
Hello,

I have used the library (survey) package for boxplots using the following code.

Could anyone please tell me why I am getting only 1  boxplot instead of 2 
boxplots (1-SPD,  2-No SPD).

What changes in the following code would be required to get 2 boxplots in the 
same plot frame?

Thanks,

Pradip

###
nhis - svydesign (id=~psu, strat=~stratum, weights=~wt8,
data=tor, nest=TRUE)

svyboxplot (dthage~xspd2, subset (nhis, mortstat==1), col=gray80,
 varwidth=TRUE, ylab=Age at Death, xlab=SPD Status: 1-SPD, 2=No 
SPD)


Pradip K. Muhuri
Statistician
Substance Abuse  Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.govmailto:pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov

vide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Bar chart in ascending order for each level of X

2011-07-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hello List,

The question is how to plot a bar chart in which bars are sorted  in ascending 
order for each level of X.  I would appreciate receiving your advice and help.


Thanks,

Pradip Muhuri

**

The following code produces the chart, but the bars are NOT sorted. 
Please see the attached output.

* Data File
5.1   8.7   1.6
3.7   7.4   2.8
10.4  12.0  3.5
4.4   8.8   1.7
2.0   3.5   0.7
6.7   11.0  3.1
5.3   6.7   1.8

###
#source("C:/Documents and Settings/pradip.muhuri/My Documents/disorders_chart1.R") - Please ignore this line

#R Scripts for bar chart begin here

# Read drug data from tab-delimited data set
drug_data <- read.table("C:/Documents and Settings/pradip.muhuri/My Documents/xdrug.dat", header=FALSE,
col.names=c("Age_1217", "Age_1825", "Age_26Plus"),
row.names = c("White","Black","Native American/Alaska Native","Hawaiian/OPI","Asian", "More than One Race", "Hispanic"),
sep="\t")

# Graph drug use disorder data with adjacent bars using rainbow colors
barplot(as.matrix(drug_data), main="Past-Year Illicit Drug Use Disorders by Race/Ethnicity",
ylab="Past-Year Use Disorder Rate (%)", beside=TRUE,
col=rainbow(7))
legend("topright", c("White","Black","Native American/Alaska Native","Hawaiian/OPI","Asian", "More than One Race", "Hispanic"), cex=0.6,
bty="n", fill=rainbow(7));

Bar_Graph.pdf
Description: Bar_Graph.pdf
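
One possible way to get the bars in ascending order within each age group is sketched below in base R. It is only an illustration, not an answer from the list: the seven rows of the data listing are entered inline so that no file path is needed, and the object names pal, sorted_vals and sorted_cols are made up for this sketch. It relies on barplot() recycling a vector of colours over all bars in column order when beside=TRUE, so each race keeps its own colour even though its position changes from group to group.

```r
# Rates from the data listing above (rows = race/ethnicity, columns = age group)
races <- c("White", "Black", "Native American/Alaska Native", "Hawaiian/OPI",
           "Asian", "More than One Race", "Hispanic")
drug_data <- matrix(c( 5.1,  8.7, 1.6,
                       3.7,  7.4, 2.8,
                      10.4, 12.0, 3.5,
                       4.4,  8.8, 1.7,
                       2.0,  3.5, 0.7,
                       6.7, 11.0, 3.1,
                       5.3,  6.7, 1.8),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(races,
                                    c("Age_1217", "Age_1825", "Age_26Plus")))

# One fixed colour per race, so the legend stays valid after sorting
pal <- setNames(rainbow(length(races)), races)

# Sort each column separately and carry the matching colours along
sorted_vals <- apply(drug_data, 2, sort)
sorted_cols <- apply(drug_data, 2, function(x) pal[order(x)])

# Grouped bars, ascending within each age group
barplot(sorted_vals, beside = TRUE, col = as.vector(sorted_cols),
        main = "Past-Year Illicit Drug Use Disorders by Race/Ethnicity",
        ylab = "Past-Year Use Disorder Rate (%)")
legend("topright", legend = races, fill = pal, cex = 0.6, bty = "n")
```

Because the order of the races now differs between age groups, the legend (rather than bar position) is what identifies each race.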

Re: [R] Asymmetrical Confidence Interval

2011-06-20 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Patrick,


I do agree with you that it is a very simple problem.  In fact, I already have 
the following SAS program, which computes the asymmetrical confidence interval.

As a new user of R, I just wanted to see the corresponding R code, if it 
already exists.

Thanks,

Pradip 



*SAS program begins here;

/*
MEAN = prevalence rate
PLOWER = lower 95% confidence limit for the rate
PUPPER = upper 95% confidence limit for the rate
TLOWER = lower 95% confidence limit for the total
TUPPER = upper 95% confidence limit for the total

Calculate the 95% CI FOR PREVALENCE RATES AND TOTALS 
*/

IF MEAN=0 OR MEAN=1 THEN DO;
  L=.;
  NUMBER=.;
  A=.;  B=.;
  PLOWER=.; PUPPER=.; TLOWER=.; TUPPER=.;
END;


ELSE DO;

  L=LOG(MEAN/(1-MEAN));
  NUMBER=SEMEAN/(MEAN*(1-MEAN));
  A=L-1.96*NUMBER;
  B=L+1.96*NUMBER;
  PLOWER=1/(1+EXP(-A));  PUPPER=1/(1+EXP(-B)); 
  TLOWER=WSUM*PLOWER;TUPPER=WSUM*PUPPER;

END;

RUN;

*SAS program ends here;


Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 7-1023
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
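
For readers looking for the R counterpart, the SAS step above could be translated roughly as follows. This is a sketch only: the function name logit_ci and its argument names are made up here, and the inputs correspond to MEAN, SEMEAN and WSUM in the SAS code. It uses the same logit transformation and returns missing limits when the rate is exactly 0 or 1, as the SAS program does.

```r
# Asymmetric (logit-based) 95% confidence limits for a prevalence rate.
# mean_p : estimated prevalence as a proportion (MEAN in the SAS code)
# se_p   : standard error of the prevalence (SEMEAN)
# wsum   : weighted total used to scale the limits (WSUM); optional
logit_ci <- function(mean_p, se_p, wsum = NA_real_, z = 1.96) {
  # The transformation is undefined at 0 and 1, so those rows get NA limits
  ok <- !is.na(mean_p) & mean_p > 0 & mean_p < 1
  l      <- ifelse(ok, log(mean_p / (1 - mean_p)), NA_real_)
  number <- ifelse(ok, se_p / (mean_p * (1 - mean_p)), NA_real_)

  # plogis(x) is 1 / (1 + exp(-x)), the inverse of the logit
  plower <- plogis(l - z * number)
  pupper <- plogis(l + z * number)

  data.frame(mean = mean_p, plower = plower, pupper = pupper,
             tlower = wsum * plower, tupper = wsum * pupper)
}

# Made-up inputs, purely to show the call
logit_ci(mean_p = c(0.082, 0, 0.5), se_p = c(0.006, 0, 0.05), wsum = 1000)
```

The function is vectorised, so a whole column of published estimates and standard errors can be passed in one call.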



-Original Message-
From: Patrick Connolly [mailto:p_conno...@slingshot.co.nz] 
Sent: Sunday, June 19, 2011 1:57 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org; 'tlum...@u.washington.edu'
Subject: Re: [R] Asymetrical Confidence Interval

On Thu, 16-Jun-2011 at 04:43PM -0400, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

| 
| Dear List,
| 

| I wanted to calculate the asymmetrical confidence interval based on
| the sample statistic and standard error that available from the
| published report (complex survey-based).

| The calculation details can be seen from pages 17-18 of the
| document at the following link:
| http://www.oas.samhsa.gov/nsduh/2k5MRB/2k5statInference.pdf.

| 
| Could someone tell me whether R has any function included in it
| survey or other contributed package of R.

There might be one in a package somewhere, but it's so trivial to make
your own function by using the information you already have.

This is sounding suspiciously like a homework question. 


-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.  
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


[R] Asymetrical Confidence Interval

2011-06-16 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Dear List,

I wanted to calculate the asymmetrical confidence interval based on the sample 
statistic and standard error that are available from a published report (complex 
survey-based).

The calculation details can be seen on pages 17-18 of the document at the 
following link: http://www.oas.samhsa.gov/nsduh/2k5MRB/2k5statInference.pdf.


Could someone tell me whether R has a function for this, either in the survey 
package or in another contributed package?

Thank you in advance,

Pradip Muhuri


[R] Contributed Packages - Hmisc survey

2011-06-01 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello List,

Could someone tell me why I can't install the Hmisc and survey packages for R 
version 2.13.0 (2011-04-13)? What am I doing wrong here?

Thanks,

Pradip


> install.packages("Hmisc", dependencies=TRUE)
--- Please select a CRAN mirror for use in this session ---
Warning: unable to access index for repository 
http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/2.13
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'Hmisc' is not available (for R version 2.13.0)



> install.packages("survey", dependencies=TRUE)
Warning: unable to access index for repository 
http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/2.13
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package 'survey' is not available (for R version 2.13.0)

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 7-1023
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
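
The warnings say that the package index at the selected mirror could not be read, which is why R then reports both packages as "not available". One thing worth trying is a different CRAN mirror. The sketch below is an illustration rather than a confirmed fix for this case: the mirror URL is just one reachable choice (any mirror offered by chooseCRANmirror() would do), and the names pkgs and missing_pkgs are made up here.

```r
# Point this session at another CRAN mirror (URL chosen for illustration)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Packages the post was trying to install
pkgs <- c("Hmisc", "survey")

# Work out which of them are not installed yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing; guarded so that sourcing this file
# non-interactively does not start any downloads
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, dependencies = TRUE)
}
```

If the mirror still cannot be reached, the cause may lie with the network (for example a proxy or firewall) rather than with R itself.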


[R] R in batch mode

2011-05-24 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hi Everyone,

I am a new R user and am trying to run R jobs in batch mode.

Robert Muenchen (2009), in his book R for SAS and SPSS Users, has suggested 
writing a small batch file like mR.bat as shown below:

"C:\Program Files\R\R-2.10.0\bin\Rterm.exe" --no-restore --no-save < %1 > %1.Rout 2>&1

Could anyone tell me in which directory or subdirectory I should save the 
mR.bat file?

I would appreciate receiving any support you could extend on this subject.

Thank you in advance,

Pradip


Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 7-1023
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov


Re: [R] R in batch mode

2011-05-24 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Jonathan (and List),

Sorry to bother you again, but I would like to request your further guidance on 
this subject. Below are the steps I have followed, after which I got an error 
message.



1. The content of myR.bat is as follows: 

C:\R\bin\i386\R\R-2.10.0\bin\Rterm.exe --no-restore --no-save < %1 > %1.Rout 2>&1

2. I have saved that .bat file in the subdirectory C:\R\bin\i386\R\R-2.10.0\bin.

3. At the R prompt, I have issued the following: setwd("E:/R")

4. Then I have issued the following: myR dateR.R

Error: unexpected symbol in "myR dateR.R"

What am I doing wrong?

Please help resolve the issue.

Thanks,

Pradip



Pradip K. Muhuri
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 7-1023
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
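
"Error: unexpected symbol" is a message from the R parser, which suggests that step 4 was typed at the R prompt. A .bat file is meant to be run from the Windows Command Prompt (cmd.exe), not from inside R. If the aim is to launch a batch run from within an R session anyway, a rough sketch using system2() and Rscript follows. The temporary script is only a stand-in for dateR.R, and the .Rout file name is illustrative.

```r
# Temporary stand-in for dateR.R (hypothetical content)
script <- tempfile(fileext = ".R")
writeLines("print(1 + 1)", script)

# Locate the Rscript executable that belongs to the running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script in a fresh R process, capturing stdout and stderr together
out <- system2(rscript, c("--vanilla", shQuote(script)),
               stdout = TRUE, stderr = TRUE)

# Save the captured output next to the script, like the .Rout file in the post
rout <- paste0(script, ".Rout")
writeLines(out, rout)
```

From the Command Prompt itself, the equivalent would be to change to the directory holding dateR.R and type the batch file name followed by the script name, provided the batch file is on the search path.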
 


-Original Message-
From: Jonathan Daily [mailto:biomathjda...@gmail.com] 
Sent: Tuesday, May 24, 2011 1:18 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R-help@r-project.org
Subject: Re: [R] R in batch mode

Save it anywhere that is on your search path, which can be seen by
typing path into the command line.

On Tue, May 24, 2011 at 12:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ)
<pradip.muh...@samhsa.hhs.gov> wrote:
> Hi Everyone,
>
> I am a new R user and trying to run R jobs in batch mode.
>
> Robert Muenchen (2009), in his book R for SAS and SPSS Users, has suggested 
> writing a small batch file like mR.bat as shown below:
>
> "C:\Program Files\R\R-2.10.0\bin\Rterm.exe" --no-restore --no-save < %1 > %1.Rout 2>&1
>
> Could anyone tell me in which directory or subdirectory I should save the 
> mR.bat file?
>
> I would appreciate receiving any support you could extend on this subject.
>
> Thank you in advance,
>
> Pradip





-- 
===
Jon Daily
Technician
===
#!/usr/bin/env outside
# It's great, trust me.
