[R] R Error: wrong result size (...), expected ... or 1 (minimal example provided)
Hello,

I am reposting my question with a reproducible example (a minimal dataset of 6 rows) this time. I have written a user-defined function (myFunc below) with ten arguments. When calling the function, I get the following message: "Error: wrong result size (0), expected 2 or 1". I am not getting the desired output dataset, which should have 2 rows. How would I resolve the issue? Any hints would be appreciated.

These results are from the following code chunk outside myFunc:

addmargins(table(xanloid_set$cohort_type))

NMPR_Cohort  OID_Cohort  Other  Sum
          2           1      3    6

Thanks,
Pradip Muhuri

# myFunc_rev.R
setwd("H:/R/cis_data")
library(dplyr)
rm(list = ls())

# data object - description
temp <- "id intdate anldate oiddate herdate cohort_type
1 2004-11-04 2002-07-18 2001-07-07 2003-11-03 NMPR_Cohort
2 2004-10-24 NA 2002-10-13 NA OID_Cohort
3 2004-10-10 NA NA NA Other
4 2004-09-01 1999-08-10 NA 2002-11-04 NMPR_Cohort
5 2004-09-04 1997-10-05 NA NA Other
6 2004-10-25 NA NA 2011-11-04 Other"

# read the data object
xanloid_set <- read.table(textConnection(temp),
                          colClasses = c("character", "Date", "Date", "Date", "Date", "character"),
                          header = TRUE, as.is = TRUE)

# print the data object
xanloid_set

# Define user-defined function
myFunc <- function(newdata, oridata, cohort, value, xdate_to_int_time,
                   xflag, idate, xdate, xdate_to_int_time_cat, year) {
  newdata <- filter(oridata, cohort == value) %>%
    mutate(xdate_to_int_time = ifelse(xflag == 1, (idate - xdate) / 365.25, NA),
           xdate_to_int_time_cat = cut(xdate_to_int_time,
                                       breaks = c(0, 1, 2, 3, 4, 5, 6, 7),
                                       include.lowest = TRUE,
                                       stringsAsFactors = FALSE))
  addmargins(with(newdata, table(year, xdate_to_int_time_cat, na.rm = TRUE)))
}

# invoke user defined function
myFunc(newdata = nmpr_nmproid,
       oridata = xanloid_set,
       cohort = xanloid_set$cohort_type,
       value = "NMPR_Cohort",
       xdate_to_int_time = anl_to_int_time,
       xflag = xanloid_set$anlflag,
       idate = xanloid_set$intdate,
       xdate = xanloid_set$anldate,
       xdate_to_int_time_cat = xanloid_set$anl_to_int_time_cat,
       year = xanloid_set$xyear)

# tabulate cohort_type
addmargins(table(xanloid_set$cohort_type))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
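One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: inside myFunc the cohort, xflag, idate, xdate, and year arguments are full-length vectors taken from xanloid_set, while mutate() runs on the filtered rows, so the lengths disagree. In the six-row example, anlflag and xyear are also not columns of the data, so xanloid_set$anlflag is NULL, which would be consistent with the reported size of 0. The base R version below passes column names as strings and looks them up in the filtered subset. The function name tabulate_cohort, its arguments, and the derived anlflag and xyear columns are assumptions made for illustration only.

```r
# Toy data modelled on the six-row example. anlflag and xyear are
# hypothetical columns derived here because the posted data lacks them.
xanloid_set <- data.frame(
  id          = as.character(1:6),
  intdate     = as.Date(c("2004-11-04", "2004-10-24", "2004-10-10",
                          "2004-09-01", "2004-09-04", "2004-10-25")),
  anldate     = as.Date(c("2002-07-18", NA, NA,
                          "1999-08-10", "1997-10-05", NA)),
  cohort_type = c("NMPR_Cohort", "OID_Cohort", "Other",
                  "NMPR_Cohort", "Other", "Other"),
  stringsAsFactors = FALSE
)
xanloid_set$anlflag <- as.integer(!is.na(xanloid_set$anldate))
xanloid_set$xyear   <- format(xanloid_set$intdate, "%Y")

# Hypothetical helper: every column is looked up by name inside the
# filtered subset, so all vectors have the same length.
tabulate_cohort <- function(data, cohort_col, value, flag_col,
                            idate_col, xdate_col, year_col) {
  sub <- data[data[[cohort_col]] == value, , drop = FALSE]
  years <- ifelse(sub[[flag_col]] == 1,
                  as.numeric(sub[[idate_col]] - sub[[xdate_col]]) / 365.25,
                  NA_real_)
  interval <- cut(years, breaks = 0:7, include.lowest = TRUE)
  addmargins(table(year = sub[[year_col]], interval = interval))
}

res <- tabulate_cohort(xanloid_set, "cohort_type", "NMPR_Cohort",
                       "anlflag", "intdate", "anldate", "xyear")
print(res)
```

Passing names instead of vectors keeps the function independent of the global xanloid_set object, which is one way to avoid the length mismatch.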
[R] R Error: wrong result size (...), expected ... or 1
Hello,

I have written a user-defined function (myFunc below) with ten arguments. When calling the function, I get the following message: "Error: wrong result size (816841), expected 52939 or 1". myFunc involves a data frame (named xanloid_set), which has 816841 rows. R is correct to say that I was expecting only 52939 rows because of the filter() function.

These results are from the following code outside myFunc:

addmargins(table(xanloid_set$cohort_type))

NMPR_Cohort  OID_Cohort  Others     Sum
      52939      158192  605710  816841

How would I resolve the issue, i.e., the error message from myFunc? Any hints would be appreciated.

Thanks,
Pradip Muhuri

# count_nmpr_oid_nmproid_by_year.R
setwd("H:/R/cis_data")
library(dplyr)
library(knitr)
rm(list = ls())

myFunc <- function(newdata, oridata, cohort, value, xdate_to_int_time,
                   xflag, idate, xdate, xdate_to_int_time_cat, year) {
  newdata <- filter(oridata, cohort == value) %>%
    mutate(xdate_to_int_time = ifelse(xflag == 1, (idate - xdate) / 365.25, NA),
           xdate_to_int_time_cat = cut(xdate_to_int_time,
                                       breaks = c(0, 1, 2, 3, 4, 5, 6, 7),
                                       include.lowest = TRUE,
                                       stringsAsFactors = FALSE))
  addmargins(with(newdata, table(year, xdate_to_int_time_cat)))
}

load("xanloid_set.rdata")

myFunc(newdata = nmpr_nmproid,
       oridata = xanloid_set,
       cohort = xanloid_set$cohort_type,
       value = "NMPR_Cohort",
       xdate_to_int_time = anl_to_int_time,
       xflag = xanloid_set$anlflag,
       idate = xanloid_set$intdate,
       xdate = xanloid_set$anldate,
       xdate_to_int_time_cat = xanloid_set$anl_to_int_time_cat,
       year = xanloid_set$xyear)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
Re: [R] R example codes for direct standardization of rates
Hello Terry,

Thank you so much for sending me this reference.

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ

From: Therneau, Terry M., Ph.D. [mailto:thern...@mayo.edu]
Sent: Wednesday, January 07, 2015 4:39 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@R-project.org
Subject: Re: R example codes for direct standardization of rates

The pyears() and survexp() routines in the survival package are designed for these calculations. See technical report #63 of the Mayo Biostat group for examples:
http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/technical-reports

Terry Therneau

-- begin included message ---
I am looking for R example codes to compute age-standardized death rates by smoking and psychological distress status using person-years of observation created from the National Health Interview Survey Linked Mortality Files. Any help with the example codes or references will be appreciated.
Re: [R] R function to convert person-level observations to person-period observations
Hello David,

Thank you so much for your advice. Revising the line in the function to

reve <- data[, event]

(without changing the example data) seems to provide the desired results (shown below). These 3 subjects are followed for 5 years. Subject A experienced the event in year 2, and subject C experienced the event in year 3, while subject B was censored at the end of the follow-up period (i.e., year 5). The person-period observations now seem to be consistent with the person-level observations. Do you see any issues?

Regards,
Pradip

## person-level observations
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

## person-period observations
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ

-Original Message-
From: David Barron [mailto:dnbar...@gmail.com]
Sent: Saturday, January 03, 2015 10:19 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R function to convert person-level observations to person-period observations

Your data are wrong. The 'event' variable (dead in your example) needs to be 1 for cases that end in an event and 0 for spells that are censored: yours is the other way around. If you change the 'dead' variable to c(1,0,1) you will get the desired result. If you really need to reverse the behaviour of the function, change the line

reve <- !data[, event]

to

reve <- data[, event]

David

On 3 January 2015 at 13:20, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, I was trying to convert person-level observations to person-period observations using an R custom function obtained from the UCLA web site (http://www.ats.ucla.edu/stat/r/faq/person_period.htm). Please see my reproducible example below. [...]
[R] R function to convert person-level observations to person-period observations
Hello,

I was trying to convert person-level observations to person-period observations using an R custom function obtained from the UCLA web site (http://www.ats.ucla.edu/stat/r/faq/person_period.htm). Please see my reproducible example below.

The function (PLPP) in the R script takes five arguments:

1) data (i.e., the data set to be converted)
2) id (i.e., the identifier for each observation)
3) period (i.e., the number of periods the person or observation was followed up)
4) event (i.e., the variable that indicates whether the event occurred or whether the observation was censored, depending on which direction you are converting)
5) direction, which indicates whether the function should go from person-level to person-period or from person-period to person-level

On my example data set, the R script ran successfully. Based on 3 person-level observations (A died in year 2, B is censored in year 5, C died in year 3), I get 10 period-level observations, which is the correct number of rows. But the issue is that the value of the dead indicator variable is incorrect. I have a gut feeling that the function needs to be tweaked a bit to get the desired results.

Correct results (person-level)
  ID dead studyyrs
1  A    1        2
2  B    0        5
3  C    1        3

Incorrect results - the dead column
   ID dead studyyrs
1   A    0        1
2   A    0        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    1        5
8   C    0        1
9   C    0        2
10  C    0        3

Desired results
   ID dead studyyrs
1   A    0        1
2   A    1        2
3   B    0        1
4   B    0        2
5   B    0        3
6   B    0        4
7   B    0        5
8   C    0        1
9   C    0        2
10  C    1        3

I would appreciate receiving your help or hints for resolving the issue.

Thanks,

## My reproducible code is shown below
## Below is my data frame (3 observations)
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))
df

## Person-Level Person-Period Converter Function
## Source: http://www.ats.ucla.edu/stat/r/faq/person_period.htm
PLPP <- function(data, id, period, event, direction = c("period", "level")) {
  ## Data Checking and Verification Steps
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))
  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }
  ## Do the conversion
  switch(match.arg(direction),
    period = {
      index <- rep(1:nrow(data), data[, period])
      idmax <- cumsum(data[, period])
      reve <- !data[, event]
      dat <- data[index, ]
      dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
      dat[, event] <- 0
      dat[idmax, event] <- reve
    },
    level = {
      tmp <- cbind(data[, c(period, id)], i = 1:nrow(data))
      index <- as.vector(by(tmp, tmp[, id],
                            FUN = function(x) x[which.max(x[, period]), "i"]))
      dat <- data[index, ]
      dat[, event] <- as.integer(!dat[, event])
    })
  rownames(dat) <- NULL
  return(dat)
}

tpp <- PLPP(data = df, id = "ID", period = "studyyrs", event = "dead", direction = "period")
tpp

Pradip K. Muhuri, SAMHSA/CBHSQ
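For comparison, the person-level to person-period step can be sketched in a few lines of base R. This is not the UCLA PLPP function; the helper name to_person_period is hypothetical, and it assumes the event column is coded 1 for an event and 0 for censoring, as in the example data frame.

```r
df <- data.frame(ID = LETTERS[1:3], dead = c(1, 0, 1), studyyrs = c(2, 5, 3))

# Hypothetical helper: one output row per subject per period. The event
# indicator is 0 everywhere except, possibly, the subject's last period.
to_person_period <- function(data, period, event) {
  n_periods <- data[[period]]
  out <- data[rep(seq_len(nrow(data)), n_periods), , drop = FALSE]
  out[[period]] <- sequence(n_periods)      # 1..k within each subject
  last_row <- cumsum(n_periods)             # row index of each subject's last period
  out[[event]] <- 0
  out[[event]][last_row] <- data[[event]]   # carry the original indicator over
  rownames(out) <- NULL
  out
}

pp <- to_person_period(df, period = "studyyrs", event = "dead")
print(pp)
```

Because the original indicator is copied unchanged onto the last period, no negation step is involved in this direction.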
[R] R example codes for direct standardization of rates (Reference: Thomas Lumley's survey package)
Hello,

I am looking for R example codes to compute age-standardized death rates by smoking and psychological distress status using person-years of observation created from the National Health Interview Survey Linked Mortality Files. Any help with the example codes or references will be appreciated.

Thanks,
Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ
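As a rough illustration of what direct standardization computes, the sketch below weights stratum-specific death rates (deaths divided by person-years) by a standard population's age distribution. All numbers are made up for illustration and are not taken from the NHIS Linked Mortality Files; the helper name direct_std_rate is hypothetical, and survey weights and variance estimation are left out.

```r
# Illustrative numbers only; not from any real data source.
strata <- data.frame(
  age_group = c("18-44", "45-64", "65+"),
  deaths    = c(10, 40, 150),
  pyears    = c(10000, 8000, 5000),
  std_pop   = c(0.5, 0.3, 0.2)   # standard population shares
)

# Hypothetical helper: weighted sum of age-specific rates.
direct_std_rate <- function(deaths, pyears, weights) {
  stopifnot(length(deaths) == length(pyears),
            length(deaths) == length(weights))
  w <- weights / sum(weights)    # normalise so the weights sum to 1
  sum(w * deaths / pyears)
}

crude    <- sum(strata$deaths) / sum(strata$pyears)
adjusted <- with(strata, direct_std_rate(deaths, pyears, std_pop))

cat("Crude rate per 1,000 person-years:   ", round(1000 * crude, 2), "\n")
cat("Adjusted rate per 1,000 person-years:", round(1000 * adjusted, 2), "\n")
```

To compare groups such as smokers and non-smokers, the same standard weights would be applied to each group's age-specific rates.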
Re: [R] R example codes for direct standardization of rates (Reference: Thomas Lumley's survey package)
Hi Anthony,

Thank you for sending me your well-documented R scripts for age-adjusted rate calculations. I will keep you posted on the implementation of these scripts in the context of my analyses.

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ

From: Anthony Damico [mailto:ajdam...@gmail.com]
Sent: Tuesday, December 30, 2014 3:01 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R example codes for direct standardization of rates (Reference: Thomas Lumley's survey package)

hi pradip

hope you're doing well! these two scripts have age adjustment calculations, but neither are specific to nhis. the nhanes example is probably closer to what you're trying to do :)

https://github.com/ajdamico/usgsd/blob/master/National%20Health%20and%20Nutrition%20Examination%20Survey/2009-2010%20interview%20plus%20laboratory%20-%20download%20and%20analyze.R

https://github.com/ajdamico/usgsd/blob/master/National%20Vital%20Statistics%20System/replicate%20age-adjusted%20death%20rate.R

On Tue, Dec 30, 2014 at 2:55 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, I am looking for R example codes to compute age-standardized death rates by smoking and psychological distress status using person-years of observation created from the National Health Interview Survey Linked Mortality Files. Any help with the example codes or references will be appreciated.

Thanks,
Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hello Jeff,

Your code has given me the desired results, and your advice is well taken. I agree with you regarding the use of logical indexing for testing conditions. Thank you so much for your time and advice.

Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ

-Original Message-
From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
Sent: Thursday, December 04, 2014 1:20 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

There is something weird going on with mutate's interaction with the scalar Date objects. It seems to be passing them to max as constants of mode double. Regardless, use of rowwise should be very rare, and you are definitely abusing it. Learn to work with vectors of values rather than one value at a time.

new3 <- example.data %>%
  mutate(oiddate = pmax(mrjdate, cocdate, inhdate, haldate, na.rm = TRUE),
         na.date.cases = as.numeric(!is.na(oiddate)))

You might find it more useful to not convert the result of is.na to numeric: logical indexing can use that more efficiently than testing which rows have na.date.cases==1.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On December 3, 2014 7:43:37 PM PST, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello Chel and David, Thank you very much for providing new insights into this issue. Here is one more question. Why does the mutate () give incorrect results here? [...]
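The vectorized approach suggested in Jeff Newmiller's reply can also be tried without dplyr. The sketch below rebuilds the seven-row example data by hand and applies pmax() across the four date columns; the column name has_oiddate is a hypothetical logical stand-in for na.date.cases.

```r
# The seven-row example from the thread, typed in directly.
example.data <- data.frame(
  id      = as.character(1:7),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24", "2007-10-10",
                      "2006-09-01", "2007-09-04", "2005-10-25")),
  cocdate = as.Date(c("2008-07-18", NA, NA, NA,
                      "2005-08-10", "2011-10-05", NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13", NA, NA, NA, NA)),
  haldate = as.Date(c("2007-11-07", NA, NA, NA, NA, NA, "2011-11-04")),
  stringsAsFactors = FALSE
)

new3 <- example.data
# pmax() works element-wise, so each row gets the latest of its four dates;
# a row where every date is missing stays NA.
new3$oiddate <- pmax(new3$mrjdate, new3$cocdate, new3$inhdate, new3$haldate,
                     na.rm = TRUE)
# Keep the indicator logical so it can be used directly for indexing.
new3$has_oiddate <- !is.na(new3$oiddate)

print(new3[new3$has_oiddate, c("id", "oiddate")])
```

No per-row loop or rowwise() step is needed, which is the point of the advice to work with whole vectors.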
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hello,

Two alternative approaches, mutate() vs. sapply(), were used to get the desired results (i.e., creating a new column of the most recent date from 4 dates) with help from Arun and Mark on this forum. I now find that the two data objects (created using the two different approaches) are not identical although the results are exactly the same.

identical(new1, new2)
[1] FALSE

Please see the reproducible example below. I don't understand why the code returns FALSE here. Any hints/comments will be appreciated.

Thanks,
Pradip

# reproducible example
library(dplyr)

# data object - description
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object
example.data <- read.table(textConnection(temp),
                           colClasses = c("character", "Date", "Date", "Date", "Date"),
                           header = TRUE, as.is = TRUE)

# create a new column - dplyr solution (Acknowledgement: Arun)
new1 <- example.data %>%
  rowwise() %>%
  mutate(oldflag = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm = TRUE),
                           origin = '1970-01-01'))

# create a new column - Base R solution (Acknowledgement: Mark Sharp)
new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate', 'haldate')]),
                 na.rm = TRUE)
  }
  max_d}), origin = "1970-01-01")

identical(new1, new2)

# print records
print(new1); print(new2)

Pradip K. Muhuri
SAMHSA/CBHSQ

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected. Thank you so much for your help.

Regards,
Pradip

-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set. I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach. max() does not want a dataframe, thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}), origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org
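A small base R sketch of why identical() can return FALSE even when the printed values match: identical() also compares attributes such as column names and the class vector. The class name "rowwise_like" below is made up to stand in for the extra classes that dplyr attaches, so the example runs without dplyr loaded.

```r
a <- data.frame(id = 1:3,
                oiddate = as.Date(c("2008-07-18", NA, "2011-10-13")))

# Same values, but a different column name and an extra class.
# "rowwise_like" is a made-up class, not a real dplyr class.
b <- a
names(b)[2] <- "oldflag"
class(b) <- c("rowwise_like", "data.frame")

identical(a, b)    # FALSE: the names and class attributes differ
all.equal(a, b)    # describes the differences as text

# Compare values only, after dropping the extra class and aligning names.
b2 <- as.data.frame(b)
names(b2) <- names(a)
same_values <- identical(a, b2)
print(same_values)
```

In the thread's example the new column is called oldflag in one object and oiddate in the other, which on its own would be enough to make identical() return FALSE.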
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hello Chel and David,

Thank you very much for providing new insights into this issue. Here is one more question. Why does mutate() give incorrect results here?

# The following gives INCORRECT results - mutate()ed object
na.date.cases = ifelse(!is.na(oiddate),1,0)

# The following gives CORRECT results
new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)

### reproducible example - slightly revised/modified ###
library(dplyr)

# data object - description
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object
example.data <- read.table(textConnection(temp),
                           colClasses = c("character", "Date", "Date", "Date", "Date"),
                           header = TRUE, as.is = TRUE)

# create a new column - dplyr solution (Acknowledgement: Arun)
new1 <- example.data %>%
  rowwise() %>%
  mutate(oiddate = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm = TRUE),
                           origin = '1970-01-01'),
         na.date.cases = ifelse(!is.na(oiddate), 1, 0))

# create a new column - Base R solution (Acknowledgement: Mark Sharp)
new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate', 'haldate')]),
                 na.rm = TRUE)
  }
  max_d}), origin = "1970-01-01")
new2$na.date.cases = ifelse(!is.na(new2$oiddate), 1, 0)

identical(new1, new2)
table(new1$oiddate)
table(new2$oiddate)

# print records
print(new1); print(new2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ

-Original Message-
From: Chel Hee Lee [mailto:chl...@mail.usask.ca]
Sent: Wednesday, December 03, 2014 8:48 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

The output in the object 'new1' is apparently the same as the output in the object 'new2'. Are you trying to compare the entries of the two outputs 'new1' and 'new2'? If so, the function 'all()' would be useful:

all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of class, then the function 'identical()' is useful:

attributes(new1)
$names
[1] "id" "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"
$class
[1] "rowwise_df" "tbl_df" "tbl" "data.frame"
$row.names
[1] 1 2 3 4 5 6 7

attributes(new2)
$names
[1] "id" "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"
$row.names
[1] 1 2 3 4 5 6 7
$class
[1] "data.frame"

I hope this helps.
Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hello, Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date from 4 dates) with help from Arun and Mark on this forum. I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same. [...]
[R] no non-missing arguments to max; returning -Inf [2(dplyr/mutate()]
Hello,

With dplyr mutate(), the code below creates a new column (oiddate), which is the maximum of the four dates (mrjdate, cocdate, inhdate, haldate). The code seems to give the results I wanted (presented below). The issue is that I am getting the following warning message:

1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Is this warning message harmful? Any hints on how to tweak the code to correct the problem or avoid this message?

Please note that I did not get this warning when I ran the code on the reproducible example data posted to this forum in the past. I am getting it now when applying the code to the actual working data file.

Thanks to Arun, Mark and others on this forum for their help with tweaking the code in the past. Sorry for not providing a reproducible example this time.

Thanks,
Pradip Muhuri

# R script followed by console (log and output) #

setwd("H:/R/cis_study")
library(dplyr)
load("xd2012.rdata")
# create a new column of the max date from four dates
test <- xd2012 %>%
  rowwise() %>%
  mutate(oidflag = ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0),
         oiddate = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) %>%
  filter(oidflag==1) %>%
  select(mrjdate, cocdate, inhdate, haldate, oiddate)
head(test)
warnings(2)

## below is from the console (after running the script above)

There were 50 or more warnings (use warnings() to see the first 50)
> head(test)
Source: local data frame [6 x 5]

     mrjdate cocdate    inhdate    haldate    oiddate
1 2003-02-22      NA 2006-03-10 2005-09-17 2006-03-10
2 2007-12-07      NA         NA         NA 2007-12-07
3 1994-05-15      NA         NA         NA 1994-05-15
4 2003-04-19      NA         NA         NA 2003-04-19
5 2009-11-13      NA         NA         NA 2009-11-13
6 1973-10-08      NA         NA 1974-01-04 1974-01-04
> warnings(2)
Warning messages:
1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
2: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
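A minimal sketch of where this warning comes from, assuming it is triggered by rows in which all four dates are missing: with na.rm=TRUE nothing is left for max() to work on, so it warns and returns -Inf. The helper safe_max_date() is hypothetical (it is not from the thread) and simply returns NA for such rows.

```r
# Hypothetical helper (not from the thread): return NA instead of -Inf
# when every value passed in is missing.
safe_max_date <- function(...) {
  x <- c(...)
  if (all(is.na(x))) return(as.Date(NA))
  max(x, na.rm = TRUE)
}

all_missing  <- as.Date(c(NA, NA))
some_missing <- as.Date(c("2003-02-22", NA, "2006-03-10"))

# Plain max() on an all-NA input with na.rm = TRUE warns and yields -Inf
res <- suppressWarnings(max(as.numeric(all_missing), na.rm = TRUE))

r1 <- safe_max_date(all_missing)   # missing Date, no warning
r2 <- safe_max_date(some_missing)  # latest non-missing date
```

Inside rowwise() %>% mutate(), such a helper could stand in for the bare max() call; whether NA is the right value for all-missing rows depends on the analysis.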
[R] R dplyr solution vs. Base R solution for the slect column total
Hello,

I am looking for a dplyr or base R solution for the column total, JUST FOR THE LAST COLUMN in the example below. The following code works, giving me the total for each column. This is not exactly what I want.

rbind(test, colSums(test))

I only want the total for the very last column. I am struggling with this part of the code:

rbind(test, c("Total", colSums(test, ...)))

I have searched for a solution on Stack Overflow. I found some mutate() code for the cumsum, but no luck for the select column total. Is there a dplyr solution for the select column total? Any hints will be appreciated.

Thanks,
Pradip Muhuri

### The following is from the console - the R script with the reproducible example is also appended.

   mrjflag cocflag inhflag halflag oidflag count
1        0       0       0       0       0   256
2        0       0       0       1       1   256
3        0       0       1       0       1   256
4        0       0       1       1       1   256
5        0       1       0       0       1   256
6        0       1       0       1       1   256
7        0       1       1       0       1   256
8        0       1       1       1       1   256
9        1       0       0       0       1   256
10       1       0       0       1       1   256
11       1       0       1       0       1   256
12       1       0       1       1       1   256
13       1       1       0       0       1   256
14       1       1       0       1       1   256
15       1       1       1       0       1   256
16       1       1       1       1       1   256
17       8       8       8       8      15  4096

### below is the reproducible example

library(dplyr)
# generate data
dlist <- rep( list( 0:1 ), 4 )
data <- do.call(expand.grid, dlist)
data$id <- 1:nrow(data)
names(data) <- c('mrjflag', 'cocflag', 'inhflag', 'halflag')

# mutate a column and then summarize
test <- data %>%
  mutate(oidflag = ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0)) %>%
  group_by(mrjflag, cocflag, inhflag, halflag, oidflag) %>%
  summarise(count=n()) %>%
  arrange(mrjflag, cocflag, inhflag, halflag, oidflag)

# This works, giving me the total for each column - this is not exactly what I want.
rbind(test, colSums(test))

# I only want the total for the very last column
rbind(test, c("Total", colSums(test, ...)))

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] R dplyr solution vs. Base R solution for the slect column total
Hi Boris,

That gives me the total for each of the 6 columns of the data frame. I want the column sum just for the last column.

Thanks,
Pradip Muhuri

-----Original Message-----
From: Boris Steipe [mailto:boris.ste...@utoronto.ca]
Sent: Sunday, November 30, 2014 12:50 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

try:

sum(test$count)

B.
Re: [R] R dplyr solution vs. Base R solution for the slect column total
Hi Boris,

Sorry for not being explicit when replying to your first email. I wanted to say that it does not work when row-binding. I want the following output.

Thanks,
Pradip

1     1 3
2     2 4
Total   7

### Below is the console ##

> test <- data.frame(first=c(1,2), second=c(3,4))
> test
  first second
1     1      3
2     2      4
> sum(test$second)
[1] 7
> rbind(test, sum(test$second))
  first second
1     1      3
2     2      4
3     7      7

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: Boris Steipe [mailto:boris.ste...@utoronto.ca]
Sent: Sunday, November 30, 2014 5:51 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

No it doesn't ... consider:

> test <- data.frame(first=c(1,2), second=c(3,4))
> test
  first second
1     1      3
2     2      4
> sum(test$second)
[1] 7
Re: [R] R dplyr solution vs. Base R solution for the slect column total
Hi Duncan,

Thank you for sending your solution. Below is another way.

Pradip

> test <- data.frame(first=c(1,2), second=c(3,4))
> total <- c("", sum(test$second))
> rbind(test, Total=total)
      first second
1         1      3
2         2      4
Total            7
> rbind(test, c("Total", colSums(test[,2, drop=FALSE])))
  first second
1     1      3
2     2      4
3 Total      7

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Sunday, November 30, 2014 9:16 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'Boris Steipe'
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

On 30/11/2014, 8:45 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Boris,
> Sorry for not being explicit when replying to your first email. I wanted to say it does not work when row-binding. I want the following output.
> 1     1 3
> 2     2 4
> Total   7

You are mixing up the computation of results with the presentation of them. That's the spreadsheet way of thinking, and it's okay for simple things like this, but it gets really bogged down when the computations get hard.

In R you can do it, and it's not too hard:

test <- data.frame(first=c(1,2), second=c(3,4))
total <- c("", sum(test$second))
rbind(test, Total=total)

but this isn't a really sensible thing to do: you can't work with that final result at all. It makes more sense to leave it in the original form, and then think about how you want to present it, and write a function that displays the result, with nice formatting, etc. That probably won't happen in the R console; you should be using Sweave or knitr or some other package for presentation of the results.

Duncan Murdoch
Re: [R] R dplyr solution vs. Base R solution for the slect column total
Hi Boris,

Excellent point. Yes, I want to convert it to the numeric type. Your code has worked well on the real data set. The issue is resolved. Thanks so much for your help!

Pradip

-----Original Message-----
From: Boris Steipe [mailto:boris.ste...@utoronto.ca]
Sent: Sunday, November 30, 2014 9:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] R dplyr solution vs. Base R solution for the slect column total

What do you think should be in the empty cells? Zero? NA? Empty strings? There can't just be nothing...

Here's an example with empty strings as the filler element, but do consider carefully what Duncan wrote.

test <- data.frame(first=c(1,2), second=c(3,4))
typeof(test[1,1])  # double

# rbind() a vector that repeats the empty element one-less-than-ncol() times,
# and has the column sum as its last element.
test <- rbind(test, c(rep("", ncol(test)-1), sum(test$second)))
test
  first second
1     1      3
2     2      4
3            7

# but...!
typeof(test[1,1])  # character!
typeof(test[2,2])  # also character!

By adding characters to your columns, you cast all of your data into character type! If you want to *do* anything with the numbers, you'll need to cast them back to numeric. Or use 0 or NA as the filler element.

test <- rbind(test, c(rep(NA, ncol(test)-1), sum(test$second)))

But anyway ... as others have said, you may want to reconsider the logic of your approach.

B.
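As a minimal sketch of the point about fillers discussed in this thread, assuming the two-column toy data frame used in the replies: with NA as the filler the columns stay numeric, whereas an empty-string filler turns them into character. The row label "Total" is set explicitly here for illustration.

```r
test <- data.frame(first = c(1, 2), second = c(3, 4))

# Total row with NA as the filler: both columns stay numeric
total_row  <- data.frame(first = NA_real_, second = sum(test$second))
with_total <- rbind(test, total_row)
rownames(with_total)[nrow(with_total)] <- "Total"

# Total row with "" as the filler: both columns become character
as_text <- rbind(test, c("", sum(test$second)))

print(with_total)
print(sapply(with_total, class))
print(sapply(as_text, class))
```

Keeping the original data frame untouched and building the total row only for display follows the advice given above about separating computation from presentation.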
[R] file.copy
Hello,

Here is something trivial (file.copy) that does not seem to work. I could not figure out what I am doing wrong.

The R script below defines the folders (fromFolder and toFolder) and finds the list of files (list.of.files) to be copied to toFolder, which I have verified using the print() command. The issue is that the file.copy() command does not work. Both the R script and the console are shown below.

Any help/hints will be appreciated.

Thanks,
Pradip Muhuri

# R script #

#file.copy.R
#identify the folders
fromFolder <- "H:/R/cis_study"
toFolder <- "F:/cis_study_backup"
# find the list of files to copy
list.of.files <- list.files(fromFolder, ".R$")
# print objects
print(c(fromFolder, toFolder, list.of.files))
options(warn=1)
# copy the files to the toFolder - THIS DOES NOT WORK WHILE EVERYTHING PRIOR HAS WORKED
file.copy(list.of.files, toFolder)

# Below is from console ###

> print(c(fromFolder, toFolder, list.of.files))
 [1] "H:/R/cis_study"           "F:/cis_study_backup"
 [3] "anl.in.scope_14.R"        "create.oid.data.frame.R"
 [5] "create_xd2012.R"          "file.copy.R"
 [7] "further.data.R"           "mrj.in.scope_111214.R"
 [9] "oid.in.scope_14.R"        "oid_cohort.R"
[11] "warning.max.R"            "xdate.R"
[13] "years.before.anl.init.R"  "years.before.mrj.init.R"
[15] "years.before.oid.init.R"
> options(warn=1)
> # copy the files to the toFolder - THIS DOES NOT WORK WHILE EVERYTHING PRIOR HAS WORKED
> file.copy(list.of.files, toFolder)
Warning in file.copy(list.of.files, toFolder) :
  problem copying .\anl.in.scope_14.R to F:\cis_study_backup\anl.in.scope_14.R: No such file or directory
(other similar warning messages are not shown)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
Re: [R] file.copy
Jeff,

Thank you so much for your help. Below are the revised code (written with your hints), which has worked, and the console. I have also added overwrite=TRUE to file.copy().

Pradip

###
#file.copy.jn.way.R
#identify the folders
fromFolder <- "H:/R/cis_study"
toFolder <- "F:/cis_study_backup"
# find the list of files to copy
list.of.files <- list.files(fromFolder, ".R$")
# print objects
print(c(fromFolder, toFolder, list.of.files))
options(warn=1)
# copy the files to the toFolder (now with the full source path)
file.copy(file.path(fromFolder, list.of.files), toFolder, overwrite=TRUE)

### revised console

> print(c(fromFolder, toFolder, list.of.files))
 [1] "H:/R/cis_study"           "F:/cis_study_backup"
 [3] "anl.in.scope_14.R"        "create.oid.data.frame.R"
 [5] "create_xd2012.R"          "file.copy.R"
 [7] "file.copy_Duncan_way.R"   "further.data.R"
 [9] "mrj.in.scope_111214.R"    "oid.in.scope_14.R"
[11] "oid_cohort.R"             "warning.max.R"
[13] "xdate.R"                  "years.before.anl.init.R"
[15] "years.before.mrj.init.R"  "years.before.oid.init.R"
> options(warn=1)
> file.copy(file.path(fromFolder, list.of.files), toFolder, overwrite=TRUE)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
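A minimal sketch of why the first attempt failed and the revised one worked, using throwaway folders under tempdir() (the folder and file names here are made up for illustration). list.files() returns bare file names by default, so file.copy() looks for them in the current working directory. Prefixing them with file.path(), or asking list.files() for full.names=TRUE, gives paths that can be found.

```r
# Throwaway source and target folders (hypothetical names)
from_dir <- file.path(tempdir(), "from_demo")
to_dir   <- file.path(tempdir(), "to_demo")
dir.create(from_dir, showWarnings = FALSE)
dir.create(to_dir, showWarnings = FALSE)
writeLines("x <- 1", file.path(from_dir, "a.R"))
writeLines("y <- 2", file.path(from_dir, "b.R"))

# Bare names: only valid relative to from_dir, not to getwd()
bare <- list.files(from_dir, pattern = "\\.R$")

# Full paths: usable from any working directory
full <- list.files(from_dir, pattern = "\\.R$", full.names = TRUE)

copied <- file.copy(full, to_dir, overwrite = TRUE)
print(bare)
print(copied)
```

The pattern "\\.R$" escapes the dot so that only names ending in a literal ".R" match; the unescaped ".R$" used in the thread also matches any character followed by "R".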
Re: [R] file.copy
Hello Duncan,

Jeff's tweaks to my code have worked. Now I am trying your way. Below are the R script and the console. The issue is that the object (list.of.files) has not been created. Any thoughts?

Thanks,

### R script ##

#file.copy.dm.way.R
#identify the folders
fromFolder <- file.path("H:", "cis_study")
toFolder <- file.path("F:", "cis_study")
# find the list of files to be copied
list.of.files <- list.files(fromFolder, ".R$")
# print objects
print(fromFolder, list.of.files, toFolder)
# copy the files
file.copy(list.of.files, toFiles)

### console ###

> fromFolder <- file.path("H:", "cis_study")
> toFolder <- file.path("F:", "cis_study")
> # find the list of files to be copied
> list.of.files <- list.files(fromFolder, ".R$")
> # print objects
> print(fromFolder, list.of.files, toFolder)
Error in print.default(fromFolder, list.of.files, toFolder) :
  invalid 'digits' argument
> # copy the files
> file.copy(list.of.files, toFiles)
logical(0)
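A minimal sketch of the two symptoms in the console above, under the assumption that the folder passed to list.files() did not exist (the earlier messages used "H:/R/cis_study", not "H:/cis_study"). list.files() on a missing folder quietly returns character(0), and file.copy() with nothing to copy returns logical(0). The print() error arises because print() shows one object; the second positional argument of print.default() is digits. The folder name below is made up.

```r
# A folder that does not exist (hypothetical name)
missing_dir <- file.path(tempdir(), "no_such_folder_demo")

# No error and no warning: just an empty character vector
found <- list.files(missing_dir, pattern = "\\.R$")

# Nothing to copy, so the result is an empty logical vector
result <- file.copy(found, tempdir())

# To show several objects at once, combine them first
msg <- paste("from:", missing_dir, "| files found:", length(found))
print(msg)
```

Checking dir.exists(fromFolder) or length(list.of.files) before copying makes this kind of silent failure visible.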
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hi Mark,

Your code has also given me the results I expected. Thank you so much for your help.

Regards,
Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: Mark Sharp [mailto:msh...@txbiomed.org]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector, so you find the maximum of the entire data set. I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach. max() does not want a data frame, thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org
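As an alternative sketch (not the method used in the thread), base R's pmax() computes an element-wise maximum across several vectors, which here means a row-wise maximum across the date columns. With na.rm=TRUE a row yields NA only when every date in it is missing. The data below are the first three rows of the example, retyped for illustration.

```r
data1 <- data.frame(
  id      = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13")),
  haldate = as.Date(c("2007-11-07", NA, NA))
)

# Row-wise latest date; all-missing rows stay NA instead of becoming -Inf
data1$oiddate <- pmax(data1$mrjdate, data1$cocdate,
                      data1$inhdate, data1$haldate, na.rm = TRUE)
print(data1)
```

Because pmax() is vectorized over rows, it needs neither rowwise() nor sapply(), and the result keeps the Date class of its first argument.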
[R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Hello,

range() with complete.cases() removes NAs for the date variables that are read from a data frame. However, the same call does not remove NAs for the other date variable, which is created using dplyr/mutate(). The console and the reproducible example are given below.

Any advice on how to resolve this issue would be appreciated.

Thanks,
Pradip Muhuri

# cut and pasted from the R console

  id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

# range of dates
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA           "2011-11-04"

# reproducible code #

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the data
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object
data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE)

# create a new column
data2 <- data1 %>%
  rowwise() %>%
  mutate(oiddate = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))

# print records
print(data2)

# range of dates
range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Hello Arun,

Thank you so much for your help.

Regards,
Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Monday, November 10, 2014 11:30 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Try

range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)])
#[1] "2006-09-01" "2011-11-04"

If you look at the `dput` output, it is `-Inf` for oiddate:

dput(data2$oiddate)
structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date")

A.K.
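A minimal sketch of the behaviour pointed out in this reply, using the first three values of the dput() output shown in the thread: a Date whose underlying number is -Inf is not NA, so is.na() and complete.cases() keep it, while is.finite() filters it out. Recoding such values to NA is one option, shown here as an assumption about what is wanted.

```r
# First three values from the dput() output in the thread
oiddate <- structure(c(14078, -Inf, 15260), class = "Date")

na_flags     <- is.na(oiddate)      # the -Inf entry is NOT flagged as missing
finite_flags <- is.finite(oiddate)  # the -Inf entry is flagged as non-finite

# Range over the finite dates only
clean_range <- range(oiddate[is.finite(oiddate)])

# Optionally recode non-finite dates to NA so that is.na() works afterwards
fixed <- oiddate
fixed[!is.finite(fixed)] <- NA
print(clean_range)
```

After the recoding step, range(fixed, na.rm = TRUE) and complete.cases(fixed) behave as they do for the columns read from the data file.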
Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Mark,

Thank you very much for further looking into this issue. So, the ugly solution is better! Would you like to bring to Hadley's attention that mutate does set the NA value for the new column?

Regards,
Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: Mark Sharp [mailto:msh...@txbiomed.org]
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.

## It looks at first as if the second element of both columns is NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"

## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"

Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")
range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])

On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> The range() with complete.cases() removes NA's for the date variables that are read from a data frame. However, the issue is that the same function does not remove NA's for the other date variable that is created using dplyr/mutate(). The console and the reproducible example are given below. Any advice on how to resolve this issue would be appreciated.
Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Hi Bill and Mark,

I meant that mutate does NOT set the NA value. Sorry for the confusion. Thank you for your clarifications that this may not be mutate()'s problem. This thread is now closed from my end.

Thanks,
Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Monday, November 10, 2014 1:30 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Mark Sharp; r-help@r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

> Would you like to bring to Hadley's attention that mutate does set the NA value for the new column?

This may not be mutate()'s problem. The Date class is messed up with regard to NA's and Inf's. E.g., what gets printed as NA does not correspond to what is.na() returns, and its range() method does not appear to pass the finite=TRUE argument to range.default:

> d <- as.Date(c("2014-10-31", c("2014-11-10")))
> d1 <- range(d[0], finite=TRUE)
Warning messages:
1: In min.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to min; returning Inf
2: In max.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf
> d1
[1] NA NA
> is.na(d1)
[1] FALSE FALSE
> dput(d1)
structure(c(Inf, -Inf), class = "Date")
> range(c(d1, d), finite=TRUE)
[1] NA NA
> range(c(d1, d), finite=TRUE, na.rm=TRUE)
[1] NA NA

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Nov 10, 2014 at 10:09 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Mark,
>
> Thank you very much for further looking into this issue. So, the ugly solution is better! Would you like to bring to Hadley's attention that mutate does set the NA value for the new column?
>
> Regards,
> Pradip
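Bill's observation can be reproduced without dplyr. The following is a minimal sketch in base R, not code from the thread: the vector `oid` and the names `fake_na` and `real_na` are made up for illustration. It shows that a Date stored as -Inf is not a true NA, so is.na() (and complete.cases()) keep it, whereas is.finite() filters it out.

```r
# A Date whose underlying number is -Inf (what max(..., na.rm = TRUE)
# produces for an all-NA row) is not a true NA.
fake_na <- structure(-Inf, class = "Date")
real_na <- as.Date(NA)

is.na(fake_na)      # FALSE: the value is -Inf, not NA
is.na(real_na)      # TRUE
is.finite(fake_na)  # FALSE

# Illustrative column mixing real dates with the infinite value
oid <- c(as.Date(c("2008-07-18", "2011-11-04")), fake_na)

# is.finite() drops both true NAs and +/-Inf, so range() sees only real dates
oid_range <- range(oid[is.finite(oid)])
```

Filtering with is.finite() instead of complete.cases() is one possible workaround; replacing the infinite values with NA right after computing the maximum would be another.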
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hi Dan,

Thank you so much for sending me your code, which gives me the desired results. But I don't understand why I am getting the following warning message: In FUN(newX[, i], ...) : no non-missing arguments, returning NA. Any thoughts?

Regards,
Pradip

> data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
Warning message:
In FUN(newX[, i], ...) : no non-missing arguments, returning NA
> data2x
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel Nordlund
Sent: Sunday, November 09, 2014 5:33 AM
To: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations. I would like to insert the most recent date from those 4 date columns into a new column (oidflag) using the mutate() function in the dplyr package. I am getting correct results (NA in the new column) if a given row has all NA's in the four columns. However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).

I am not familiar with the mutate() function from dplyr, but you can get your wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
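The warning in this exchange comes from the one row in which all date columns are missing: apply() hands each row to max() as a character vector (Date columns are formatted as text when the data frame is converted to a matrix), and max(..., na.rm = TRUE) then has nothing left to compare. A minimal sketch of one way to avoid it, assuming a small stand-in data frame `dates` and a hypothetical helper `row_max` that are not part of the thread:

```r
# Small stand-in for the date columns of data1 (values taken from the post)
dates <- data.frame(
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13"))
)

# Hypothetical helper: return NA for an all-missing row instead of
# calling max() on an empty set
row_max <- function(x) {
  if (all(is.na(x))) NA_character_ else max(x, na.rm = TRUE)
}

# apply() works on a character matrix here, so the result is converted
# back to Date; ISO dates sort the same way as text and as dates
oidflag <- as.Date(apply(dates, 1, row_max))
```

The guard is the same idea as an explicit all(is.na(...)) check per row, just wrapped in a function that apply() can call.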
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code. But I am getting the following warning message. Any suggestions on how to avoid this message?

Pradip

> data1 %>%
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+          na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: by row

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>%
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))

Source: local data frame [7 x 6]
Groups: by row

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-10-13
4  4 2007-10-10         NA         NA         NA 2007-10-10
5  5 2006-09-01 2005-08-10         NA         NA 2006-09-01
6  6 2007-09-04 2011-10-05         NA         NA 2011-10-05
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

A.K.
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hi Arun and Dennis,

This is just an FYI. You're right: in one row, all four date columns are NA. I have tested below the truth of the condition Arun has set.

> is.logical(data1[rowSums(is.na(data1[,-1]))!=4,])
[1] FALSE

All 3 approaches below give exactly the same results.

# Approach 1 (suggested by Arun): the code gives the expected results, but with a warning message.
data1 %>%
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))

# Approach 2 (suggested by Dan): this code no longer gives a warning message, although it did earlier.
data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))

# Approach 3 (suggested by Mark): this code does not give a warning message.
data2 <- data1
data2$oidflag <- as.Date(sapply(seq_along(data2$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")
## ends here

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Sunday, November 09, 2014 10:18 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Dear Pradip,

From the documentation of ?max:

  The minimum and maximum of a numeric empty set are '+Inf' and '-Inf'

One of the rows in your dataset is all NAs. I am not sure you want to keep that row with all NAs. You could remove it and run the code, or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]
data1 %>%
  rowwise() %>%
  mutate(oldflag=as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))

A.K.
[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Hello,

The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations. I would like to insert the most recent date from those 4 date columns into a new column (oidflag) using the mutate() function in the dplyr package. I am getting correct results (NA in the new column) if a given row has all NA's in the four columns. However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns). I would appreciate receiving your help toward resolving the issue. Please see the R console and the R script (reproducible example) below.

Thanks in advance.

Pradip

## from the console
> print(data2)
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
2  2         NA         NA         NA         NA         NA
3  3 2009-10-24         NA 2011-10-13         NA 2011-11-04
4  4 2007-10-10         NA         NA         NA 2011-11-04
5  5 2006-09-01 2005-08-10         NA         NA 2011-11-04
6  6 2007-09-04 2011-10-05         NA         NA 2011-11-04
7  7 2005-10-25         NA         NA 2011-11-04 2011-11-04

## Reproducible code and data
library(dplyr)
library(lubridate)
library(zoo)

# data object
temp <- "
id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04
"

# read the data object
data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE)

# create a new column
data2 <- mutate(data1,
                oidflag=ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate) & is.na(haldate),
                               NA,
                               max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE)))

# convert to date
data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")

# print records
print(data2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
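In the posted code, max() is evaluated once over all four columns taken together, which is why every row with at least one date receives the overall maximum (2011-11-04). One way to get a per-row maximum, sketched here with base R's pmax() rather than any of the approaches proposed in the thread, is shown below. The data frame is a trimmed, hand-typed copy of the first four rows of the example.

```r
# Trimmed copy of the example data (first four rows)
data1 <- data.frame(
  id      = c("1", "2", "3", "4"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24", "2007-10-10")),
  cocdate = as.Date(c("2008-07-18", NA, NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13", NA)),
  haldate = as.Date(c("2007-11-07", NA, NA, NA)),
  stringsAsFactors = FALSE
)

# pmax() compares element by element, i.e. across the columns within each
# row; with na.rm = TRUE a row that is entirely NA simply stays NA
data1$oidflag <- pmax(data1$mrjdate, data1$cocdate,
                      data1$inhdate, data1$haldate, na.rm = TRUE)
```

Because pmax() is vectorised, no rowwise() grouping or later conversion with as.Date() is needed, and the all-NA row produces a genuine NA rather than -Inf.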
[R] Adding labels to ColSums
Hello,

I was trying to add labels to the colSums of the integer variable corresponding to a factor. Below are the warning message and the reproducible code. How would I tweak the code to replace the NA with "Total" in the output? Your advice toward resolving the issue would be greatly appreciated.

Thanks,
Pradip Muhuri

### warning message - from the console
> rb.data <- rbind(s.data2, c("Total", colSums(s.data2[,2, drop=FALSE]))) # row bind with the column total
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = "Total") :
  invalid factor level, NA generated
> rb.data
Source: local data frame [7 x 2]

  years.before.initiated.cat anl.count
1                      [0,1]        89
2                      (1,2]        73
3                      (2,3]        72
4                      (3,4]        82
5                      (4,5]        82
6                      (5,6]        86
7                         NA       484

### reproducible code
library(dplyr)
i.data2 <- data.frame(sample(1:6, size=484, replace=T)) # simulate data to create a data frame
colnames(i.data2) <- "years.before.initiated" # add a column name
m.data2 <- mutate(i.data2, years.before.initiated.cat = cut(years.before.initiated, breaks=c(0,1,2,3,4,5,6), include.lowest=TRUE)) # create a new variable
g.data2 <- group_by(m.data2, years.before.initiated.cat) # group by years.before.initiated.cat
s.data2 <- summarise(g.data2, anl.count = n()) # summarize to get the count
rb.data <- rbind(s.data2, c("Total", colSums(s.data2[,2, drop=FALSE]))) # row bind with the column total
rb.data

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
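The warning arises because the category column is a factor and "Total" is not one of its levels, so the new value becomes NA. A minimal sketch of one possible fix, assuming a plain base R data frame in place of the dplyr summary (the counts are copied from the console output above; `total_row` is an illustrative name):

```r
# Stand-in for the summarised table, with the counts shown in the post
lev <- c("[0,1]", "(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]")
s.data2 <- data.frame(
  years.before.initiated.cat = factor(lev, levels = lev),
  anl.count = c(89L, 73L, 72L, 82L, 82L, 86L)
)

# Convert the factor to character so that "Total" is an acceptable value
s.data2$years.before.initiated.cat <- as.character(s.data2$years.before.initiated.cat)

# Build the total as a one-row data frame with matching column names
total_row <- data.frame(
  years.before.initiated.cat = "Total",
  anl.count = sum(s.data2$anl.count),
  stringsAsFactors = FALSE
)

rb.data <- rbind(s.data2, total_row)
```

Binding a one-row data frame rather than a bare c("Total", ...) vector also keeps anl.count numeric; c() would coerce the count to a character string.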
[R] Error Reading from Connection
Hello,

I am running R x64 3.0.3 under Windows 8. I have been getting the following error when running some of my old R applications. Below is a mock-up example. Could someone please help me resolve the issue?

Thanks,
Pradip Muhuri

> setwd("D:/")
> # load Rdata file
> load("heroin.rdata")
Error: error reading from connection
> str(heroin)
Error in str(heroin) : object 'heroin' not found
Re: [R] Error Reading from Connection
Hello,

Thank you so much for your guidance. This time I am providing more information. The R script and R console are appended below. The list.files() call below shows that the file exists in the temp directory. Please note that the heroin.rdata file was created from the SAS data set using the Stat Transfer utility software. The file.access() call below did not return mode=4. Does this mean that I don't have read access to the file? Is that the reason I could not load the file? I would appreciate receiving help to resolve the issue.

Pradip Muhuri

*** R Script ***
setwd("D:/temp")
list.files()
file.access("heroin.rdata", mode=4)
load("heroin.rdata")

*** R Console ***
> setwd("D:/temp")
> list.files()
[1] "heroin.rdata"
> file.access("heroin.rdata", mode=4)
heroin.rdata
           0
> load("heroin.rdata")
Error: error reading from connection

From: Jeff Newmiller [jdnew...@dcn.davis.ca.us]
Sent: Tuesday, September 23, 2014 9:20 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Error Reading from Connection

Insufficient information, and irrelevant information (the second error is a direct consequence of the first).

We have no way of knowing based on this input that your file is there (?list.files). We also don't know if you have read access to that file (?file.access).

Since you posted in HTML and failed to provide the requested minimum information, you should probably (re-)read the Posting Guide mentioned at the bottom of this (and any other) message on this mailing list. You should probably also follow the advice given there to update your R software to the latest version so we don't go chasing any problems in R for your operating system that have already been solved.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
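On the file.access() question: the function returns 0 for success and -1 for failure and does not echo the mode back, so the 0 in the console output means the file is readable. The cause of the load() error therefore lies elsewhere. One possibility, which is only an assumption since the file itself is not available, is that the file is not in the format written by save(), or was damaged in transfer. A minimal sketch of a diagnostic, using temporary files and a hypothetical helper `try_load`:

```r
# Hypothetical helper: report whether a file exists, is readable, and loads
try_load <- function(path) {
  exists_ok <- file.exists(path)
  # file.access() returns 0 for success and -1 for failure
  readable_ok <- exists_ok && unname(file.access(path, mode = 4)) == 0
  loaded_ok <- FALSE
  if (readable_ok) {
    loaded_ok <- tryCatch({
      load(path, envir = new.env())  # load into a scratch environment
      TRUE
    }, error = function(e) FALSE, warning = function(w) FALSE)
  }
  c(exists = exists_ok, readable = readable_ok, loaded = loaded_ok)
}

# A file written by save() loads ...
good <- tempfile(fileext = ".rdata")
heroin <- data.frame(id = 1:3)
save(heroin, file = good)

# ... while a readable file that is not in save() format does not
bad <- tempfile(fileext = ".rdata")
writeLines("this is not an R data file", bad)

good_status <- try_load(good)
bad_status  <- try_load(bad)
```

If a file turns out to be readable but not loadable, exporting it again from the original source is likely a more productive next step than looking at permissions.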
Re: [R] Regression Tolerance Intervals - Dr. Young's Code
Uwe and Dennis,

Thank you so much for your comments, tips and advice. The following reproducible code has worked and given me the desired results.

Pradip

### Revised Code
setwd("C:/RAPP")
require(tolerance)
set.seed(100); x <- runif(200, 0, 10); y <- 20 + 5*x + rnorm(100, 0, 20); data.frame(cbind(x, y))
out <- regtol.int(reg=lm(y~x), new.x=cbind(c(3,6,12)), side=2, alpha=.05, P=.90);
plottol(out, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y")

From: Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: Sunday, June 09, 2013 11:54 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help [r-help@r-project.org]; mridulb...@aol.com
Subject: Re: [R] Regression Tolerance Intervals - Dr. Young's Code

On 08.06.2013 05:17, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> Below is a reproducible example to generate the output by using Dr. Young's R code on the above subject. As commented below, the issue is that part of the code (regtol.int and plottol) does not seem to work. I would appreciate receiving your advice toward resolving the issue.
>
> Thanks and regards,
> Pradip Muhuri
>
> setwd("E:/")
> require(tolerance)
> d1 <- "xlndur ylnant
> 8.910797 0.33901690
> 9.001415 0.36464311
> 8.983936 0.53976194
> 8.948035 0.33901690
> 9.056784 0.39266961
> 9.018593 0.18617770
> 9.001415 0.53976194
> 8.983936 -0.11005034
> 8.966147 0.53102826
> 8.948035 0.59885086
> 6.90 NA"
> xd1 <- read.table(textConnection(d1), header=TRUE, as.is=TRUE)
> print(xd1); str(xd1)
>
> #This code works
> xout1 <- regtol.int(reg=lm(formula=ylnant ~ xlndur, data=xd1), alpha=.05, P=0.99, side=2)
> print(xout1)
>
> #This code does not work
> xout2 <- regtol.int(reg=lm(formula=ylnant ~ xlndur, data=xd1), new.xlndur = NULL, alpha=.05, P=0.99, side=2)

Come on, start using your brain and replace new.xlndur by new.x?

> print(xout2)
>
> #This code does not work
> plottol(xout1, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y")

So replace x and y appropriately?

Best,
Uwe Ligges

> #This code does not work
> plottol(xout2, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y")
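Uwe's first hint concerns argument names: R matches the names supplied in a call against the function's formal arguments, and a name that matches none of them is rejected when the function has no `...` argument. The sketch below illustrates this with a hypothetical stand-in, `fit_interval`, which is not part of the tolerance package; it only borrows the argument name new.x from the thread, and whether regtol.int() behaves identically is an assumption.

```r
# Hypothetical stand-in with the same argument name as in the thread
fit_interval <- function(reg, new.x = NULL, side = 1) {
  if (is.null(new.x)) "no new x values" else paste(length(new.x), "new x values")
}

# Misspelled argument name: R stops with an "unused argument" error,
# which is captured here as a message instead of halting the script
wrong <- tryCatch(fit_interval(reg = NULL, new.xlndur = NULL),
                  error = function(e) conditionMessage(e))

# Correct argument name: the call goes through
right <- fit_interval(reg = NULL, new.x = c(3, 6, 12))
```

Checking args() or the help page of a function before renaming its arguments after one's own variables avoids this class of error.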
[R] Regression Tolerance Intervals - Dr. Young's Code
Hello,

Below is a reproducible example to generate the output by using Dr. Young's R code on the above subject. As commented below, the issue is that part of the code (regtol.int and plottol) does not seem to work. I would appreciate receiving your advice toward resolving the issue.

Thanks and regards,
Pradip Muhuri

setwd ("E:/")
require (tolerance)
d1 <- "xlndur ylnant
8.910797 0.33901690
9.001415 0.36464311
8.983936 0.53976194
8.948035 0.33901690
9.056784 0.39266961
9.018593 0.18617770
9.001415 0.53976194
8.983936 -0.11005034
8.966147 0.53102826
8.948035 0.59885086
6.90 NA"
xd1 <- read.table(textConnection(d1), header=TRUE, as.is=TRUE)
print (xd1); str (xd1)

#This code works
xout1 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), alpha=.05, P=0.99, side=2)
print (xout1)

#This code does not work
xout2 <- regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), new.xlndur = NULL, alpha=.05, P=0.99, side=2)
print (xout2)

#This code does not work
plottol(xout1, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )

#This code does not work
plottol(xout2, x=cbind(1,x), y=y, side="two", x.lab="X", y.lab="Y" )
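One way to read the failing `new.xlndur = NULL` call: R matches argument names against the names in a function's definition, not against column names in the data. The sketch below uses a hypothetical stand-in, `fake_regtol()` (it is not part of the tolerance package), whose only relevant property is that it has a `new.x` argument and no `...`. It illustrates argument matching under that assumption and does not reproduce what `regtol.int()` computes.

```r
# Hypothetical stand-in: it only mimics the argument name discussed in the thread.
fake_regtol <- function(reg, new.x = NULL, side = 1) {
  if (is.null(new.x)) "no new points" else paste(nrow(new.x), "new points")
}

# Built-in data set, used purely for illustration.
fit <- lm(dist ~ speed, data = cars)

# Using the name from the function definition works.
ok <- fake_regtol(reg = fit, new.x = cbind(c(3, 6, 12)), side = 2)

# Renaming the argument after the predictor does not: R has no argument
# called new.speed to match it to, so the call stops with an error.
err <- tryCatch(
  fake_regtol(reg = fit, new.speed = NULL, side = 2),
  error = function(e) conditionMessage(e)
)

print(ok)
print(err)
```

The same reasoning applies to `plottol(..., x = cbind(1, x), y = y)`: `x` and `y` must exist as objects in the workspace, so with the data frame above they would presumably have to be taken from `xd1`.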
[R] Applying a user-defined function
Hello List,

My goal is to apply a user-defined function on several columns of a data frame. When testing the code on the reproducible example below, I get the following error message.

> #now Write a new function using the above cut ()/quantile function to apply on different columns of the data frame
> CutQuintiles <- function(x) {
+ cut (test1$x,quantile (test1$x, (0:5/5)),include.lowest=TRUE)
+ }
> #apply the CutQuintile () on every odd-numbered columns of the test1 data frame
> newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)
Error in cut.default(test1$x, quantile(test1$x, (0:5/5)), include.lowest = TRUE) :
  'x' must be numeric

I would appreciate receiving your advice.

Thanks,
Pradip

## The reproducible example begins here
test1 <- read.table (text= "State,ObtMj_P,ObtMj_SE,ExpPrevMed_P,ExpPrevMed_SE,ParMon_P,ParMon_SE
Alabama,49.60,1.37,80.00,0.91,12.10,0.68
Alaska,55.00,1.41,81.80,1.08,12.40,0.90
Arizona,52.50,1.56,79.60,1.20,15.80,1.08
Arkansas,50.50,1.22,78.00,0.78,12.80,0.72
California,51.10,0.65,80.50,0.53,13.00,0.41
Colorado,55.10,1.26,81.70,1.03,12.10,0.72
Connecticut,56.30,1.28,85.00,0.93,14.60,0.77
Delaware,53.60,1.30,79.50,1.04,14.70,0.97
District of Columbia,53.50,1.22,76.20,1.03,14.30,1.13
Florida,52.70,0.67,78.90,0.52,14.10,0.45
Georgia,52.50,1.15,79.30,1.02,15.90,0.98
Hawaii,49.40,1.33,83.80,1.12,16.00,1.06
Idaho,48.30,1.23,82.40,0.99,11.90,0.74
Illinois,52.70,0.63,81.00,0.46,13.60,0.40
Indiana,49.60,1.16,80.90,0.91,12.60,0.82
Iowa,46.30,1.37,82.10,1.01,13.60,0.87
Kansas,44.30,1.43,79.20,0.98,12.90,0.79
Kentucky,52.90,1.37,78.70,1.05,14.60,0.98
Louisiana,49.70,1.23,76.80,1.06,14.50,0.76
Maine,55.60,1.44,82.90,0.93,16.70,0.83
Maryland,53.90,1.46,83.60,0.95,14.00,0.80
Massachusetts,55.40,1.41,81.00,1.15,14.70,0.80
Michigan,52.40,0.62,80.50,0.47,15.00,0.43
Minnesota,51.50,1.20,84.40,0.87,14.40,0.86
Mississippi,43.20,1.14,76.60,0.91,12.30,0.78
Missouri,48.70,1.20,80.30,0.90,13.70,0.12
Montana,56.40,1.16,83.70,0.95,12.10,0.68
Nebraska,45.70,1.51,83.40,0.95,12.40,0.90
Nevada,54.20,1.17,80.60,1.07,15.80,1.08
New Hampshire,56.10,1.30,83.30,0.93,12.80,0.72
New Jersey,53.20,1.45,83.70,0.95,13.00,0.41
New Mexico,57.60,1.34,78.90,1.03,12.10,0.72
New York,53.70,0.67,82.60,0.48,14.60,0.77
North Carolina,52.20,1.26,81.90,0.84,14.70,0.97
North Dakota,48.60,1.34,84.20,0.88,14.30,1.13
Ohio,50.90,0.61,82.70,0.49,14.10,0.45
Oklahoma,47.20,1.42,78.80,1.33,15.90,0.98
Oregon,54.00,1.35,80.60,1.14,16.00,1.06
Pennsylvania,53.00,0.63,79.90,0.47,11.90,0.74
Rhode Island,57.20,1.20,79.50,1.02,13.60,0.40
South Carolina,50.50,1.21,79.50,0.95,12.60,0.82
South Dakota,43.40,1.30,81.70,1.05,13.60,0.87
Tennessee,48.90,1.35,78.40,1.35,12.90,0.79
Texas,48.70,0.62,79.00,0.48,14.60,0.98
Utah,42.00,1.49,85.00,0.93,14.50,0.76
Vermont,58.70,1.24,83.70,0.84,16.70,0.83
Virginia,51.80,1.18,82.00,1.04,14.00,0.80
Washington,53.50,1.39,84.10,0.96,14.70,0.80
West Virginia,52.80,1.07,79.80,0.93,15.00,0.43
Wisconsin,49.90,1.50,83.50,1.02,14.40,0.86
Wyoming,49.20,1.29,82.00,0.85,12.30,0.78
", sep=",", row.names='State', header=TRUE, as.is=TRUE)

# Verify that the following function categorizes the obtmj_p values into one of the 5 equal-sized groups - works fine.
cut (test1$obtmj_p,quantile (test1$obtmj_p, (0:5/5)),include.lowest=TRUE)

#now Write a new function using the above cut ()/quantile function to apply on different columns of the data frame
CutQuintiles <- function(x) {
cut (test1$x,quantile (test1$x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data frame
newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")

##
Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
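A likely reading of the error in this post: inside `CutQuintiles()`, `test1$x` asks for a column literally named `x`. No such column exists, so the expression evaluates to `NULL` and `cut()` reports that `'x' must be numeric`. `sapply()` already passes each column to the function as `x`, so the function can use `x` directly. The sketch below works on a small hypothetical data frame, `demo` (ten rows borrowed from two of the percentage columns above), rather than the full `test1`. Using `lapply()` instead of `sapply()` is a design choice made here so that the results stay factors.

```r
# Hypothetical cut-down data: ten rows, two estimate columns and their SEs.
demo <- data.frame(
  obtmj_p       = c(49.6, 55.0, 52.5, 50.5, 51.1, 55.1, 56.3, 53.6, 53.5, 52.7),
  obtmj_se      = c(1.37, 1.41, 1.56, 1.22, 0.65, 1.26, 1.28, 1.30, 1.22, 0.67),
  expprevmed_p  = c(80.0, 81.8, 79.6, 78.0, 80.5, 81.7, 85.0, 79.5, 76.2, 78.9),
  expprevmed_se = c(0.91, 1.08, 1.20, 0.78, 0.53, 1.03, 0.93, 1.04, 1.03, 0.52)
)

# There is no column called "x", so this is NULL, which is what cut() received.
missing_col <- demo$x

# Use the argument itself instead of looking it up in the data frame.
CutQuintiles <- function(x) {
  cut(x, quantile(x, 0:5 / 5), include.lowest = TRUE)
}

# Apply to the odd-numbered columns and append the results as new factor columns.
odd <- seq(1, ncol(demo), by = 2)
newcols <- lapply(demo[odd], CutQuintiles)
names(newcols) <- paste0(names(demo)[odd], "_cat")
demo[names(newcols)] <- newcols

str(demo)
```

`paste0()` is used for the new names because `paste()` inserts a space by default, which would give names such as `"obtmj_p _cat"`. Note also that column names are case-sensitive: after `read.table()` the column above is `ObtMj_P`, so `test1$obtmj_p` only works once the names have been lower-cased.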
Re: [R] Applying a user-defined function
Hello List,

Last time, Arun's following solution worked to create 3 new columns (1,3,5). Now how would I tweak this function to create corresponding (additional) columns (7,8,9) of mode factor (levels = 1,2,3,4,5)?

Thanks for your continued support.

Pradip

### cut and paste from the reproducible example
CutQuintiles <- function( x) {
cut (x,quantile (x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data frame
test1$newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(test1$newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")

## Reproducible Example
test1 <- read.table (text= "State,ObtMj_P,ObtMj_SE,ExpPrevMed_P,ExpPrevMed_SE,ParMon_P,ParMon_SE
Alabama,49.60,1.37,80.00,0.91,12.10,0.68
Alaska,55.00,1.41,81.80,1.08,12.40,0.90
Arizona,52.50,1.56,79.60,1.20,15.80,1.08
Arkansas,50.50,1.22,78.00,0.78,12.80,0.72
California,51.10,0.65,80.50,0.53,13.00,0.41
Colorado,55.10,1.26,81.70,1.03,12.10,0.72
Connecticut,56.30,1.28,85.00,0.93,14.60,0.77
Delaware,53.60,1.30,79.50,1.04,14.70,0.97
District of Columbia,53.50,1.22,76.20,1.03,14.30,1.13
Florida,52.70,0.67,78.90,0.52,14.10,0.45
Georgia,52.50,1.15,79.30,1.02,15.90,0.98
Hawaii,49.40,1.33,83.80,1.12,16.00,1.06
Idaho,48.30,1.23,82.40,0.99,11.90,0.74
Illinois,52.70,0.63,81.00,0.46,13.60,0.40
Indiana,49.60,1.16,80.90,0.91,12.60,0.82
Iowa,46.30,1.37,82.10,1.01,13.60,0.87
Kansas,44.30,1.43,79.20,0.98,12.90,0.79
Kentucky,52.90,1.37,78.70,1.05,14.60,0.98
Louisiana,49.70,1.23,76.80,1.06,14.50,0.76
Maine,55.60,1.44,82.90,0.93,16.70,0.83
Maryland,53.90,1.46,83.60,0.95,14.00,0.80
Massachusetts,55.40,1.41,81.00,1.15,14.70,0.80
Michigan,52.40,0.62,80.50,0.47,15.00,0.43
Minnesota,51.50,1.20,84.40,0.87,14.40,0.86
Mississippi,43.20,1.14,76.60,0.91,12.30,0.78
Missouri,48.70,1.20,80.30,0.90,13.70,0.12
Montana,56.40,1.16,83.70,0.95,12.10,0.68
Nebraska,45.70,1.51,83.40,0.95,12.40,0.90
Nevada,54.20,1.17,80.60,1.07,15.80,1.08
New Hampshire,56.10,1.30,83.30,0.93,12.80,0.72
New Jersey,53.20,1.45,83.70,0.95,13.00,0.41
New Mexico,57.60,1.34,78.90,1.03,12.10,0.72
New York,53.70,0.67,82.60,0.48,14.60,0.77
North Carolina,52.20,1.26,81.90,0.84,14.70,0.97
North Dakota,48.60,1.34,84.20,0.88,14.30,1.13
Ohio,50.90,0.61,82.70,0.49,14.10,0.45
Oklahoma,47.20,1.42,78.80,1.33,15.90,0.98
Oregon,54.00,1.35,80.60,1.14,16.00,1.06
Pennsylvania,53.00,0.63,79.90,0.47,11.90,0.74
Rhode Island,57.20,1.20,79.50,1.02,13.60,0.40
South Carolina,50.50,1.21,79.50,0.95,12.60,0.82
South Dakota,43.40,1.30,81.70,1.05,13.60,0.87
Tennessee,48.90,1.35,78.40,1.35,12.90,0.79
Texas,48.70,0.62,79.00,0.48,14.60,0.98
Utah,42.00,1.49,85.00,0.93,14.50,0.76
Vermont,58.70,1.24,83.70,0.84,16.70,0.83
Virginia,51.80,1.18,82.00,1.04,14.00,0.80
Washington,53.50,1.39,84.10,0.96,14.70,0.80
West Virginia,52.80,1.07,79.80,0.93,15.00,0.43
Wisconsin,49.90,1.50,83.50,1.02,14.40,0.86
Wyoming,49.20,1.29,82.00,0.85,12.30,0.78
", sep=",", row.names='State', header=TRUE, as.is=TRUE)

# change names () to lower case
names (test1) <- tolower (names (test1))

#Write a cut/quantile function to apply on different columns of the data frame
CutQuintiles <- function( x) {
cut (x,quantile (x, (0:5/5)),include.lowest=TRUE)
}

#apply the CutQuintile () on every odd-numbered columns of the test1 data frame
test1$newcols <- sapply(test1 [, seq (1,6,2)], CutQuintiles)

# name 3 new columns based on the odd-numbered columns
names(test1$newcols) <- paste (names(test1 [, seq (1,6,2)]), "_cat")

dim (test1)
options (width=100)
test1
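For the follow-up question (extra columns coded 1 to 5), one option is the `labels` argument of `cut()`, which replaces the interval labels with whatever labels are supplied. The sketch below is an illustration on a small hypothetical data frame `d`, with a hypothetical helper `CutQuintileCodes()`; it is not Arun's solution. It also assigns the new columns one by one, since `test1$newcols <- sapply(...)` appears to store the whole matrix returned by `sapply()` inside a single column.

```r
# Hypothetical cut-down data: ten rows from two of the percentage columns.
d <- data.frame(
  obtmj_p      = c(49.6, 55.0, 52.5, 50.5, 51.1, 55.1, 56.3, 53.6, 53.5, 52.7),
  expprevmed_p = c(80.0, 81.8, 79.6, 78.0, 80.5, 81.7, 85.0, 79.5, 76.2, 78.9)
)

# Same quintile cut as in the post, but labelled 1..5 instead of by interval.
CutQuintileCodes <- function(x) {
  breaks <- quantile(x, 0:5 / 5, na.rm = TRUE)
  cut(x, breaks, include.lowest = TRUE, labels = 1:5)
}

# One new factor column per source column, named <source>_q.
src <- names(d)
d[paste0(src, "_q")] <- lapply(d[src], CutQuintileCodes)

str(d)
levels(d$obtmj_p_q)
```

If plain integer codes are preferred over a factor, `cut(..., labels = FALSE)` returns them directly.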
[R] cut ()
Hello List,

My goal is to create a 5-category variable (p1_st_data$ob_mrj_cat), based on the p1_st_data$obt_mrj_p variable, using the following code for 50 States and the District of Columbia (N=51).

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile (p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

The issue is that, for Utah, I am getting an NA instead of (42,48.7] in the ob_mrj_cat column. Is there a way to tweak the code (i.e., programmatically) to resolve the issue?

I would appreciate receiving your help. Happy New Year and Best Wishes to R Expert-members, who have been so kind and helpful to beginner R users like me.

Thanks and regards,
Pradip Muhuri

## console followed the reproducible example ###
> table(p1_st_data$ob_mrj_cat)

  (42,48.7] (48.7,50.9] (50.9,52.8] (52.8,54.2] (54.2,58.7]
         10          10          10          10          10

> p1_st_data [p1_st_data$state =="Utah",] [, 1:4]
   state obt_mrj_p obt_mrj_se ob_mrj_cat
45  Utah        42       1.49       <NA>   # I expected this to be (42,48.7] instead of NA.

### The Reproducible Example (data and code) is shown below:

#read estimates of risk factors for substances use (ages 12-17) by State obtained from SUDAAN output
p1_st_data <- read.table (text=
"Alabama,49.60,1.37
Alaska,55.00,1.41
Arizona,52.50,1.56
Arkansas,50.50,1.22
California,51.10,0.65
Colorado,55.10,1.26
Connecticut,56.30,1.28
Delaware,53.60,1.30
District of Columbia,53.50,1.22
Florida,52.70,0.67
Georgia,52.50,1.15
Hawaii,49.40,1.33
Idaho,48.30,1.23
Illinois,52.70,0.63
Indiana,49.60,1.16
Iowa,46.30,1.37
Kansas,44.30,1.43
Kentucky,52.90,1.37
Louisiana,49.70,1.23
Maine,55.60,1.44
Maryland,53.90,1.46
Massachusetts,55.40,1.41
Michigan,52.40,0.62
Minnesota,51.50,1.20
Mississippi,43.20,1.14
Missouri,48.70,1.20
Montana,56.40,1.16
Nebraska,45.70,1.51
Nevada,54.20,1.17
New Hampshire,56.10,1.30
New Jersey,53.20,1.45
New Mexico,57.60,1.34
New York,53.70,0.67
North Carolina,52.20,1.26
North Dakota,48.60,1.34
Ohio,50.90,0.61
Oklahoma,47.20,1.42
Oregon,54.00,1.35
Pennsylvania,53.00,0.63
Rhode Island,57.20,1.20
South Carolina,50.50,1.21
South Dakota,43.40,1.30
Tennessee,48.90,1.35
Texas,48.70,0.62
Utah,42.00,1.49
Vermont,58.70,1.24
Virginia,51.80,1.18
Washington,53.50,1.39
West Virginia,52.80,1.07
Wisconsin,49.90,1.50
Wyoming,49.20,1.29",
sep= ",",
col.names = c("state", "Obt_mrj_p", "Obt_mrj_se"),
colClasses = c("character", "numeric", "numeric")
)

#change the names to lower cases
names(p1_st_data) <- tolower (names(p1_st_data))

# create five equal-sized groups for the perceived ease of obtaining marijuana variable
p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile (p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

p1_st_data
dim (p1_st_data)
table(p1_st_data$ob_mrj_cat)
p1_st_data [p1_st_data$state =="Utah",] [, 1:4]

Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
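The NA for the smallest value can be traced to where `include.lowest=TRUE` sits in the call: in the code above it is inside `quantile(...)`, which accepts extra arguments through `...` without complaint, so `cut()` never sees it. `cut()` then builds intervals that are open on the left, and the minimum falls outside the first one. A minimal sketch on a hypothetical ten-value vector `x` (not the 51-row data set):

```r
# Hypothetical values; the minimum (42) plays the role of Utah.
x <- c(42.0, 43.2, 44.3, 46.3, 48.7, 50.9, 52.8, 54.2, 56.4, 58.7)

# include.lowest is passed to quantile(), which ignores it:
# the first interval is (42, ...], so 42 itself is not covered.
bad <- cut(x, quantile(x, 0:5 / 5, include.lowest = TRUE))

# include.lowest is passed to cut(): the first interval becomes [42, ...].
good <- cut(x, quantile(x, 0:5 / 5), include.lowest = TRUE)

data.frame(x, bad, good)
```

Only the parenthesis moves between the two calls; the break points are identical.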
Re: [R] cut ()
Dear David,

Thank you so much for catching the mistake, which was a careless one. Sorry about that.

Happy New Year.

Pradip

From: David L Carlson [dcarl...@tamu.edu]
Sent: Monday, December 31, 2012 6:18 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'R help'
Subject: RE: [R] cut ()

A misplaced right parenthesis caused the problem:

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile (p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))

Should be

p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile (p1_st_data$obt_mrj_p, (0:5/5)), include.lowest=TRUE)

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Monday, December 31, 2012 4:25 PM
To: R help
Subject: [R] cut ()

> Hello List,
>
> My goal is to create a 5-category variable (p1_st_data$ob_mrj_cat), based on the p1_st_data$obt_mrj_p variable, using the following code for 50 States and the District of Columbia (N=51).
>
> p1_st_data$ob_mrj_cat <- cut (p1_st_data$obt_mrj_p, quantile (p1_st_data$obt_mrj_p, (0:5/5), include.lowest=TRUE))
>
> The issue is that, for Utah, I am getting an NA instead of (42,48.7] in the ob_mrj_cat column. Is there a way to tweak the code (i.e., programmatically) to resolve the issue?
>
> I would appreciate receiving your help. Happy New Year and Best Wishes to R Expert-members, who have been so kind and helpful to beginner R users like me.
>
> Thanks and regards,
> Pradip Muhuri
>
> [...]
Re: [R] cut ()
Dear Neal,

Although David's solution (adding the right parenthesis, which I had missed) has resolved the issue, I would like to try yours as well. Could you please clarify the six elements: c(-1e-8, 0, 0, 0, 0, 1e8)?

Thanks and regards,
Pradip

From: Neal H. Walfield [n...@walfield.org]
Sent: Monday, December 31, 2012 5:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] cut ()

At Mon, 31 Dec 2012 22:25:25 +, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> The issue is that, for Utah, I am getting an NA instead of (42,48.7] in the ob_mrj_cat column.

The problem is likely due to comparisons of floating point numbers. Try moving your lower and upper bounds out a tiny bit. When I add c(-1e-8, 0, 0, 0, 0, 1e8) to the result of quantile, I don't get any NAs.

Neal
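On the six elements: `quantile(x, 0:5/5)` returns six break points (the 0%, 20%, ..., 100% quantiles), so a six-element vector added to it shifts each break by the matching amount. A negative first element pushes the lowest break slightly below the minimum, the four zeros leave the inner breaks alone, and a positive last element pushes the highest break above the maximum. The sketch below uses a hypothetical ten-value vector and, as an assumption on my part, a symmetric nudge of 1e-8 at the top; the message above writes `1e8` for the last element, which would also work but moves the upper bound much further.

```r
# Hypothetical values standing in for the 51 state estimates.
x <- c(42.0, 43.2, 44.3, 46.3, 48.7, 50.9, 52.8, 54.2, 56.4, 58.7)

# Six break points define five intervals.
breaks <- quantile(x, 0:5 / 5)

# Assumed symmetric nudge: widen only the outermost breaks by a tiny amount.
nudge <- c(-1e-8, 0, 0, 0, 0, 1e-8)

# With the lowest break just below min(x), the default left-open
# intervals now contain the minimum, so no NA is produced.
cats <- cut(x, breaks + nudge)

table(cats, useNA = "ifany")
```

This works around the symptom; `include.lowest = TRUE` passed to `cut()` addresses the same boundary case without altering the breaks.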
Re: [R] format.pval () and printCoefmat ()
Dear Arun and David, I am so grateful to you for all your help with the code. Thanks and regards, Pradip Arun - All this is very helpful. In general, I can follow the code. I only have the following questions: What changes in the code would be required to have 3 places after decimal for all numeric variables in the res data frame? Thanks, Pradip ### below is the display of the data from Lines1, Lines2, and res head (data.frame(Lines1)) Lines1 1mean_level1 mean_level2 rel_diff p_mean cohens_d 2 1 18.744 11.9110.574 0.000.175 3 2 18.744 14.4550.297 0.000.110 4 3 18.744 13.5400.384 0.000.133 5 4 18.744 6.0022.123 0.000.333 6 5 18.744 5.8342.213 0.000.349 head (data.frame(Lines2)) Lines2 1mean_level1 mean_level2 rel_diff p_mean cohens_d 2 1 18.744 11.9110.574 0.000.175 3 2 18.744 14.4550.297 0.000.110 4 3 18.744 13.5400.384 0.000.133 5 4 18.744 6.0022.123 0.000.333 6 5 18.744 5.8342.213 0.000.349 head (res) contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d 1 wh2+hi18.7 11.910.574 0 0.175 2 wh2+rc18.7 14.460.297 0 0.110 3 whaian18.7 13.540.384 0 0.133 4 whasan18.76.002.123 0 0.333 5 whblck18.75.832.213 0 0.349 6 whcsam18.77.931.363 0 0.279 From: arun [smartpink...@yahoo.com] Sent: Friday, December 14, 2012 10:12 PM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: R help; David Winsemius Subject: Re: [R] format.pval () and printCoefmat () Hi Pradip, May be this helps: dat1-read.table(text= contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diffp_mean cohens_d 1 wh2+hi18.7 11.910.574 1.64e-05 0.1753 2 wh2+rc18.7 14.460.297 9.24e-06 0.1101 3 whaian18.7 13.540.384 9.01e-05 0.1335 4 whasan18.76.002.123 2.20e-119 0.3326 5 whblck18.75.832.213 0.00e+00 0.3490 6 whcsam18.77.931.363 1.27e-47 0.2793 7 whcub18.7 10.850.728 6.12e-08 0.2025 8 whdmcn18.77.131.629 1.59e-15 0.2981 9 whhisp18.79.720.928 3.27e-125 0.2420 10 whmex18.79.600.952 8.81e-103 0.2420 11 whnhpi18.7 16.140.162 1.74e-01 0.0669 12 whothh18.7 NA NANA NA 13 wh pr18.7 10.470.791 3.64e-23 
0.2131 14 whspn18.7 15.150.237 1.58e-02 0.0922 ,sep=,header=TRUE,stringsAsFactors=FALSE) Lines1-capture.output(printCoefmat(dat1[,-c(1:2)],has.Pvalue=TRUE,eps.Pvalue=0.001)) Lines2-gsub(\\s+$,,gsub(\\.$,,Lines1[1:15])) res-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE)) #or # res-cbind(dat1[,1:2],read.table(text=Lines2,header=TRUE)) res # contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean #1 wh2+hi18.7 11.910.574 0. #2 wh2+rc18.7 14.460.297 0. #3 whaian18.7 13.540.384 0.0001 - -- # cohens_d #10.1753 #20.1101 #30.1335 - - str(res) #'data.frame':14 obs. of 7 variables: # $ contrast_level1: chr wh wh wh wh ... # $ contrast_level2: chr 2+hi 2+rc aian asan ... # $ mean_level1: num 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 ... # $ mean_level2: num 11.91 14.46 13.54 6 5.83 ... # $ rel_diff : num 0.574 0.297 0.384 2.123 2.213 ... # $ p_mean : num 0e+00 0e+00 1e-04 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ... # $ cohens_d : num 0.175 0.11 0.134 0.333 0.349 ... A.K. - Original Message - From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov To: 'David Winsemius' dwinsem...@comcast.net Cc: R help r-help@r-project.org Sent
Re: [R] format.pval () and printCoefmat ()
Hi Arun,

Thank you so much for further clarifications and help.

Pradip

Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 15, 2012 11:04 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

If this is just a formatting issue, it is possible to do that with ?formatC() or ?sprintf(), but it may change those variables from numeric to character.

One possibility from `res`:

res<-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))
varsNum<-sapply(res,is.numeric)
res[varsNum]<-lapply(res[varsNum],round,digits=3)
#Here, the numeric columns with digits <3 are not changed, but the ones with >3 were all changed to digits 3.

As I mentioned, sprintf() changes the number of digits:

as.data.frame(do.call(cbind,lapply(res[varsNum],function(x) sprintf("%.3f",x))))
#  mean_level1 mean_level2 rel_diff p_mean cohens_d
#1      18.700      11.910    0.574  0.000    0.175
#2      18.700      14.460    0.297  0.000    0.110
#3      18.700      13.540    0.384  0.000    0.134

A.K.

----- Original Message -----
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org; David Winsemius dwinsem...@comcast.net
Sent: Saturday, December 15, 2012 10:12 AM
Subject: RE: [R] format.pval () and printCoefmat ()

> Dear Arun and David,
>
> I am so grateful to you for all your help with the code.
>
> Thanks and regards,
> Pradip
>
> Arun - All this is very helpful. In general, I can follow the code. I only have the following question: What changes in the code would be required to have 3 places after the decimal for all numeric variables in the res data frame?
>
> Thanks,
> Pradip
>
> [...]
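The trade-off described in this reply (rounding keeps the columns numeric, `sprintf()` fixes the number of printed decimals but yields text) can be seen on a small example. The data frame `res_demo` below is hypothetical and only borrows a few values from the tables in this thread.

```r
# Hypothetical three-row stand-in for `res`.
res_demo <- data.frame(
  contrast_level2 = c("2+hi", "2+rc", "aian"),
  mean_level1     = c(18.744, 18.744, 18.744),
  mean_level2     = c(11.911, 14.455, 13.540),
  stringsAsFactors = FALSE
)

is_num <- vapply(res_demo, is.numeric, logical(1))

# Option 1: round. Columns stay numeric, but trailing zeros are not stored,
# so 13.540 is simply the number 13.54.
rounded <- res_demo
rounded[is_num] <- lapply(res_demo[is_num], round, digits = 3)

# Option 2: sprintf. Always three decimals, but the columns become character.
shown <- res_demo
shown[is_num] <- lapply(res_demo[is_num], function(x) sprintf("%.3f", x))

str(rounded)
str(shown)
```

A common compromise is to keep the numeric data frame for computation and build the character version only at the point of printing or export.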
[R] format.pval () and printCoefmat ()
Hi List,

My goal is to force R not to print in scientific notation in the sixth column (p_mean, the p-value) of my data frame (not a matrix). I have used the format.pval () and printCoefmat () functions on the data frame. The R script is appended below.

The issue is that use of the format.pval () and printCoefmat () functions on the data frame gives me the desired results, but coerces the character strings into NAs for the two character variables (please see contrast_level1 and contrast_level2 in the first output below), because my object is a data frame, not a matrix.

Is there a way I could avoid printing the NAs in the character fields when using format.pval () and printCoefmat () on the data frame?

I would appreciate receiving your help.

Thanks,
Pradip

setwd ("F:/PR1/R_PR1")
load (file = "sigtests_overall_withid.rdata")

#format.pval(tt$p.value, eps=0.0001)

# keep only selected columns from the above data frame
keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1", "mean_level2", "rel_diff", "p_mean", "cohens_d")

#subset the data frame
y0410_1825_mf_alc <- subset (sigtests_overall_withid, years=="0410" & age_group=="1825" & gender_group=="all" & drug=="alc" & contrast_level1=="wh", select=keep_cols1)

#change the row.names
row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]

#force
format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

#print the observations from the sub-data frame
options (width=120,digits=3 )
#y0410_1825_mf_alc
printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

### When format.pval () and printCoefmat () used
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1               NA              NA      18.744      11.911    0.574   0.00    0.175
2               NA              NA      18.744      14.455    0.297   0.00    0.110
3               NA              NA      18.744      13.540    0.384   0.00    0.133
4               NA              NA      18.744       6.002    2.123   0.00    0.333
5               NA              NA      18.744       5.834    2.213   0.00    0.349
6               NA              NA      18.744       7.933    1.363   0.00    0.279
7               NA              NA      18.744      10.849    0.728   0.00    0.203
8               NA              NA      18.744       7.130    1.629   0.00    0.298
9               NA              NA      18.744       9.720    0.928   0.00    0.242
10              NA              NA      18.744       9.600    0.952   0.00    0.242
11              NA              NA      18.744      16.135    0.162   0.17    0.067 .
12              NA              NA      18.744          NA       NA     NA       NA
13              NA              NA      18.744      10.465    0.791   0.00    0.213
14              NA              NA      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion

### When format.pval () and printCoefmat () not used
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922

Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
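The NAs in the first output follow from the warnings shown with it: `printCoefmat()` converts its input with `data.matrix()`, and character columns cannot be represented in a numeric matrix. Note too that `format.pval()` returns a new character vector and does not modify the data frame, so a call whose result is not assigned has no lasting effect. If the aim is only to avoid scientific notation in the p-value column, one option is to format that single column as text and leave the rest of the data frame alone. The sketch below uses a hypothetical four-row data frame `res_p` and a hypothetical helper `fmt_p()`; neither comes from the thread.

```r
# Hypothetical stand-in with two character columns and a p-value column.
res_p <- data.frame(
  contrast_level1 = c("wh", "wh", "wh", "wh"),
  contrast_level2 = c("2+hi", "nhpi", "othh", "spn"),
  p_mean          = c(1.64e-05, 1.74e-01, NA, 1.58e-02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fixed notation with four decimals, "<eps" for tiny
# values, NA kept as NA.
fmt_p <- function(p, eps = 1e-4) {
  small <- paste0("<", format(eps, scientific = FALSE))
  ifelse(is.na(p), NA_character_,
         ifelse(p < eps, small, sprintf("%.4f", p)))
}

# Add a formatted copy; the numeric p_mean column stays available.
res_p$p_mean_fmt <- fmt_p(res_p$p_mean)

print(res_p)
```

Keeping both the numeric and the formatted column means later filtering or sorting on the p-value still works.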
Re: [R] format.pval () and printCoefmat ()
Hi David,

Thank you so much for helping me with the code. Your suggested code gives me the following results. Please see below. I don't understand why I am getting two blocks of prints (5 columns, and then 7 columns), with some columns repeated.

Regards,
Pradip

# cbind( y0410_1825_mf_alc[ 1:2],
+ printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d
1       18.744      11.911    0.574   0.00    0.175
2       18.744      14.455    0.297   0.00    0.110
3       18.744      13.540    0.384   0.00    0.133
4       18.744       6.002    2.123   0.00    0.333
5       18.744       5.834    2.213   0.00    0.349
6       18.744       7.933    1.363   0.00    0.279
7       18.744      10.849    0.728   0.00    0.203
8       18.744       7.130    1.629   0.00    0.298
9       18.744       9.720    0.928   0.00    0.242
10      18.744       9.600    0.952   0.00    0.242
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA
13      18.744      10.465    0.791   0.00    0.213
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922

Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()

On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi List,
>
> My goal is to force R not to print in scientific notation in the sixth column (rel_diff - for the p-value) of my data frame (not a matrix). I have used the format.pval () and printCoefmat () functions on the data frame. The R script is appended below.
>
> This issue is that use of the format.pval () and printCoefmat () functions on the data frame gives me the desired results, but coerces the character string into NAs for the two character variables, because my object is a data frame, not a matrix. Please see the first output below: contrast_level1 contrast_level2). Is there a way I could have avoid printing the NAs in the character fields

They are probably factor columns.

> when using the format.pval () and printCoefmat () on the data frame?
>
> I would appreciate receiving your help.
>
> Thanks,
> Pradip
>
> setwd ("F:/PR1/R_PR1")
> load (file = "sigtests_overall_withid.rdata")
>
> #format.pval(tt$p.value, eps=0.0001)
>
> # keep only selected columns from the above data frame
> keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1", "mean_level2", "rel_diff", "p_mean", "cohens_d")
>
> #subset the data frame
> y0410_1825_mf_alc <- subset (sigtests_overall_withid, years=="0410" & age_group=="1825" & gender_group=="all" & drug=="alc" & contrast_level1=="wh", select=keep_cols1)
>
> #change the row.names
> row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]
>
> #force
> format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably
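A possible explanation for the two printed blocks in this message: as far as I can tell from its documentation, `printCoefmat()` prints its formatted table as a side effect and returns its argument invisibly. Inside `cbind()`, the first block is therefore the side-effect print, and the second block is the ordinary print of the `cbind()` result, which contains the original unformatted columns. The sketch below shows the two steps separately on a hypothetical numeric data frame `num`; `capture.output()` is used only to make the side effect visible as a value.

```r
# Hypothetical coefficient-style table; with has.Pvalue = TRUE the last
# column is the one treated as the p-value.
num <- data.frame(
  est  = c(1.5, 2.5, 0.3),
  se   = c(0.5, 0.5, 0.4),
  stat = c(3.0, 5.0, 0.75),
  p    = c(0.003, 1e-6, 0.45)
)

# The formatted table is what gets printed; the return value is kept in `ret`.
printed <- capture.output(
  ret <- printCoefmat(num, has.Pvalue = TRUE, eps.Pvalue = 1e-4)
)

# `printed` holds the formatted lines; `ret` has the same shape as the input.
cat(printed, sep = "\n")
dim(ret)
```

One consequence worth checking in the thread's own call: since `has.Pvalue = TRUE` applies to the last column, the p-value formatting and significance stars there would be applied to `cohens_d` rather than `p_mean` unless the columns are reordered.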
[R] read.table()
Hi List,

I have spent more than 30 minutes, but failed to read in this file using the read.table() function. I could not figure out how to fix the following error.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 6 elements

Any help would be appreciated.

Thanks,
Pradip Muhuri

### below is the reproducible example

xd1 <- "raceage percent sepercent flag_var
Mexican 12-17 5.7926 0.64195 any
Puerto Rican 12-17 5.1975 0.24929 any
Cuban 12-17 3.7977 1.00487 any
C-S American 12-17 4.3665 0.55329 any
Dominican 12-17 1.8149 0.46677 any
Spanish (Spain) 12-17 6.1971 0.98386 any
Multi Hisp Eth 12-17 6.7006 1.12464 any
NH White 12-17 4.8442 0.08660 any
NH Black 12-17 3.6943 0.16045 any
NH AM-AK 12-17 9.6325 1.06100 any
NH HI-OPI 12-17 3.9189 1.08047 any
NH Asian 12-17 1.9115 0.28432 any
NH Multiracial 12-17 6.4255 0.51434 any
Mexican 18-25 8.9284 0.73022 any
Puerto Rican 18-25 6.1364 0.28394 any
Cuban 18-25 8.6782 1.45543 any
C-S American 18-25 5.9360 0.59899 any
Dominican 18-25 7.7642 1.64553 any
Spanish (Spain) 18-25 9.2632 1.15652 any
Multi Hisp Eth 18-25 11.3566 1.79282 any
NH White 18-25 8.6484 0.11866 any
NH Black 18-25 7.5972 0.24926 any
NH AM-AK 18-25 13.5041 1.57275 any
NH HI-OPI 18-25 8.0227 1.41348 any
NH Asian 18-25 3.2701 0.32414 any
NH Multiracial 18-25 10.6489 0.85105 any
Mexican 26+ 3.2110 0.51683 any
Puerto Rican 26+ 1.6273 0.15033 any
Cuban 26+ 1.4419 0.44118 any
C-S American 26+ 1.0187 0.26594 any
Dominican 26+ 0.9554 0.50275 any
Spanish (Spain) 26+ 2.5976 0.86230 any
Multi Hisp Eth 26+ 1.1345 0.66375 any
NH White 26+ 1.5510 0.04156 any
NH Black 26+ 2.8763 0.15133 any
NH AM-AK 26+ 3.9674 0.76611 any
NH HI-OPI 26+ 1.2919 0.66205 any
NH Asian 26+ 0.7207 0.13870 any
NH Multiracial 26+ 3.0668 0.52334 any
Mexican 12-17 4.3152 0.53235 mrj
Puerto Rican 12-17 3.7237 0.20969 mrj
Cuban 12-17 2.0616 0.67248 mrj
C-S American 12-17 3.3282 0.47392 mrj
Dominican 12-17 1.3797 0.40435 mrj
Spanish (Spain) 12-17 5.1810 0.93979 mrj
Multi Hisp Eth 12-17 4.8915 0.94816 mrj
NH White 12-17 3.6190 0.07379 mrj
NH Black 12-17 2.8196 0.14042 mrj
NH AM-AK 12-17 6.5091 0.85124 mrj
NH HI-OPI 12-17 3.6267 1.06724 mrj
NH Asian 12-17 1.3162 0.23575 mrj
NH Multiracial 12-17 5.0657 0.49614 mrj
Mexican 18-25 7.3802 0.67992 mrj
Puerto Rican 18-25 4.3260 0.24191 mrj
Cuban 18-25 6.1433 1.19242 mrj
C-S American 18-25 3.9166 0.51272 mrj
Dominican 18-25 5.8000 1.24097 mrj
Spanish (Spain) 18-25 6.8646 1.01387 mrj
Multi Hisp Eth 18-25 10.1134 1.75013 mrj
NH White 18-25 5.8656 0.10100 mrj
NH Black 18-25 6.6869 0.23643 mrj
NH AM-AK 18-25 11.2989 1.51687 mrj
NH HI-OPI 18-25 5.6302 1.14561 mrj
NH Asian 18-25 2.3418 0.28309 mrj
NH Multiracial 18-25 8.2696 0.77139 mrj
Mexican 26+ 1.1658 0.33967 mrj
Puerto Rican 26+ 0.6757 0.09329 mrj
Cuban 26+ 0.6653 0.31239 mrj
C-S American 26+ 0.3177 0.17604 mrj
Dominican 26+ 0.5616 0.39780 mrj
Spanish (Spain) 26+ 1.8078 0.82590 mrj
Multi Hisp Eth 26+ 0.8468 0.63529 mrj
NH White 26+ 0.6915 0.02791 mrj
NH Black 26+ 1.5675 0.12031 mrj
NH AM-AK 26+ 1.7273 0.37673 mrj
NH HI-OPI 26+ 0.0356 0.03535 mrj
NH Asian 26+ 0.2687 0.07564 mrj
NH Multiracial 26+ 1.3419 0.30074 mrj
Mexican 12-17 1.2074 0.36082 anl
Puerto Rican 12-17 1.0772 0.11547 anl
Cuban 12-17 1.2569 0.67109 anl
C-S American 12-17 0.6213 0.22726 anl
Dominican 12-17 0.1412 0.08552 anl
Spanish (Spain) 12-17 0.9625 0.25453 anl
Multi Hisp Eth 12-17 1.2863 0.43909 anl
NH White 12-17 1.1490 0.04289 anl
NH Black 12-17 0.5932 0.06220 anl
NH AM-AK 12-17 1.9117 0.50122 anl
NH HI-OPI 12-17 0.3833 0.20240 anl
NH Asian 12-17 0.4782
Re: [R] read.table()
Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results. The only issue is that the column names do not seem to be correct: the whole header is read as a single column name.

I did not understand the part of your comment that says "fortunes::fortune(14) applies", although I have read about the double colon operator (ns-dblcolon {base}). Could you please provide a little more of a hint so that I can resolve the issue?

Thanks and regards,
Pradip

# new result
agerace <- read.delim(textConnection(xd1), sep="\t", header=TRUE, as.is=TRUE)
names(agerace)
[1] "raceage...percent..sepercent..flag_var"
head(agerace)
  raceage...percent..sepercent..flag_var
1 Mexican 12-17 5.7926 0.64195 any
2 Puerto Rican 12-17 5.1975 0.24929 any
3 Cuban 12-17 3.7977 1.00487 any
4 C-S American 12-17 4.3665 0.55329 any
5 Dominican 12-17 1.8149 0.46677 any
6 Spanish (Spain) 12-17 6.1971 0.98386 any

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi List,
>
> I have spent more than 30 minutes, but failed to read in this file using the read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see See Also). Have you read it? fortunes::fortune(14) applies.

The issue is what the separator is. You have specified whitespace, and that is not correct. The original might have had tabs (see ?read.delim) but as pasted into this email only a human can disentangle this file.

> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 6 elements
>
> Any help would be appreciated.
>
> Thanks,
> Pradip Muhuri
Re: [R] read.table()
Dear Arun,

The issue is that the column names are incorrect: everything is read into one column. I will also look into the comment by Prof Ripley. Thanks for your continued support and help.

Pradip

str(read.delim(textConnection(xd1),header=TRUE,sep="\t"))
'data.frame': 195 obs. of 1 variable:
 $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "Cuban 26+ 0.6653 0.31239 mrj",..: 27 148 13 140 108 193 169 100 85 67 ...
names(agerace)
[1] "raceage...percent..sepercent..flag_var"
head(agerace)
  raceage...percent..sepercent..flag_var
1 Mexican 12-17 5.7926 0.64195 any
2 Puerto Rican 12-17 5.1975 0.24929 any
3 Cuban 12-17 3.7977 1.00487 any
4 C-S American 12-17 4.3665 0.55329 any
5 Dominican 12-17 1.8149 0.46677 any
6 Spanish (Spain) 12-17 6.1971 0.98386 any

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 5:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David L Carlson; R help
Subject: Re: [R] read.table()

Hi,

You can check the str(). I assume it will be like this:

str(read.delim(textConnection(Lines),header=TRUE,sep="\t"))
#'data.frame': 195 obs. of 1 variable:
# $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "C-S American 12-17 0.2399 0.15804 coc",..: 50 170 20 5 35 185 65 155 110 80 ...

A.K.
Re: [R] read.table()
Dear David and Arun,

Thank you very much for your time and efforts and for resolving the issue. From this exchange, I have learned something new about reading data files into R.

Regards,
Pradip

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 8:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: dcarl...@tamu.edu; R help
Subject: Re: [R] read.table()

Hi,

David's method is much better than mine. Regarding the spaces in the race field, this should preserve them if you wish to try my method.

source("Muhuri.txt")
Lines1<-readLines(textConnection(Lines))
Col1new<-gsub(" +$","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1]))) #changed
Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)
heading<-unlist(strsplit(Lines1[1]," "))
colnames(dat1)<-heading[heading!=""]
head(dat1)
#             race   age percent sepercent flag_var
#1         Mexican 12-17  5.7926   0.64195      any
#2    Puerto Rican 12-17  5.1975   0.24929      any
#3           Cuban 12-17  3.7977   1.00487      any
#4    C-S American 12-17  4.3665   0.55329      any
#5       Dominican 12-17  1.8149   0.46677      any
#6 Spanish (Spain) 12-17  6.1971   0.98386      any
str(dat1)
#'data.frame': 195 obs. of 5 variables:
# $ race     : chr "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
# $ age      : chr "12-17" "12-17" "12-17" "12-17" ...
# $ percent  : num 5.79 5.2 3.8 4.37 1.81 ...
# $ sepercent: num 0.642 0.249 1.005 0.553 0.467 ...
# $ flag_var : chr "any" "any" "any" "any" ...

A.K.

----- Original Message -----
From: David L Carlson dcarl...@tamu.edu
To: 'arun' smartpink...@yahoo.com; 'Muhuri, Pradip (SAMHSA/CBHSQ)' pradip.muh...@samhsa.hhs.gov
Cc: 'R help' r-help@r-project.org
Sent: Saturday, December 8, 2012 8:06 PM
Subject: RE: [R] read.table()

Arun's solution works but you lose your spaces in the race field. These commands will preserve them. We need to make sure that your file has two or more spaces between each field. The first gsub() command strips leading space. The second inserts a space before the digit 1 (that is where all the fields separated by a single space are). Then we convert two or more spaces to a comma. Finally you can use read.table(). Starting with your vector xd1 from your first posting:

raw2 <- readLines(con=textConnection(xd1))
raw2 <- gsub("^ +", "", raw2)
raw2 <- gsub(" 1", "  1", raw2)
raw3 <- gsub("  +", ",", raw2)
agerace <- read.table(text=raw3, header=TRUE, sep=",", as.is=TRUE)
str(agerace)
'data.frame': 195 obs. of 5 variables:
 $ race     : chr "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
 $ age      : chr "12-17" "12-17" "12-17" "12-17" ...
 $ percent  : num 5.79 5.2 3.8 4.37 1.81 ...
 $ sepercent: num 0.642 0.249 1.005 0.553 0.467 ...
 $ flag_var : chr "any" "any" "any" "any" ...

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 5:11 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David L Carlson
Subject: Re: [R] read.table()

HI Pradip,

Try this:

source("Muhuri.txt")
#Muhuri.txt
Lines <- "raceage percent sepercent flag_var
Mexican 12-17 5.7926 0.64195 any
-- ---
Lines1<-readLines(textConnection(Lines))
Col1new<-gsub(" ","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)
heading<-unlist(strsplit(Lines1[1]," "))
colnames(dat1)<-heading[heading!=""]
head(dat1,6)
#            race   age percent sepercent flag_var
#1        Mexican 12-17  5.7926   0.64195      any
#2    PuertoRican 12-17  5.1975   0.24929      any
#3          Cuban 12-17  3.7977   1.00487      any
#4    C-SAmerican 12-17  4.3665   0.55329      any
#5      Dominican 12-17  1.8149   0.46677      any
#6 Spanish(Spain) 12-17  6.1971   0.98386      any
str(dat1)
'data.frame': 195 obs. of 5 variables:
 $ race     : chr "Mexican" "PuertoRican" "Cuban" "C-SAmerican" ...
 $ age      : chr "12-17" "12-17" "12-17" "12-17" ...
 $ percent  : num 5.79 5.2 3.8 4.37 1.81 ...
 $ sepercent: num 0.642 0.249 1.005 0.553 0.467
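The idea behind David's approach (turn runs of two or more spaces into a single delimiter, then read the result as delimited text) can be sketched on a tiny sample. This is an illustration under an assumption: the sample text `xd` is hypothetical and already has at least two spaces between fields, which the poster's real data did not have everywhere (hence David's extra gsub() step).

```r
# Hypothetical sample: fields separated by two or more spaces; the race
# label itself may contain single spaces.
xd <- "race             age    percent  sepercent  flag_var
Mexican          12-17  5.7926   0.64195    any
Puerto Rican     12-17  5.1975   0.24929    any
Spanish (Spain)  26+    2.5976   0.86230    any"

raw <- strsplit(xd, "\n")[[1]]     # one element per record
raw <- gsub("^ +| +$", "", raw)    # strip leading and trailing spaces
csv <- gsub(" {2,}", ",", raw)     # two or more spaces become one comma

agerace <- read.table(text = csv, header = TRUE, sep = ",", as.is = TRUE)
str(agerace)
```

Single spaces inside a label such as "Puerto Rican" survive because only runs of two or more spaces are treated as field boundaries.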
[R] subsetting - questions
Hello,

I have two very basic questions (console attached):

1) Why am I getting an error message for #5 and #7?
2) How do I fix the code?

I would appreciate receiving your help.

Thanks,
Pradip Muhuri

## Reproducible Example #
N <- 100
set.seed(13)
df<-data.frame(matrix(sample(c(1:10),N, replace=TRUE),ncol=5))
keep_var <- c("X1", "X2")
drop_var <- c("X3", "X4", "X5")
df[df$X1>=8,] [,1:2] #1
df[df$X1>=8,] [,-c(3,4,5)] #2
df[df$X1>=8,] [,c(-3,-4,-5)] #3
df[df$X1>=8,] [,c("X1", "X2")] #4
df[df$X1>=8,] [,-c("X3", "X4", "X5")] #5 DOES NOT WORK
df[df$X1>=8,] [,keep_var] #6
df[df$X1>=8,] [, !drop_var] #7 DOES NOT WORK

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] subsetting - questions
Hi Arun,

Thank you so much for your help.

Pradip

From: arun [smartpink...@yahoo.com]
Sent: Friday, November 23, 2012 10:15 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] subsetting - questions

HI,

This should work:

df[df$X1>=8,][-which(names(df)%in% c("X3","X4","X5"))]
#   X1 X2
#1   8  2
#5  10  1
#8   8  5
#9   9  4
#12  9  5
#13  9 10
#19  9  8

df[df$X1>=8,][,!names(df)%in%drop_var]
#   X1 X2
#1   8  2
#5  10  1
#8   8  5
#9   9  4
#12  9  5
#13  9 10
#19  9  8

A.K.

----- Original Message -----
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: r-help@r-project.org
Sent: Friday, November 23, 2012 9:55 PM
Subject: [R] subsetting - questions

> 1) Why am I getting an error message for #5 and #7?
> 2) How do I fix the code?
>
> df[df$X1>=8,] [,-c("X3", "X4", "X5")] #5 DOES NOT WORK
> df[df$X1>=8,] [, !drop_var] #7 DOES NOT WORK
Re: [R] subsetting - questions
Hi Jorge,

I could use subset(), but I wanted to minimize coding.

Thanks,
Pradip

From: Jorge I Velez [jorgeivanve...@gmail.com]
Sent: Friday, November 23, 2012 10:02 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] subsetting - questions

Hi Pradip,

It is easier to use subset(). Check ?subset for some examples and pay special attention to the "select" parameter.

By the way, do not call your data "df", as that is already the name of a function.

Best,
Jorge.-

On Sat, Nov 24, 2012 at 1:55 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> 1) Why am I getting an error message for #5 and #7?
> 2) How do I fix the code?
>
> df[df$X1>=8,] [,-c("X3", "X4", "X5")] #5 DOES NOT WORK
> df[df$X1>=8,] [, !drop_var] #7 DOES NOT WORK
Re: [R] subsetting - questions
Hello Peter,

1. -c("X3", "X4", "X5")
For the above variables, the class is integer. Arun has suggested the following:
df[df$X1>=8,][-which(names(df)%in% c("X3","X4","X5"))]

2. df[df$X1>=8,] [, !names(df) %in% drop_var]
I agree. Arun has also suggested the same.

Thanks and regards,
Pradip

From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Friday, November 23, 2012 10:47 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] subsetting - questions

On 2012-11-23 18:55, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> 1) Why am I getting an error message for #5 and #7?
> 2) How do I fix the code?
>
> df[df$X1>=8,] [,-c("X3", "X4", "X5")] #5 DOES NOT WORK
> df[df$X1>=8,] [, !drop_var] #7 DOES NOT WORK

To see what's wrong, just print the problematic part:

-c("X3", "X4", "X5")

You can't negate a character vector; you have to have a numeric vector.

And !drop_var doesn't work because you need something that evaluates to a logical value if you want to "!" it. This will do it:

df[df$X1>=8,] [, !names(df) %in% drop_var]

Or use the subset() function, as Jorge suggests.

Peter Ehlers
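The fixes suggested in this thread can be compared side by side. The sketch below rebuilds the example data under the name `dat` (an assumed name, chosen to follow Jorge's advice not to call it df) and drops the same columns three ways: a logical index built with %in%, a name vector built with setdiff() (not mentioned in the thread, added here as one more base-R option), and subset() with a negated select.

```r
set.seed(13)
dat <- data.frame(matrix(sample(1:10, 100, replace = TRUE), ncol = 5))
drop_var <- c("X3", "X4", "X5")
rows <- dat$X1 >= 8

# Unary minus and "!" are not defined for character vectors, so both
# failing expressions from the question raise an error.
neg_fails <- inherits(try(-drop_var, silent = TRUE), "try-error")
not_fails <- inherits(try(!drop_var, silent = TRUE), "try-error")

# 1. Logical column index: TRUE for every column that is NOT in drop_var.
by_logical <- dat[rows, !names(dat) %in% drop_var]

# 2. Character column index: the names that remain after removing drop_var.
by_names <- dat[rows, setdiff(names(dat), drop_var)]

# 3. subset(): select accepts unquoted column names and allows negation.
by_subset <- subset(dat, X1 >= 8, select = -c(X3, X4, X5))

print(by_logical)
```

All three return the same rows and the columns X1 and X2; which one to prefer is mostly a matter of taste, although subset() is usually recommended for interactive use rather than inside functions.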
[R] Data Extraction
Hello,

I would appreciate it if someone could help me resolve the following:

1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
2. Is this message harmful?
   The following object(s) are masked from 'df1 (position 3)': X1, X2, X3, X4, X5

Thanks,
Pradip Muhuri

#Reproducible Example
set.seed(5)
df1<-data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
attach (df1)

#delete rows if X1 is NA
df1[!is.na( X1),][,1:5] # This works

#delete rows if any of X1, X2, X3, X4 or X5 is NA
df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
Re: [R] Data Extraction
Petr,

You have shown the simplest solution.

Thanks and regards,
Pradip Muhuri
Beginner useR

From: PIKAL Petr [petr.pi...@precheza.cz]
Sent: Thursday, November 22, 2012 9:33 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: RE: Data Extraction

Hi

do you want this?

df1[complete.cases(df1),]
   X1 X2 X3 X4 X5
2   8  8  3  2 10
6   8  6  7 10  1
11  4  5  5 10  8
12  6  1  7  8  4
17  5  7  3  1  3
18 10  7  3  8  7
19  7  5  3  5  6
20 10  5  2  4  6

Regards
Petr

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Thursday, November 22, 2012 3:11 PM
> To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
> Subject: [R] Data Extraction
>
> 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
> 2. Is this message harmful?
>    The following object(s) are masked from 'df1 (position 3)': X1, X2, X3, X4, X5
Re: [R] Data Extraction
Hi Bert,

Your solution is similar to Petr's.

Thanks and regards,
Pradip Muhuri
BeginneR UseR

From: Bert Gunter [gunter.ber...@gene.com]
Sent: Thursday, November 22, 2012 10:20 AM
To: Berend Hasselman
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Data Extraction

Unnecessarily complicated.

?na.omit (linked from ?complete.cases)

df <- na.omit(df)

-- Bert

On Thu, Nov 22, 2012 at 6:49 AM, Berend Hasselman wrote:

> Yet another way of doing this is
>
> df1[!is.na(rowSums(df1)),][1:5]
>
> But Petr's solution appears to be quickest.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [R] Data Extraction
Hi Berend,

You have compared all 3 ways, and very nicely evaluated them.

Thanks and regards,
Pradip Muhuri
Beginner UseR

From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 9:49 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 15:11, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> #delete rows if any of X1, X2, X3, X4 or X5 is NA
> df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work

Yet another way of doing this is

df1[!is.na(rowSums(df1)),][1:5]

But Petr's solution appears to be quickest. See this:

N <- 10
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
library(rbenchmark)
f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
f2 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
f3 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), columns=c("test","elapsed", "relative", "replications"))

           test elapsed relative replications
1 d1 <- f1(df)   3.675   13.172          100
2 d2 <- f2(df)   0.401    1.437          100
3 d3 <- f3(df)   0.279    1.000          100

identical(d1,d2)
[1] TRUE
identical(d1,d3)
[1] TRUE

Berend
Re: [R] Data Extraction
Hi Sarah,

I am glad you have caught precisely where I made the mistake. Thank you so much.

Regards,
Pradip Muhuri

From: Sarah Goslee [sarah.gos...@gmail.com]
Sent: Thursday, November 22, 2012 9:21 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

Hi,

is.na( X1 | X2 | X3 | X4 | X5) isn't a valid construct. You'd need

!(is.na(X1) | is.na(X2) etc )

Or more elegantly

df1[apply(df1, 1, function(x)all(!is.na(x))), ]

Sarah

On Thursday, November 22, 2012, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
> 2. Is this message harmful?
>    The following object(s) are masked from 'df1 (position 3)': X1, X2, X3, X4, X5

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
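Sarah's point can be checked directly. In R, `NA | TRUE` evaluates to TRUE, and any non-zero number counts as TRUE in a logical OR, so OR-ing the raw columns hides an NA whenever another column in the same row holds a value. The sketch below rebuilds the example data and compares the explicit per-column test with complete.cases(); the object names `keep_explicit`, `keep_cc` and `clean` are illustrative only.

```r
set.seed(5)
df1 <- data.frame(matrix(sample(c(1:10, NA), 100, replace = TRUE), ncol = 5))

# The OR of a missing value and TRUE is TRUE, not NA.
na_or_true <- NA | TRUE

# Test every column for NA first, then combine the logical vectors.
keep_explicit <- !(is.na(df1$X1) | is.na(df1$X2) | is.na(df1$X3) |
                   is.na(df1$X4) | is.na(df1$X5))

# complete.cases() performs the same row-wise test in a single call.
keep_cc <- complete.cases(df1)

clean <- df1[keep_cc, ]
print(clean)
```

Note that the columns are referenced through `df1$` rather than attach(); this avoids the "objects are masked" message that the original question also asked about, since that message appears when attach() puts names on the search path that already exist there.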
Re: [R] Data Extraction - benchmark()
Hi Berend,

I see you are one of the contributors to the rbenchmark package. I am sorry that I am bothering you again. I have tried to run your code (slightly tweaked) involving the benchmark function, and I am getting the following error messages. What am I doing wrong?

Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), : could not find function "s1"

identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)
Error: unexpected ',' in "identical (d1,d2),"

sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rbenchmark_1.0.0

loaded via a namespace (and not attached):
[1] tools_2.15.1

I would appreciate receiving your help if your time permits.

Thanks and regards,
Pradip Muhuri

# Berend's code extended
N <- 10
set.seed(13)
df<-data.frame(matrix(sample(c(1:10,NA),N, replace=TRUE),ncol=50))
s1 <- df[complete.cases(df),]
s2 <- na.omit(df)
s3 <- df[apply(df, 1, function(x)all(!is.na(x))), ]
s4 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
s5 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
s6 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
require(rbenchmark)
benchmark( d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), d5 <- s5(df), d6 <- s6(df), columns=c("test","elapsed", "relative", "replications") )
identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)

From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 11:03 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi Berend,
> You have compared all 3 ways, and very nicely evaluated them.

Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:

N <- 10
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
library(rbenchmark)
f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
f2 <- function(df) {df[!is.na(rowSums(df)),]}
f3 <- function(df) {df[complete.cases(df),]}
f4 <- function(df) {data.frame(na.omit(df))}
benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))

           test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100

identical(d1,d2)
[1] TRUE
identical(d1,d3)
[1] TRUE
identical(d1,d4)
[1] TRUE

Berend
Re: [R] Data Extraction - benchmark()
Hi Berend,

Thank you very much for pointing out the mistake and for your patience. I have corrected the script, which has worked fine.

regards,
Pradip Muhuri

From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 12:42 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction - benchmark()

On 22-11-2012, at 18:20, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hi Berend, I see you are one of the contributors to the rbenchmark package. I am sorry that I am bothering you again. I have tried to run your code (slightly tweaked) involving the benchmark function, and I am getting the following error message. What am I doing wrong?

Error in benchmark(d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), : could not find function "s1"

Because you haven't defined a function s1 (or s2, s3, s4 for that matter). You did

s1 <- df[complete.cases(df),]

Berend

identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)
Error: unexpected ',' in "identical (d1,d2),"

sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rbenchmark_1.0.0
loaded via a namespace (and not attached):
[1] tools_2.15.1

I would appreciate receiving your help if your time permits.

Thanks and regards,
Pradip Muhuri

# Berend's code extended
N <- 10
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N, replace=TRUE),ncol=50))
s1 <- df[complete.cases(df),]
s2 <- na.omit(df)
s3 <- df[apply(df, 1, function(x)all(!is.na(x))), ]
s4 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
s5 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
s6 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
require(rbenchmark)
benchmark( d1 <- s1(df), d2 <- s2(df), d3 <- s3(df), d4 <- s4(df), d5 <- s5(df), d6 <- s6(df), columns=c("test","elapsed", "relative", "replications") )
identical (d1,d2), identical (d1,d3), identical (d1,d4), identical (d1,d5), identical (d1,d6)

From: Berend Hasselman [b...@xs4all.nl]
Sent: Thursday, November 22, 2012 11:03 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 16:50, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hi Berend, You have compared all 3 ways. ... very nicely evaluated.

Bert's solution is indeed nice and simple. But Petr's solution is still the quickest:

N <- 10
set.seed(13)
df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
library(rbenchmark)
f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),]}
f2 <- function(df) {df[!is.na(rowSums(df)),]}
f3 <- function(df) {df[complete.cases(df),]}
f4 <- function(df) {data.frame(na.omit(df))}
benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), d4 <- f4(df), columns=c("test","elapsed", "relative", "replications"))

          test elapsed relative replications
1 d1 <- f1(df)   3.588   14.888          100
2 d2 <- f2(df)   0.403    1.672          100
3 d3 <- f3(df)   0.241    1.000          100
4 d4 <- f4(df)   0.557    2.311          100

identical(d1,d2)
[1] TRUE
identical(d1,d3)
[1] TRUE
identical(d1,d4)
[1] TRUE

Berend
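Berend's point in this thread is that benchmark() evaluates calls such as s1(df), so each s* must be a function rather than an already-subset data frame. The following is only a minimal sketch of that idea: the data frame `toy` and the helper names `keep_cc`, `keep_rs` and `keep_ap` are made up for illustration, and system.time() from base R stands in for rbenchmark so that no extra package is needed.

```r
# Illustrative data (hypothetical): 200 rows x 5 columns with some NAs
set.seed(13)
toy <- data.frame(matrix(sample(c(1:10, NA), 1000, replace = TRUE), ncol = 5))

# Each candidate is a *function* of the data frame, so keep_xx(toy) is a valid call
keep_cc <- function(d) d[complete.cases(d), ]
keep_rs <- function(d) d[!is.na(rowSums(d)), ]
keep_ap <- function(d) d[apply(d, 1, function(x) all(!is.na(x))), ]

d1 <- keep_cc(toy)
d2 <- keep_rs(toy)
d3 <- keep_ap(toy)

# One call per line (or separated by ';'); a ',' between top-level calls
# is a syntax error
identical(d1, d2)
identical(d1, d3)

# Rough timing of one approach over 100 repetitions
system.time(for (i in 1:100) keep_cc(toy))
```

Writing `s1 <- df[complete.cases(df), ]` instead stores the result, so a later `s1(df)` fails with "could not find function".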
Re: [R] github
Thanks, Michael. Pradip Pradip K. Muhuri, PhD Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov -Original Message- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: Tuesday, November 20, 2012 8:41 AM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: r-help@r-project.org Subject: Re: [R] github On Tue, Nov 20, 2012 at 2:07 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Hello, I would like to learn how to set up Github/repository and upload/update files and am looking for Github for Dummies. Any help will be appreciated. I believe Hadley has done some github integration work: https://github.com/hadley/devtools Michael
Re: [R] kinitr
Dear Michael, I really appreciated that you have sent me the link info - Jeromy Anglim's Blog. I was exactly looking for this kind of resources about R Markdown and knitr. All this would be of immense help. Thank you so much. Pradip Pradip K. Muhuri, PhD Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov -Original Message- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: Tuesday, November 20, 2012 8:36 AM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: r-help@r-project.org Subject: Re: [R] kinitr On Tue, Nov 20, 2012 at 1:57 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Hello, I am an Intro-level R and ggplot2 user and looking for resources to self teach dynamic report generation in R using knitr. Any advice would be highly appreciated. http://jeromyanglim.blogspot.co.uk/2012/05/getting-started-with-r-markdown-knitr.html Michael Weylandt
Re: [R] kinitr
Mark-

Thank you for your help.

Pradip

Pradip K. Muhuri, PhD Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov

From: Mark Lamias [mailto:mlam...@yahoo.com]
Sent: Tuesday, November 20, 2012 4:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); 'R. Michael Weylandt'
Cc: r-help@r-project.org
Subject: Re: [R] kinitr

This is how I learned everything about knitr: http://yihui.name/knitr/ Yihui is great and his site gives you pretty much all the information you need to get you started.

From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: 'R. Michael Weylandt' michael.weyla...@gmail.com
Cc: r-help@r-project.org
Sent: Tuesday, November 20, 2012 9:10 AM
Subject: Re: [R] kinitr

Dear Michael, I really appreciated that you have sent me the link info - Jeromy Anglim's Blog. I was exactly looking for this kind of resources about R Markdown and knitr. All this would be of immense help. Thank you so much.

Pradip

Pradip K. Muhuri, PhD Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov

-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
Sent: Tuesday, November 20, 2012 8:36 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] kinitr

On Tue, Nov 20, 2012 at 1:57 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, I am an Intro-level R and ggplot2 user and looking for resources to self teach dynamic report generation in R using knitr. Any advice would be highly appreciated.

http://jeromyanglim.blogspot.co.uk/2012/05/getting-started-with-r-markdown-knitr.html

Michael Weylandt
[R] kinitr
Hello, I am an Intro-level R and ggplot2 user and looking for resources to self teach dynamic report generation in R using knitr. Any advice would be highly appreciated. Thanks, Pradip
[R] github
Hello, I would like to learn how to set up Github/repository and upload/update files and am looking for Github for Dummies. Any help will be appreciated. Thanks, Pradip
Re: [R] Saving R Graph to a file
Hello,

#Example 1: The following code to save svyboxplots works for me
pdf("boxplots_dthage.pdf", width = 1020)
# 4 boxplots in 2 columns and 2 rows
par(mfrow=c(2,2), oma=c(0,0,0,0))
# svyboxplot commands not shown
dev.off()

#Example 2: The following code to save a ggplot graph works for me:
# ggplot () not shown
print (p)
ggsave(file='Xfacet_abodill_age3.pdf', width=12, height=8)

Thanks,
Pradip Muhuri

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Robert Baer [rb...@atsu.edu]
Sent: Sunday, November 04, 2012 5:32 AM
To: frespider
Cc: r-help@r-project.org
Subject: Re: [R] Saving R Graph to a file

Some hints:
For pdf(), height and width are in inches, not pixels.
dev.off() is necessary after drawing the image for pdf().
The name for the file argument (file="c:/figure.xxx") is file not filename
hist(CO2[,5]) is more interesting
And yes, ?pdf ?postscript ?png

On 11/3/2012 11:16 PM, frespider wrote:

Hi I am not sure why I can't get my plot saved to a file as .ps, I searched online and I found that I have to use something is called postscript,png or pdf function which I did but still not working. Actually what I have is a matrix with almost 300-400 columns. I need to create a histogram and boxplot for some columns as .ps file (with reasonable size if i can adjust that would be nice also) so I can import them in my latex code to display a good chart on my report. And I found out R display a certain limit of device. Can you please help me code this? This an example I create

data(CO2)
png(filename="C:/R/figure.png", height=295, width=300, bg="white")
hist(CO2[,4])
device.off()
pdf(filename="C:/R/figure.pdf", height=295, width=300, bg="white")
hist(CO2[,4])
postscript(filename="C:/R/figure.pdf", height=295, width=300, bg="white")
hist(CO2[,4])

Thanks

-- View this message in context: http://r.789695.n4.nabble.com/Saving-R-Graph-to-a-file-tp4648369.html Sent from the R help mailing list archive at Nabble.com.
-- Robert W Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences Kirksville, MO 63501 US
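Robert's hints can be combined into one short sketch. It assumes the built-in CO2 data set and writes to temporary files; the names `pdf_file` and `ps_file`, the 6 x 4 inch size and the plot titles are illustrative choices, not taken from the thread.

```r
data(CO2)

pdf_file <- tempfile(fileext = ".pdf")
ps_file  <- tempfile(fileext = ".ps")

# pdf(): the argument is 'file', and width/height are in inches
pdf(file = pdf_file, width = 6, height = 4)
hist(CO2[, 5], main = "CO2 uptake", xlab = "uptake")
dev.off()  # closes the device so the file is complete on disk

# postscript(): same pattern; these settings give a single-figure file
# sized to the plot, which is convenient for inclusion in LaTeX
postscript(file = ps_file, width = 6, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
boxplot(CO2[, 5], main = "CO2 uptake")
dev.off()

file.exists(c(pdf_file, ps_file))
```

The function that closes a device is dev.off(); there is no device.off() in base R, which is one reason the original example leaves its files unfinished.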
[R] Logical vector-based extraction
Hello,

Most of the program works, except that the following logical variable does not get created, although the second logical variable-based extraction works. I don't understand what I am doing wrong here.

state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]

I would appreciate receiving your help.

Thanks,
Pradip Muhuri

# Below is the code that includes the reproducible example.
df <- data.frame (state.name= c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut",
"Delaware","DC", "Florida","Georgia","Hawaii","Idaho","Illinois","Indiana", "Iowa","Kansas","Kentucky",
"Louisiana","Maine","Maryland","Massachusetts","Michigan", "Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
"New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma",
"Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee","Texas",
"Utah", "Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"),
p_fatal = sample(200:500,51,replace=TRUE),
t_safety_score = sample(1:10,51,replace=TRUE) )
options (width=120)

# The following logical variable does not get created - Don't understand what I am doing wrong
state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]

# The following works
state_sslt5 <- df$t_safety_score < 5
df[state_sslt5,c("state.name", "t_safety_score")]
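One possible reading of the problem, offered as a sketch rather than a confirmed diagnosis: assuming the comparison is `p_fatal < 200`, the logical vector is created, but because p_fatal is drawn from 200:500 no value can fall below 200, so every element is FALSE and the extraction returns zero rows. The small data frame `toy` and the extra threshold of 350 are illustrative, not the poster's data.

```r
set.seed(1)
toy <- data.frame(
  state.name     = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
  p_fatal        = sample(200:500, 5, replace = TRUE),
  t_safety_score = c(2, 7, 4, 9, 1),
  stringsAsFactors = FALSE
)

# The vector exists, but it is all FALSE: the smallest possible value is 200
state_pflt200 <- toy$p_fatal < 200
state_pflt200
any(state_pflt200)
nrow(toy[state_pflt200, c("state.name", "p_fatal")])

# A threshold inside the sampled range can select rows
state_pflt350 <- toy$p_fatal < 350
toy[state_pflt350, c("state.name", "p_fatal")]

# The second extraction returns rows because scores below 5 do occur
toy[toy$t_safety_score < 5, c("state.name", "t_safety_score")]
```

Checking `any()` or `table()` on a logical index before subsetting is a quick way to tell an all-FALSE vector apart from a missing one.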
Re: [R] How generate random numbers from given vector???
Hello,

Another option is to use the sample() function.

test2 <- matrix (rep(sample(number1, size = 5), times=3), nrow=3)

Pradip Muhuri

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Rui Barradas [ruipbarra...@sapo.pt]
Sent: Thursday, October 25, 2012 7:19 PM
To: Rlotus
Cc: r-help@r-project.org
Subject: Re: [R] How generate random numbers from given vector???

Hello,

You don't need the loop, the sample() argument 'size' is there for that. See ?sample.

number <- c(0,1,3,4,5,6,8)
rsidp <- function(n) sample(number, n, replace = TRUE)
rsidp(5)

Hope this helps,
Rui Barradas

On 25-10-2012 20:24, Rlotus wrote:

I wanna generate random numbers from a vector... for example

number<-c(0,1,3,4,5,6,8)

so

rsidp<-function(x){
i=0
for (i in seq(1:x)) {y<-sample(number,x, replace=T)}
return(y)
}

so all random numbers have to be from vector number; so if I type rsidp(5). it has to give me 5 random numbers except 2,7,9 (because they are not in the vector numbers). help me plz with it (((

-- View this message in context: http://r.789695.n4.nabble.com/How-generate-random-numbers-from-given-vector-tp4647447.html Sent from the R help mailing list archive at Nabble.com.
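Rui's loop-free version can be checked with a short sketch. The seed, the object names `draws` and `m`, and the 3 x 5 matrix are illustrative additions.

```r
number <- c(0, 1, 3, 4, 5, 6, 8)

# Draw n values from 'number' with replacement; no loop is needed
rsidp <- function(n) sample(number, n, replace = TRUE)

set.seed(42)
draws <- rsidp(5)
draws

# Every draw comes from 'number', so 2, 7 and 9 can never appear
all(draws %in% number)

# Three rows of five draws each, all drawn independently
m <- matrix(rsidp(15), nrow = 3)
dim(m)
```

Note that `rep(sample(...), times = 3)` repeats one draw of five values three times, whereas `rsidp(15)` draws all fifteen values independently.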
Re: [R] svyboxplot - library (survey)
Hello Dr. Lumley, Thank you for your advice/suggestions. I have rescaled the weight (i.e., original weight divided by total weighted count averaged across 8 surveys - NHIS). As can be seen below (R console), the new weight sums to 1. I have used the freq=TRUE argument in the svyhist () function along with a new svydesign object which includes the recalled weight. There are two issues: 1) I am getting a warning message: In plot.histogram(h, ..., freq = freq, xlab =xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE. 2) The scale of two graphs looks different (please see the attachment). Any thoughts on how to resolve these issues? Regards, Pradip Muhuri ## R console is appended below ## options (width=120) sum (tor$new_wt) [1] 1 # object with survey design variables and data with new_wt (rescaled) that sums to 1 xnhis - svydesign (id=~psu,strat=~stratum, weights=~new_wt, data=tor, nest=TRUE) MyBreaks - c(18, 25, 35, 45, 55, 65, 75, 85, 95) par(mfrow=c(2,2)) # Chart 1 options( survey.lonely.psu = adjust ) svyhist (~age_p, + subset (xnhis, xspd2=='SPD'), breaks=MyBreaks, + #ylim = c(0,0.040), + main= , freq=TRUE, + col=red, + xlab=Age at Interview (SPD Category) + ) Warning message: In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE #lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2) #Chart 2 options( survey.lonely.psu = adjust ) svyhist (~age_p, + subset (xnhis, xspd2=='No SPD'), breaks=MyBreaks, + #ylim = c(0,0.040), + main= , freq=TRUE, + col=yellow, xlab=Age at Interview (No SPD Category) + ) Warning message: In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE Pradip K. 
Muhuri Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov -Original Message- From: Thomas Lumley [mailto:tlum...@uw.edu] Sent: Wednesday, October 17, 2012 11:13 PM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: Anthony Damico; R help Subject: Re: [R] svyboxplot - library (survey) On Thu, Oct 18, 2012 at 2:04 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Hello, I understand that svyhist () provides density histograms with density values on the y-axis (R code shown below). Is there a way one can have relative relative frequency histograms with relative freqencies on the y-axis? You get frequencies just by asking for them with freq: compare svyhist(~enroll, dstrat, main=Survey weighted,col=purple,freq=TRUE) svyhist(~enroll, dstrat, main=Survey weighted,col=purple) If you mean that you want the heights of the bars to sum to 1, the simplest way I know of is to rescale the weights to sum to 1 and use freq=TRUE -thomas Any advice/help would be appreciated. 
Thanks, Pradip Muhuri ## svyhist - Density Histogram options( survey.lonely.psu = adjust ) svyhist (~age_p, subset (nhis, xspd2=='SPD'), breaks=MyBreaks, ylim = c(0,0.040), main= , col=red, xlab=Age at Interview (SPD Category) ) lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2) From: Anthony Damico [ajdam...@gmail.com] Sent: Monday, October 01, 2012 10:07 AM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: R help Subject: Re: [R] svyboxplot - library (survey) using a slight modification of the example shown in ?svyboxplot # load survey library library(survey) # load example data data(api) # create an example svydesign dstrat - svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc) # set the plot window to display 1 plot x 2 plots par(mfrow=c(1,2)) # generate two example boxplots svyboxplot(enroll~stype,dstrat,all.outliers=TRUE) svyboxplot(enroll~1,dstrat) # done # alternative: not as nice # set the plot window to display 2 plots x 1 plot par(mfrow=c(2,1)) # generate two example boxplots svyboxplot(enroll~stype,dstrat,all.outliers=TRUE) svyboxplot(enroll~1,dstrat) # done On Mon, Oct 1, 2012 at 9:50 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.govmailto:pradip.muh...@samhsa.hhs.gov wrote: Hello, I have used the library (survey) package for boxplots using the following code. Could anyone please tell me why I am getting only 1
Re: [R] svyboxplot - library (survey)
Hi Dr. Lumley, Further thoughts: To get the histogram of age with proportions (relative frequencies) on y-axis, I probably need to rescale the weight for each subgroup separately so that the rescaled weight would sum to 1 for the respective subgroup. Am I correct? Thanks, Pradip Muhuri From: Muhuri, Pradip (SAMHSA/CBHSQ) Sent: Thursday, October 18, 2012 4:45 PM To: 'Thomas Lumley' Cc: Anthony Damico; R help; Muhuri, Pradip (SAMHSA/CBHSQ) Subject: RE: [R] svyboxplot - library (survey) Hello Dr. Lumley, Thank you for your advice/suggestions. I have rescaled the weight (i.e., original weight divided by total weighted count averaged across 8 surveys - NHIS). As can be seen below (R console), the new weight sums to 1. I have used the freq=TRUE argument in the svyhist () function along with a new svydesign object which includes the recalled weight. There are two issues: 1) I am getting a warning message: In plot.histogram(h, ..., freq = freq, xlab =xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE. 2) The scale of two graphs looks different (please see the attachment). Any thoughts on how to resolve these issues? 
Regards, Pradip Muhuri ## R console is appended below ## options (width=120) sum (tor$new_wt) [1] 1 # object with survey design variables and data with new_wt (rescaled) that sums to 1 xnhis - svydesign (id=~psu,strat=~stratum, weights=~new_wt, data=tor, nest=TRUE) MyBreaks - c(18, 25, 35, 45, 55, 65, 75, 85, 95) par(mfrow=c(2,2)) # Chart 1 options( survey.lonely.psu = adjust ) svyhist (~age_p, + subset (xnhis, xspd2=='SPD'), breaks=MyBreaks, + #ylim = c(0,0.040), + main= , freq=TRUE, + col=red, + xlab=Age at Interview (SPD Category) + ) Warning message: In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE #lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2) #Chart 2 options( survey.lonely.psu = adjust ) svyhist (~age_p, + subset (xnhis, xspd2=='No SPD'), breaks=MyBreaks, + #ylim = c(0,0.040), + main= , freq=TRUE, + col=yellow, xlab=Age at Interview (No SPD Category) + ) Warning message: In plot.histogram(h, ..., freq = freq, xlab = xlab, main = main) : the AREAS in the plot are wrong -- rather use freq=FALSE Pradip K. Muhuri Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. 
Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov -Original Message- From: Thomas Lumley [mailto:tlum...@uw.edu] Sent: Wednesday, October 17, 2012 11:13 PM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: Anthony Damico; R help Subject: Re: [R] svyboxplot - library (survey) On Thu, Oct 18, 2012 at 2:04 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote: Hello, I understand that svyhist () provides density histograms with density values on the y-axis (R code shown below). Is there a way one can have relative relative frequency histograms with relative freqencies on the y-axis? You get frequencies just by asking for them with freq: compare svyhist(~enroll, dstrat, main=Survey weighted,col=purple,freq=TRUE) svyhist(~enroll, dstrat, main=Survey weighted,col=purple) If you mean that you want the heights of the bars to sum to 1, the simplest way I know of is to rescale the weights to sum to 1 and use freq=TRUE -thomas Any advice/help would be appreciated. Thanks, Pradip Muhuri ## svyhist - Density Histogram options( survey.lonely.psu = adjust ) svyhist (~age_p, subset (nhis, xspd2=='SPD'), breaks=MyBreaks, ylim = c(0,0.040), main= , col=red, xlab=Age at Interview (SPD Category) ) lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2) From: Anthony Damico [ajdam...@gmail.com] Sent: Monday, October 01, 2012 10:07 AM To: Muhuri, Pradip (SAMHSA/CBHSQ) Cc: R help Subject: Re: [R] svyboxplot - library (survey) using a slight modification of the example shown in ?svyboxplot # load survey library library(survey) # load example data data(api) # create an example svydesign dstrat - svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc) # set the plot window to display 1 plot x 2 plots par(mfrow=c(1,2)) # generate two example boxplots svyboxplot(enroll~stype,dstrat,all.outliers=TRUE) svyboxplot(enroll~1,dstrat
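The per-subgroup rescaling asked about in this thread can be illustrated without the survey package. This is a minimal base-R sketch on made-up data: the data frame `toy`, its weights, and the helper `rel_freq` are hypothetical. Dividing each subgroup's weights by that subgroup's weight total makes the weighted bin totals, which become the bar heights, sum to 1 within each subgroup. It does not reproduce svyhist() or any design-based estimation.

```r
set.seed(7)
toy <- data.frame(
  age_p = sample(18:94, 200, replace = TRUE),
  xspd2 = sample(c("SPD", "No SPD"), 200, replace = TRUE),
  wt    = runif(200, 100, 3000)
)
MyBreaks <- c(18, 25, 35, 45, 55, 65, 75, 85, 95)

# Rescale weights separately within each subgroup so they sum to 1 there
toy$new_wt <- toy$wt / ave(toy$wt, toy$xspd2, FUN = sum)

# Weighted relative frequency per age bin for one subgroup
rel_freq <- function(d) {
  bins <- cut(d$age_p, breaks = MyBreaks, include.lowest = TRUE, right = FALSE)
  out <- tapply(d$new_wt, bins, sum)
  out[is.na(out)] <- 0  # empty bins contribute nothing
  out
}

rf_spd   <- rel_freq(toy[toy$xspd2 == "SPD", ])
rf_nospd <- rel_freq(toy[toy$xspd2 == "No SPD", ])

sum(rf_spd)
sum(rf_nospd)

# A shared y-axis keeps the two panels comparable
ymax <- max(rf_spd, rf_nospd)
par(mfrow = c(1, 2))
barplot(rf_spd,   ylim = c(0, ymax), las = 2, main = "SPD")
barplot(rf_nospd, ylim = c(0, ymax), las = 2, main = "No SPD")
```

The "AREAS in the plot are wrong" warning comes from base R's histogram plotting, which issues it when freq=TRUE is combined with unequal bin widths; MyBreaks starts with a 7-year bin followed by 10-year bins. Passing the same ylim to both panels is one way to put the two charts on the same scale.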
Re: [R] svyboxplot - library (survey)
Hello,

I understand that svyhist () provides density histograms with density values on the y-axis (R code shown below). Is there a way one can have relative frequency histograms with relative frequencies on the y-axis? Any advice/help would be appreciated.

Thanks,
Pradip Muhuri

## svyhist - Density Histogram
options( survey.lonely.psu = "adjust" )
svyhist (~age_p, subset (nhis, xspd2=='SPD'), breaks=MyBreaks, ylim = c(0,0.040), main=" ", col="red", xlab="Age at Interview (SPD Category)" )
lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)

From: Anthony Damico [ajdam...@gmail.com]
Sent: Monday, October 01, 2012 10:07 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] svyboxplot - library (survey)

using a slight modification of the example shown in ?svyboxplot

# load survey library
library(survey)
# load example data
data(api)
# create an example svydesign
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)
# set the plot window to display 1 plot x 2 plots
par(mfrow=c(1,2))
# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)
# done

# alternative: not as nice
# set the plot window to display 2 plots x 1 plot
par(mfrow=c(2,1))
# generate two example boxplots
svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
svyboxplot(enroll~1,dstrat)
# done

On Mon, Oct 1, 2012 at 9:50 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, I have used the library (survey) package for boxplots using the following code. Could anyone please tell me why I am getting only 1 boxplot instead of 2 boxplots (1-SPD, 2-No SPD). What changes in the following code would be required to get 2 boxplots in the same plot frame?

Thanks,
Pradip

###
nhis <- svydesign (id=~psu, strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
svyboxplot (dthage~xspd2, subset (nhis, mortstat==1), col="gray80", varwidth=TRUE, ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD")

Pradip K. Muhuri Statistician Substance Abuse Mental Health Services Administration The Center for Behavioral Health Statistics and Quality Division of Population Surveys 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 e-mail: pradip.muh...@samhsa.hhs.gov The Center for Behavioral Health Statistics and Quality your feedback. Please click on the following link to complete a brief customer survey: http://cbhsqsurvey.samhsa.gov
[R] svyhist and svyboxplot
Hello, The following code is expected to produce 4 charts. But, I only get charts 1,2 , 4, NOT CHART # 3. For Chart# 3, I am getting the following error message: Error in tapply(1:NROW(x), list(factor(strata)), function(index) { : arguments must have same length I would appreciate if someone could help me resolve the issue. Thanks, Pradip # BELOW IS THE REPRODUCIBLE EXAMPLE setwd (E:/RDATA) options(width = 120) library (survey) library (KernSmooth) xd1- xsmoke age_p psu stratum wt8 13601 322 2 20 356.5600 32966 338 2 45 434.3562 63493 132 1 87 699.9987 238175 346 1 338 982.8075 174162 340 1 240 273.6313 220206 333 2 308 1477.1688 118133 368 1 159 716.3012 142859 223 1 194 1100.9475 115253 235 2 155 444.3750 61675 331 1 85 769.5963 189813 337 1 263 328.5600 226274 147 2 318 605.8700 41969 371 2 58 597.0150 167667 340 2 230 1030.4637 225103 337 2 316 349.6825 49894 370 2 68 517.7862 98075 346 2 130 1428.7225 180771 350 1 250 652.4188 137057 342 1 186 590.2100 77705 223 1 105 1687.2450 89106 348 1 118 407.6513 208178 350 1 290 556.5000 100403 352 2 133 1481.8200 221571 127 2 310 833.5338 10823 272 1 16 1807.6425 108431 371 2 145 945.6263 68708 146 1 94 1989.3775 23874 323 2 33 1707.8775 150634 319 2 206 761.1500 231232 342 2 326 1487.4113 184654 242 2 255 1715.2375 215312 357 1 300 483.5663 40713 257 2 56 2042.2762 130309 323 1 177 948.5625 25515 255 1 35 2719.7525 235612 283 2 333 603.3537 13755 236 2 20 265.1938 2441333 1 4 1062.1200 157327 377 1 215 2010.6600 66502 320 2 91 1122.9725 230778 155 2 325 1207.3025 74805 354 1 101 1028.5150 166556 150 1 229 1546.9450 91914 168 1 121 428.5350 89651 359 2 118 143.5437 149329 344 2 204 1064.7725 212700 259 2 295 1050.1163 454 179 1 1 275.5700 125639 127 1 170 785.1037 55442 347 1 76 950.3312 145132 377 1 197 1269.2287 123069 324 1 167 216.1937 188301 155 2 260 426.6313 852 266 2 1 1443.4887 3582381 1 6 790.8412 235423 144 2 333 659.4238 42175 240 1 59 1089.6762 57033 343 1 78 226.8750 177273 285 1 244 392.7200 218558 340 2 
305 1680.2700 27784 245 1 39 280.0550 81823 343 1 110 965.0438 76344 326 1 103 1095.6012 114916 356 2 154 436.8838 35563 378 1 49 333.2875 192279 330 2 267 722.0312 61315 148 2 84 1426.5725 219903 343 1 308 791.5738 42612 325 1 60 658.1387 178488 333 2 246 675.1912 9031127 2 14 989.4863 145092 264 1 197 960.1912 71885 353 2 97 595.4050 38137 275 1 53 1004.0912 140149 121 1 190 1870.9350 162052 325 1 223 892.7775 89527 239 2 118 518.1050 59650 326 2 82 432.7837 24709 284 1 34 453.9013 18933 385 1 27 582.3288 24904 335 2 34 1027.5287 213668 339 1 298 3174.1925 110509 330 1 149 469.8188 72462 363 1 98 386.2163 152596 319 1 209 1328.2188 17014 462 1 24 294.9250 33467 250 1 46 1601.4575 5241333 1 9 1651.0988 215094 323 1 300 427.6313 5 121 1 118 1092.2613 204868 260 2 285 781.2325 157415 231 2 215 1323.5750 71081 244 2 96 1059.2088 25420 338 1 35 530.7413 144226 127 1 196 1126.3112 47888 346 2 66 965.4050 216179 329 2 301 1237.6463 29172 368 1 41 1025.9738 168786 147 1 232 680.6213 94035 223 2 124 330.4563 170542 125 2 234 757.2287 160331 233 2 220 636.3900 124163 380 2 167 287.6988 71442 237 1 97 442.2300 80191 274 2 107 871.0338 199309 329 2 277 485.2337 91293 335 2 120
Re: [R] svyhist and svyboxplot
Anthony,

I now can't afford to forget that R is case-sensitive! Thank you so much!

Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Saturday, October 13, 2012 10:10 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Thomas Lumley; R help
Subject: Re: svyhist and svyboxplot

R is case sensitive. Either change

    subset (nhis, xsmoke=='Never SMK')
to
    subset (nhis, xsmoke=='Never Smk')

or change

    labels=c('Current SMK','Former SMK', 'Never Smk')
to
    labels=c('Current SMK','Former SMK', 'Never SMK')

but not both :)

On Sat, Oct 13, 2012 at 10:02 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, The following code is expected to produce 4 charts. But I only get charts 1, 2 and 4, NOT CHART #3. For Chart #3, I am getting the following error message:

Error in tapply(1:NROW(x), list(factor(strata)), function(index) { : arguments must have same length

I would appreciate it if someone could help me resolve the issue. Thanks, Pradip
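The case-sensitivity point can be checked without the survey package. The sketch below uses a small made-up vector of smoking codes (not the posted data) and base R only; it shows that comparing a factor against a label that differs only in case matches nothing, which is the kind of empty subset that leads to the error above.

```r
# Hypothetical smoking-status codes; the labels mirror those in the thread.
xsmoke <- factor(c(1, 2, 3, 3, 1),
                 levels = c(1, 2, 3),
                 labels = c("Current SMK", "Former SMK", "Never Smk"))

# Mismatched case: no element equals "Never SMK", so a subset would be empty.
n_wrong <- sum(xsmoke == "Never SMK")

# Matching the label exactly as it was defined selects the expected rows.
n_right <- sum(xsmoke == "Never Smk")

# A quick guard that could be run before subsetting a design object.
label_ok <- "Never SMK" %in% levels(xsmoke)

print(c(wrong = n_wrong, right = n_right))
print(label_ok)
```

Checking the label against levels() before calling subset() is a cheap way to catch this kind of typo early.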
[R] svyplot
Hello,

Using the svyplot() function, I have plotted four graphs that are saved in four different .png files. I am looking for examples of how to redraw the same four graphs within grid viewports so that they stay together on a page. The goal is to create one .png file that will include all four graphs (2 rows, 2 columns). Any help would be appreciated.

Thanks,
Pradip

Pradip K. Muhuri, Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] svyplot
Dear Anthony,

You have been so helpful! The par() example code has worked well.

Thanks,
Pradip

From: Anthony Damico [ajdam...@gmail.com]
Sent: Wednesday, October 10, 2012 5:25 PM
To: Muhuri, Pradip (SAMH/CBHSQ)
Cc: R help
Subject: Re: [R] svyplot

https://stat.ethz.ch/pipermail/r-help/2012-October/324944.html

On Wed, Oct 10, 2012 at 5:05 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello, Using the svyplot() function, I have plotted four graphs that are saved in four different .png files. I am looking for examples of how to redraw the same four graphs within grid viewports so that they stay together on a page. The goal is to create one .png file that will include all four graphs (2 rows, 2 columns). Any help would be appreciated. Thanks, Pradip
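For readers who cannot follow the link, the par() approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the plot() calls on random numbers are stand-ins for the four svyplot() calls, the file name is made up, and it assumes the plots are drawn with base graphics (which is what par(mfrow) controls).

```r
# Hypothetical output file in the session's temporary directory.
out_file <- file.path(tempdir(), "four_panels.png")

# Open ONE png device, then split it into a 2 x 2 grid of panels.
png(out_file, width = 800, height = 800)
op <- par(mfrow = c(2, 2))

set.seed(42)
for (i in 1:4) {
  # Stand-in for a svyplot() call; each call fills the next panel.
  plot(rnorm(50), rnorm(50), main = paste("Panel", i),
       xlab = "x", ylab = "y")
}

par(op)     # restore the previous layout settings
dev.off()   # close the device so the file is written
```

The key design choice is to open the device once and call dev.off() once, rather than once per graph.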
Re: [R] svyhist
Thomas,

Sorry about my repeated typo in the lines() call, which distorted the line so that it did not match the histogram in the earlier graph. The revised code gives me graphs that look a lot better (please see the attachment). Thank you for catching that mistake and also for the clarification regarding the kernel density estimator.

Pradip Muhuri

From: Thomas Lumley [tlum...@uw.edu]
Sent: Monday, October 08, 2012 8:40 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Anthony Damico; R help
Subject: Re: [R] svyhist

The line isn't a theoretical distribution, it's a kernel density estimator and so should match your histogram. It looks as though you use exactly the same call

    lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

for each plot, which gives a smooth curve estimating the age at death for people with No SPD. To get, e.g., age at interview for the SPD group use something like:

    lines (svysmooth(~age_p, bandwidth=5,subset(nhis, xspd2=='SPD')), lwd=2)

-thomas

On Sun, Oct 7, 2012 at 2:19 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hi Anthony, The ylim argument has been added to the code, and I got 4 plots that have the same y-dimension. Each plot displays 2 distributions: one as a histogram from the data and another as a line (i.e., an idealized theoretical normal distribution?). My question is: is there a way to change the distribution in the lines() function and try another theoretical distribution to approximate the observed distribution?

Thanks, Pradip Muhuri
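Thomas's point, that the overlaid line is a kernel density estimate and therefore has to be computed from the same variable and subgroup as the histogram, can be illustrated with an unweighted base-R analogue. The sketch below is not the survey-weighted method from the thread: it uses hist() and density() on simulated values, and the variable names are only borrowed for readability.

```r
set.seed(7)
age_p  <- runif(300, min = 18, max = 95)  # stand-in for age at interview
dthage <- rnorm(300, mean = 75, sd = 8)   # stand-in for age at death

# Kernel density estimates with a fixed bandwidth of 5.
dens_same  <- density(age_p, bw = 5)   # same variable as the histogram
dens_other <- density(dthage, bw = 5)  # a different variable

pdf(NULL)  # null device: draw without writing a file
hist(age_p, freq = FALSE, breaks = seq(15, 100, by = 5),
     col = "grey80", main = "", xlab = "Age at interview")
lines(dens_same, lwd = 2)            # follows the bars
lines(dens_other, lwd = 2, lty = 2)  # does not, because it estimates another variable
dev.off()

# A density estimate integrates to roughly 1, like a histogram drawn on the density scale.
area_same <- sum(dens_same$y) * diff(dens_same$x[1:2])
print(area_same)
```

In the survey setting, the same idea means passing the same formula and the same subset() to both svyhist() and svysmooth().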
Re: [R] svyhist
Hi Anthony,

The ylim argument has been added to the code (please see below), and I got 4 plots that have the same y-dimension. Each plot displays 2 distributions: one as a histogram from the data and another as a line (i.e., an idealized theoretical normal distribution?). My question is: is there a way to change the distribution in the lines() function and try another theoretical distribution to approximate the observed distribution?

Thanks,
Pradip Muhuri

MyBreaks <- c(18,35,45,55,65,75,85,95)
png("svyhist_no_spd_age_at_inteview.png")
options( survey.lonely.psu = "adjust" )
svyhist (~age_p, subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, ylim = c(0,0.035), main=" ", col="grey80", xlab="Age at Interview among those Who had no SPD" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
dev.off ()

From: Anthony Damico [ajdam...@gmail.com]
Sent: Saturday, October 06, 2012 6:56 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David Winsemius; R help
Subject: Re: [R] svyhist

?ylim says numeric vectors of length 2 - so just the beginning and end.

?svyhist doesn't specifically mention the ylim parameter, meaning you should look for a ... in the arguments list and click through to the page for ?hist

?hist has an example that shows the ylim parameter only containing the beginning and end values.

try using ylim = c( 0 , 0.030 )

if you're looking to set the tick marks, look at ?axis ;)

On Fri, Oct 5, 2012 at 11:18 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Dear Anthony and David, Sorry, the earlier-sent plots were mislabeled, which I have corrected and attached. But the y-lim issue is yet to be resolved. Thanks, Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Friday, October 05, 2012 7:29 PM
To: David Winsemius
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); R help
Subject: Re: [R] svyhist

this worked for me -- and doesn't require removing the PSUs from the design :)

options( survey.lonely.psu = "adjust" )
svyhist (~dthage, subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, main=" ", col="grey80", xlab="Age at Death Distribution" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

Dr. Lumley has written quite a bit about single-PSU strata here: http://faculty.washington.edu/tlumley/survey/exmample-lonely.html

On Fri, Oct 5, 2012 at 7:16 PM, David Winsemius dwinsem...@comcast.net wrote:

On Oct 5, 2012, at 3:33 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hello, I was trying to draw histograms of age at death and got the following 2 error messages:

1) Error in tapply(1:NROW(x), list(factor(strata)), function(index) { : arguments must have same length

This is the top of the output of str applied to the data argument you offered to svyhist:

str(subset (nhis, xspd2==2) )
List of 9
 $ cluster :'data.frame': 0 obs. of 1 variable:
  ..$ psu: Factor w/ 47 levels 109.1,115.2,..:
  ..- attr(*, terms)=Classes 'terms', 'formula' length 2 ~psu
  .. .. ..- attr(*, variables)= language list(psu)
  .. .. ..- attr(*, factors)= int [1, 1] 1
  .. .. .. ..- attr(*, dimnames)=List of 2
  .. .. .. .. ..$ : chr psu
  .. .. .. .. ..$ : chr psu

At least one problem seems pretty clear. No data. That can be corrected by wrapping as.numeric() around the factor on which you are subsetting in two places.

Another problem may arise when you restrict to one class only, namely there won't be any design to work with. All the clusters there would be only one no longer have any multiplicity, and svyhist apparently isn't built to handle that situation, at least with that design argument.

Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], : Stratum (2) has only one PSU at stage 1

Taking the 'stratum' argument out of the design() spec allows it to proceed, but I do not know if that is introducing invalidity in the analysis.

-- David.

2) Error in findInterval(mm[, i], gx) : 'vec' contains NAs
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

I would appreciate it if someone could help me resolve these issues. Below is a reproducible example. Thanks, Pradip Muhuri

[...]
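Anthony's advice on ylim (two values only, with tick marks handled separately by axis()) can be tried in base R. The following is a minimal sketch on simulated ages, assuming that svyhist() forwards these arguments to the usual histogram plotting code as the reply suggests; the limits and tick spacing are arbitrary choices for the example.

```r
# Simulated ages; purely illustrative, not the NHIS data from the thread.
set.seed(1)
age <- rnorm(200, mean = 70, sd = 10)

# ylim is a length-2 vector: lower and upper limit only.
y_limits <- c(0, 0.05)
# Tick positions are a separate concern and are passed to axis().
y_ticks <- seq(0, 0.05, by = 0.01)

pdf(NULL)  # null device so nothing is written to disk
hist(age, freq = FALSE, ylim = y_limits, yaxt = "n",
     col = "grey80", main = "", xlab = "Age")
axis(2, at = y_ticks)  # draw the y axis with the chosen tick marks
usr <- par("usr")      # plot-region extents in use: x1, x2, y1, y2
dev.off()

print(usr[3:4])
```

Using the same y_limits in every plot is what makes the panels directly comparable.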
[R] svyhist
Hello,

I was trying to draw histograms of age at death and got the following 2 error messages:

1) Error in tapply(1:NROW(x), list(factor(strata)), function(index) { : arguments must have same length

2) Error in findInterval(mm[, i], gx) : 'vec' contains NAs
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

I would appreciate it if someone could help me resolve these issues. Below is a reproducible example.

Thanks,
Pradip Muhuri

setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
xd1 <- "dthage ypll_75 xspd2 psu stratum wt8
56 19 2 2 33 1512.7287
86 0 2 2 129 1830.6400
81 0 2 1 67 536.1400
47 28 2 1 17 519.8350
71 4 1 1 225 254.4087
72 3 1 1 238 424.4787
75 0 2 2 115 407.0987
83 0 2 2 46 622.5137
79 -4 2 1 300 509.1212
78 -3 2 1 133 517.3325
71 4 2 2 328 1179.3063
64 11 2 1 2 301.5250
78 -3 2 1 62 253.9025
65 10 2 2 260 932.6575
75 0 2 1 247 145.5900
63 12 2 2 156 247.0650
71 4 2 1 146 829.4787
76 -1 2 2 234 432.5437
76 0 2 1 109 859.6888
68 7 2 1 228 1236.2975
64 11 2 2 167 347.5788
62 13 2 2 312 354.0500
77 0 2 2 275 882.1938
78 -3 2 1 28 481.5975
81 0 2 1 180 1285.5425
79 0 2 2 205 576.
70 5 2 1 173 128.3725
75 0 2 2 189 359.3863
78 0 2 1 332 512.8062
74 1 2 2 14 449.0800
77 0 2 1 242 283.0013
92 0 2 1 152 915.3200
69 6 2 2 217 672.7663
53 22 2 1 290 1430.8812
81 0 2 2 90 699.1075
67 8 2 2 316 607.6500
85 0 2 1 171 312.9850
93 0 2 2 119 936.1275
82 0 2 1 118 186.4450
71 4 2 2 329 729.1213
43 32 2 1 215 887.6313
74 1 2 1 180 569.9338
89 0 2 1 324 1054.0887
81 0 2 2 47 532.0987
70 5 2 1 53 450.8750
75 0 1 1 38 557.9750
56 19 2 1 17 512.6363
90 0 2 2 29 569.7888
70 5 2 1 251 554.2138
56 19 2 2 14 1114.1762"
tor <- read.table (textConnection(xd1), header=TRUE, sep='', as.is=TRUE)
# Grouping variable (xspd) to be factor
tor <- within(tor, { xspd2 <- factor(xspd2,levels=c (1,2), labels=c('SPD', 'No SPD'), ordered=TRUE) } )
# object with survey design variables and data
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
MyBreaks <- c(18,35,45,55,65,75,85,95)
png("svyhist_age_at_death.png")
svyhist (~dthage, subset (nhis, xspd2==2), breaks=MyBreaks, main=" ", col="grey80", xlab="Age at Death Distribution" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2==2)), lwd=2)
dev.off ()

Pradip K. Muhuri
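The empty subset behind the first error can be reproduced in base R without the survey package. The sketch below uses a short made-up vector of codes (an assumption, not the posted data) and shows that once a variable has been turned into a labelled factor, comparing it with the old numeric code 2 matches nothing; comparing against the label, or against as.numeric() of the factor, does.

```r
# Hypothetical SPD codes, relabelled the same way as in the post.
codes <- c(2, 2, 1, 2, 1)
xspd2 <- factor(codes, levels = c(1, 2),
                labels = c("SPD", "No SPD"), ordered = TRUE)

# After relabelling, the values are "SPD"/"No SPD", so == 2 is never TRUE.
n_code <- sum(xspd2 == 2)

# Two ways to select the "No SPD" group that do work.
n_label   <- sum(xspd2 == "No SPD")
n_numeric <- sum(as.numeric(xspd2) == 2)

print(c(code = n_code, label = n_label, numeric = n_numeric))
```

Subsetting by the label is usually the more readable choice, since it does not depend on the order of the factor levels.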
Re: [R] svyhist
Dear Anthony and David,

Thank you so much for your comments and suggestions! The sample data set I had embedded in the earlier-sent R script was intended to be used for the reproducible example. Now I have used Anthony's revised code on the entire analytic file. The code has worked fine. Thanks, again.

Attached are the 2 .png files. The only problem I see is that the y-axis limits in these 2 plots are not exactly the same. I have tried this: ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030), which did not work. Any thoughts?

Pradip Muhuri

### Revised Code ###
setwd ("E:/RDATA")
options(width = 120)
library (survey)
library (KernSmooth)
library (Hmisc)
load("tor.rdata")
#contents (tor)
# Grouping variable (xspd) to be factor
tor <- within(tor, { xspd2 <- factor(xspd2,levels=c (1,2), labels=c('SPD', 'No SPD'), ordered=TRUE) } )
# object with survey design variables and data
nhis <- svydesign (id=~psu,strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
MyBreaks <- c(18,35,45,55,65,75,85,95)
png("svyhist_no_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage, subset (nhis, mortstat==1 & xspd2=='No SPD'), breaks=MyBreaks, main=" ", col="grey80", xlab="Age at Death among those Who had SPD" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
dev.off ()
png("svyhist_spd_age_at_death.png")
options( survey.lonely.psu = "adjust" )
svyhist (~dthage, subset (nhis, mortstat==1 & xspd2=='SPD'), breaks=MyBreaks,
  #ylim = c(0,0.005, 0.010, 0.015, 0.020, 0.025, 0.030),
  main=" ", col="grey80", xlab="Age at Death among those Who had no SPD Distribution" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)
dev.off ()
##

From: Anthony Damico [ajdam...@gmail.com]
Sent: Friday, October 05, 2012 7:29 PM
To: David Winsemius
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); R help
Subject: Re: [R] svyhist

this worked for me -- and doesn't require removing the PSUs from the design :)

options( survey.lonely.psu = "adjust" )
svyhist (~dthage, subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, main=" ", col="grey80", xlab="Age at Death Distribution" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

Dr. Lumley has written quite a bit about single-PSU strata here: http://faculty.washington.edu/tlumley/survey/exmample-lonely.html
Re: [R] svyhist
Dear Anthony and David,

Sorry, the earlier-sent plots were mislabeled, which I have corrected and attached. But the y-lim issue is yet to be resolved.

Thanks,
Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Friday, October 05, 2012 7:29 PM
To: David Winsemius
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); R help
Subject: Re: [R] svyhist

this worked for me -- and doesn't require removing the PSUs from the design :)

options( survey.lonely.psu = "adjust" )
svyhist (~dthage, subset (nhis, xspd2=='No SPD'), breaks=MyBreaks, main=" ", col="grey80", xlab="Age at Death Distribution" )
lines (svysmooth(~dthage, bandwidth=5,subset(nhis, xspd2=='No SPD')), lwd=2)

Dr. Lumley has written quite a bit about single-PSU strata here: http://faculty.washington.edu/tlumley/survey/exmample-lonely.html
[R] svyby and make.formula
Hello, Although my R code for the svymean () and svyquantile () functions works fine, I am stuck with the svyby () and make.formula () functions. I got the following error messages. - Error: object of type 'closure' is not subsettable # svyby () - Error in xx[[1]] : subscript out of bounds# make.formula () A reproducible example is appended below. I would appreciate if someone could help me. Thank you in advance. Pradip Muhuri Below is a reproducible example ## setwd (E:/RDATA) library (survey) xd1 - dthage ypll_ler ypll_75 xspd2 psu stratum wt8 mortstat NA NANA 2 1 1 1683.73870 NA NANA 2 1 1 640.89500 NA NANA 2 1 1 714.06620 NA NANA 2 1 1 714.06620 NA NANA 2 1 1 530.52630 NA NANA 2 1 1 2205.28630 NA NANA 2 1 339 1683.73870 NA NANA 2 1 339 640.89500 NA NANA 2 1 339 714.06620 NA NANA 2 1 339 714.06620 NA NANA 2 1 339 530.52630 NA NANA 2 1 339 2205.28630 788.817926 0 2 2 1 592.3100 1 809.291881 0 2 2 1 1014.7387 1 875.001076 0 2 2 1 853.4763 1 875.001076 0 2 2 1 505.1475 1 885.510514 0 2 2 1 1429.5963 1 788.817926 0 2 2 339 592.31001 809.291881 0 2 2 339 1014.73871 875.001076 0 2 2 339 853.47631 875.001076 0 2 2 339 505.14751 885.510514 0 2 2 339 1429.59631 788.817926 0 2 2 339 592.31001 809.291881 0 2 2 339 1014.73871 875.001076 0 2 2 339 853.47631 875.001076 0 2 2 339 505.14751 885.510514 0 2 2 339 1429.59631 newdata - read.table (textConnection(xd1), header=TRUE, as.is=TRUE) dim (newdata) # make the grouping variable (xspd)2 newdata$xspd2 - factor(newdata$xspd2,levels=c (1,2),labels=c('SPD', 'No SPD'), ordered=TRUE) nhis - svydesign (id=~psu,strat=~stratum, weights=~wt8, data=newdata, nest=TRUE) # mean age at death - nationwide svymean( ~dthage, data=nhis , subset (nhis, mortstat==1)) # mean by SPD status svyby(~dthage, ~xspd2 , design=nhis, svymean ) #percentile svyquantile(~dthage, data = nhis , subset (nhis, mortstat==1), c( 0 , .25 , .5 , .75 , 1 ) ) # percentile by SPD status svyby(~dthage, ~xspd2, desin=nhis, svyquantile, c( 0 , .25 , .5 , .75 , 1 ), 
keep.var = F)
# mean for each of the 3 variables
vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
vars
svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)
#

Pradip K. Muhuri
Re: [R] svyboxplot - library (survey)
Hi Thomas,

Thank you so much for your help.

Pradip

From: Thomas Lumley [tlum...@uw.edu]
Sent: Monday, October 01, 2012 6:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Anthony Damico; R help
Subject: Re: [R] svyboxplot - library (survey)

The documentation says: "The grouping variable in svyboxplot, if present, must be a factor."

-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

On Tue, Oct 2, 2012 at 4:28 AM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Dear Anthony,

Yes, I can follow the example code you have given. But do you know, from the code shown below (following Thomas Lumley's Complex Surveys), why I am getting the boxplot of dthage for just xspd2=1, not xspd2=2? My intent is to make this code work so that I can generate similar plots for other continuous variables. Any help will be appreciated.

Thanks,
Pradip

    nhis <- svydesign(id=~psu, strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
    svyboxplot(dthage~xspd2, subset(nhis, mortstat==1), col="gray80", varwidth=TRUE,
               ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD")

Pradip K. Muhuri, PhD
Statistician
Substance Abuse Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
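Thomas Lumley's point about the grouping variable can be tried on the api example data that ships with the survey package. The following is only a sketch: the numeric code grp_num and its 1/2 coding are hypothetical stand-ins for a variable like xspd2, and nothing here comes from the NHIS file used in the thread.

```r
library(survey)
data(api)

# Hypothetical numeric grouping code (1 or 2), standing in for a variable like xspd2
apistrat$grp_num <- ifelse(apistrat$stype == "E", 1, 2)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Convert the numeric code to a factor inside the design object
dstrat <- update(dstrat,
                 grp = factor(grp_num, levels = c(1, 2),
                              labels = c("Elementary", "Other")))

# With a factor on the right-hand side, one box is drawn per level
svyboxplot(enroll ~ grp, dstrat, all.outliers = TRUE)
```

Doing the conversion with update() on the design keeps the factor and the survey weights together; converting the column in the data frame before calling svydesign() should work equally well.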
Re: [R] svyby and make.formula
Dear Anthony,

Thank you very much for helping me resolve the issues. I now got all the results that I intended to generate.

Pradip Muhuri

From: Anthony Damico [ajdam...@gmail.com]
Sent: Tuesday, October 02, 2012 9:50 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: svyby and make.formula

please double-check that you've got all of your parameters correct by typing ?svymean ?svyby and ?make.formula before you send questions to r-help :)

    # you spelled design wrong and probably need to throw out your NA values. try this
    # percentile by SPD status
    svyby(~dthage, ~xspd2, design=nhis, svyquantile, c( 0 , .25 , .5 , .75 , 1 ), keep.var = F, na.rm = TRUE)

    # mean for each of the 3 variables
    # this returns a logical vector, but make.formula requires a character vector
    vars <- names(nhis) %in% c("dthage", "ypll_ler", "ypl_75")
    vars
    svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)

    # create a character vector instead
    # note you also spelled the third variable wrong-- it will break unless you correct that
    vars <- c("dthage", "ypll_ler", "ypll_75")

    # this statement has two survey design parameters, which won't work. which one do you want to use?
    svymean(make.formula(vars),nhis,subset (nhis, mortstat==1), na.rm=TRUE)

    # pick one
    svymean(make.formula(vars),nhis, na.rm=TRUE)
    svymean(make.formula(vars),subset(nhis, mortstat==1), na.rm=TRUE)
    # all of the variables in vars are NA whenever mortstat isn't 1, so they give the same results

On Tue, Oct 2, 2012 at 7:51 PM, Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov wrote:

Hello,

Although my R code for the svymean() and svyquantile() functions works fine, I am stuck with the svyby() and make.formula() functions. I got the following error messages.

    Error: object of type 'closure' is not subsettable    # svyby()
    Error in xx[[1]] : subscript out of bounds            # make.formula()

A reproducible example is appended below. I would appreciate it if someone could help me. Thank you in advance.

Pradip Muhuri

Below is a reproducible example

    setwd("E:/RDATA")
    library(survey)
    xd1 <- "dthage ypll_ler ypll_75 xspd2 psu stratum wt8 mortstat
    NA NA NA 2 1 1 1683.7387 0
    NA NA NA 2 1 1 640.8950 0
    NA NA NA 2 1 1 714.0662 0
    NA NA NA 2 1 1 714.0662 0
    NA NA NA 2 1 1 530.5263 0
    NA NA NA 2 1 1 2205.2863 0
    NA NA NA 2 1 339 1683.7387 0
    NA NA NA 2 1 339 640.8950 0
    NA NA NA 2 1 339 714.0662 0
    NA NA NA 2 1 339 714.0662 0
    NA NA NA 2 1 339 530.5263 0
    NA NA NA 2 1 339 2205.2863 0
    78 8.817926 0 2 2 1 592.3100 1
    80 9.291881 0 2 2 1 1014.7387 1
    87 5.001076 0 2 2 1 853.4763 1
    87 5.001076 0 2 2 1 505.1475 1
    88 5.510514 0 2 2 1 1429.5963 1
    78 8.817926 0 2 2 339 592.3100 1
    80 9.291881 0 2 2 339 1014.7387 1
    87 5.001076 0 2 2 339 853.4763 1
    87 5.001076 0 2 2 339 505.1475 1
    88 5.510514 0 2 2 339 1429.5963 1
    78 8.817926 0 2 2 339 592.3100 1
    80 9.291881 0 2 2 339 1014.7387 1
    87 5.001076 0 2 2 339 853.4763 1
    87 5.001076 0 2 2 339 505.1475 1
    88 5.510514 0 2 2 339 1429.5963 1"
    newdata <- read.table(textConnection(xd1), header=TRUE, as.is=TRUE)
    dim(newdata)

    # make the grouping variable (xspd2)
    newdata$xspd2 <- factor(newdata$xspd2, levels=c(1,2), labels=c('SPD', 'No SPD'), ordered=TRUE)
    nhis <- svydesign(id=~psu, strat=~stratum, weights=~wt8, data=newdata, nest=TRUE)

    # mean age at death - nationwide
    svymean( ~dthage, data=nhis , subset (nhis, mortstat==1))

    # mean by SPD status
    svyby(~dthage, ~xspd2 , design=nhis, svymean )

    # percentile
    svyquantile(~dthage, data = nhis , subset (nhis, mortstat==1), c( 0 , .25 , .5 , .75 , 1 ) )

    # percentile by SPD status
    svyby(~dthage, ~xspd2
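The corrections in Anthony's reply can be tried on the api example data from the survey package. This is a sketch under assumptions: the variables api99, api00 and stype are illustrative choices from that example data set, not the NHIS variables from the thread, and the svyquantile call follows the form shown in the package's own svyby examples.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# make.formula() expects a character vector of variable names,
# not the logical vector returned by %in%
vars <- c("api99", "api00")
f <- make.formula(vars)

# Pass exactly one design object: the full design or a subset of it
svymean(f, dclus1, na.rm = TRUE)
svymean(f, subset(dclus1, stype == "E"), na.rm = TRUE)

# Group statistics: formula, grouping formula, design, then the function
svyby(~api99, ~stype, dclus1, svymean, na.rm = TRUE)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles = 0.5,
      ci = TRUE, vartype = "ci")
```

Building the formula once and reusing it keeps the variable list in a single place, which makes a misspelled name easier to spot.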
[R] svyboxplot - library (survey)
Hello,

I have used the library (survey) package for boxplots using the following code. Could anyone please tell me why I am getting only 1 boxplot instead of 2 boxplots (1-SPD, 2-No SPD)? What changes in the following code would be required to get 2 boxplots in the same plot frame?

Thanks,
Pradip

    nhis <- svydesign(id=~psu, strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
    svyboxplot(dthage~xspd2, subset(nhis, mortstat==1), col="gray80", varwidth=TRUE,
               ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD")

Pradip K. Muhuri
Re: [R] svyboxplot - library (survey)
Dear Anthony,

Yes, I can follow the example code you have given. But do you know, from the code shown below (following Thomas Lumley's Complex Surveys), why I am getting the boxplot of dthage for just xspd2=1, not xspd2=2? My intent is to make this code work so that I can generate similar plots for other continuous variables. Any help will be appreciated.

Thanks,
Pradip

    nhis <- svydesign(id=~psu, strat=~stratum, weights=~wt8, data=tor, nest=TRUE)
    svyboxplot(dthage~xspd2, subset(nhis, mortstat==1), col="gray80", varwidth=TRUE,
               ylab="Age at Death", xlab="SPD Status: 1-SPD, 2=No SPD")

Pradip K. Muhuri, PhD

From: Anthony Damico [mailto:ajdam...@gmail.com]
Sent: Monday, October 01, 2012 10:07 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] svyboxplot - library (survey)

using a slight modification of the example shown in ?svyboxplot

    # load survey library
    library(survey)

    # load example data
    data(api)

    # create an example svydesign
    dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)

    # set the plot window to display 1 plot x 2 plots
    par(mfrow=c(1,2))

    # generate two example boxplots
    svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
    svyboxplot(enroll~1,dstrat)
    # done

    # alternative: not as nice
    # set the plot window to display 2 plots x 1 plot
    par(mfrow=c(2,1))

    # generate two example boxplots
    svyboxplot(enroll~stype,dstrat,all.outliers=TRUE)
    svyboxplot(enroll~1,dstrat)
    # done
[R] Bar chart in ascending order for each level of X
Hello List,

The question is how to plot a bar chart in which bars are sorted in ascending order for each level of X. I would appreciate receiving your advice and help.

Thanks,
Pradip Muhuri

The following code works when producing the chart in which bars are NOT sorted. Please see the output.

Data File (tab-delimited; columns are Age_1217, Age_1825, Age_26Plus):

     5.1    8.7    1.6
     3.7    7.4    2.8
    10.4   12.0    3.5
     4.4    8.8    1.7
     2.0    3.5    0.7
     6.7   11.0    3.1
     5.3    6.7    1.8

    # R script for bar chart begins here

    # Read drug data from tab-delimited data set
    drug_data <- read.table("C:/Documents and Settings/pradip.muhuri/My Documents/xdrug.dat",
                            header=FALSE,
                            col.names=c("Age_1217", "Age_1825", "Age_26Plus"),
                            row.names=c("White", "Black", "Native American/Alaska Native",
                                        "Hawaiian/OPI", "Asian", "More than One Race", "Hispanic"),
                            sep="\t")

    # Graph drug use disorder data with adjacent bars using rainbow colors
    barplot(as.matrix(drug_data),
            main="Past-Year Illicit Drug Use Disorders by Race/Ethnicity",
            ylab="Past-Year Use Disorder Rate (%)",
            beside=TRUE, col=rainbow(7))
    legend("topright",
           c("White", "Black", "Native American/Alaska Native", "Hawaiian/OPI", "Asian",
             "More than One Race", "Hispanic"),
           cex=0.6, bty="n", fill=rainbow(7))

Attachment: Bar_Graph.pdf
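One possible way to get bars sorted in ascending order within each age group is to sort every column separately and draw the bars from a single vector, so that each bar can keep the color of its race/ethnicity group. This is a base-R sketch, not an answer taken from the thread; the object names ord, heights, bar_cols and gaps are illustrative, and the matrix is typed in from the posted data file rather than read from xdrug.dat.

```r
# Rates from the posted data file (rows: race/ethnicity, columns: age group)
races <- c("White", "Black", "Native American/Alaska Native",
           "Hawaiian/OPI", "Asian", "More than One Race", "Hispanic")
drug_data <- matrix(c( 5.1,  8.7, 1.6,
                       3.7,  7.4, 2.8,
                      10.4, 12.0, 3.5,
                       4.4,  8.8, 1.7,
                       2.0,  3.5, 0.7,
                       6.7, 11.0, 3.1,
                       5.3,  6.7, 1.8),
                    nrow = 7, byrow = TRUE,
                    dimnames = list(races, c("Age_1217", "Age_1825", "Age_26Plus")))

# Row order that sorts each column (age group) in ascending order
ord <- apply(drug_data, 2, order)
heights <- as.vector(apply(drug_data, 2, sort))

# Keep one fixed color per race so the legend stays valid after sorting
bar_cols <- rainbow(7)[as.vector(ord)]

# Leave a gap before the first bar of each age group
gaps <- rep(c(1, rep(0, 6)), times = ncol(drug_data))

mids <- barplot(heights, col = bar_cols, space = gaps,
                main = "Past-Year Illicit Drug Use Disorders by Race/Ethnicity",
                ylab = "Past-Year Use Disorder Rate (%)")

# Label each block of bars with its age group
group <- rep(seq_len(ncol(drug_data)), each = nrow(drug_data))
axis(1, at = tapply(mids, group, mean), labels = colnames(drug_data), tick = FALSE)
legend("topright", races, cex = 0.6, bty = "n", fill = rainbow(7))
```

Because the order differs from one age group to the next, barplot(..., beside=TRUE) cannot be used directly: it recycles the same color sequence in every group, which would mislabel the sorted bars.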
Re: [R] Asymmetrical Confidence Interval
Dear Patrick,

I do agree with you that it is a very simple problem. Actually, I do have the following SAS program written to compute the asymmetrical confidence interval. As a new user of R, I just wanted to see the corresponding code in R if it already exists.

Thanks,
Pradip

    *SAS program begins here;
    /*
    MEAN   = prevalence rate
    PLOWER = lower 95% confidence limit for the rate
    PUPPER = upper 95% confidence limit for the rate
    TLOWER = lower 95% confidence limit for the total
    TUPPER = upper 95% confidence limit for the total
    Calculate the 95% CI FOR PREVALENCE RATES AND TOTALS
    */
    IF MEAN=0 OR MEAN=1 THEN DO;
      L=.; NUMBER=.; A=.; B=.;
      PLOWER=.; PUPPER=.; TLOWER=.; TUPPER=.;
    END;
    ELSE DO;
      L=LOG(MEAN/(1-MEAN));
      NUMBER=SEMEAN/(MEAN*(1-MEAN));
      A=L-1.96*NUMBER;
      B=L+1.96*NUMBER;
      PLOWER=1/(1+EXP(-A));
      PUPPER=1/(1+EXP(-B));
      TLOWER=WSUM*PLOWER;
      TUPPER=WSUM*PUPPER;
    END;
    RUN;
    *SAS program ends here;

Pradip K. Muhuri, PhD

-----Original Message-----
From: Patrick Connolly [mailto:p_conno...@slingshot.co.nz]
Sent: Sunday, June 19, 2011 1:57 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org; 'tlum...@u.washington.edu'
Subject: Re: [R] Asymetrical Confidence Interval

On Thu, 16-Jun-2011 at 04:43PM -0400, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

| Dear List,
|
| I wanted to calculate the asymmetrical confidence interval based on
| the sample statistic and standard error that are available from the
| published report (complex survey-based).
| The calculation details can be seen from pages 17-18 of the
| document at the following link:
| http://www.oas.samhsa.gov/nsduh/2k5MRB/2k5statInference.pdf.
|
| Could someone tell me whether R has any function for this included in
| its survey package or another contributed package of R.

There might be one in a package somewhere, but it's so trivial to make your own function by using the information you already have.

This is sounding suspiciously like a homework question.

-- 
Patrick Connolly
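The SAS steps above translate almost line for line into R. The function below is a minimal sketch of that translation, not code from the thread; the name logit_ci and its argument names are hypothetical, and the multiplier 1.96 is kept as in the SAS program.

```r
# Logit-transformed 95% confidence interval for a proportion,
# following the steps of the SAS program above.
# p: estimated prevalence (MEAN), se: its standard error (SEMEAN),
# total: optional weighted population total (WSUM).
logit_ci <- function(p, se, total = NA_real_, z = 1.96) {
  ok <- !is.na(p) & p > 0 & p < 1            # limits are missing when p is 0 or 1
  l  <- ifelse(ok, log(p / (1 - p)), NA_real_)
  s  <- ifelse(ok, se / (p * (1 - p)), NA_real_)
  lower <- 1 / (1 + exp(-(l - z * s)))       # back-transform to the rate scale
  upper <- 1 / (1 + exp(-(l + z * s)))
  data.frame(p = p, lower = lower, upper = upper,
             total_lower = total * lower, total_upper = total * upper)
}

# Example with made-up inputs
logit_ci(p = c(0.05, 0.5), se = c(0.01, 0.05), total = 1e6)
```

The function is vectorized, so a whole column of published rates and standard errors can be passed at once. When the microdata and design object are available rather than published estimates, svyciprop() in the survey package with method = "logit" gives a comparable interval computed on the logit scale.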
[R] Asymetrical Confidence Interval
Dear List,

I wanted to calculate the asymmetrical confidence interval based on the sample statistic and standard error that are available from the published report (complex survey-based). The calculation details can be seen from pages 17-18 of the document at the following link: http://www.oas.samhsa.gov/nsduh/2k5MRB/2k5statInference.pdf.

Could someone tell me whether R has any function for this included in its survey package or another contributed package of R?

Thank you in advance,
Pradip Muhuri
[R] Contributed Packages - Hmisc survey
Hello List,

Could someone tell me why I can't install the Hmisc and survey packages for R version 2.13.0 (2011-04-13)? What am I doing wrong here?

Thanks,
Pradip

    > install.packages("Hmisc", dependencies=TRUE)
    --- Please select a CRAN mirror for use in this session ---
    Warning: unable to access index for repository http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/2.13
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package 'Hmisc' is not available (for R version 2.13.0)

    > install.packages("survey", dependencies=TRUE)
    Warning: unable to access index for repository http://watson.nci.nih.gov/cran_mirror/bin/windows/contrib/2.13
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package 'survey' is not available (for R version 2.13.0)

Pradip K. Muhuri, PhD
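The warnings above say that the package index of the selected mirror could not be read, which would explain why both packages were reported as "not available". One thing to try is pointing the session at a different CRAN mirror. This is a sketch under assumptions: that https://cloud.r-project.org is reachable from your network, and the helper name install_missing is hypothetical.

```r
# Point this session at a different CRAN mirror
# (assumption: https://cloud.r-project.org is reachable from your network)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper: install only the packages that are not yet present
install_missing <- function(pkgs) {
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!present]
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# Usage (package names are case-sensitive and must be quoted):
# install_missing(c("Hmisc", "survey"))
```

Setting the repos option avoids the interactive mirror prompt; the same URL can also be passed per call through the repos argument of install.packages().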
[R] R in batch mode
Hi Everyone,

I am a new R user and trying to run R jobs in batch mode. Robert Muenchen (2009), in his book R for SAS and SPSS Users, has suggested writing a small batch file like mR.bat as shown below:

    "C:\Program Files\R\R-2.10.0\bin\Rterm.exe" --no-restore --no-save < %1 > %1.Rout 2>&1

Could anyone tell me in which directory or subdirectory I should save the mR.bat file? I would appreciate receiving any support you could extend on this subject.

Thank you in advance,
Pradip

Pradip K. Muhuri, PhD
Re: [R] R in batch mode
Dear Jonathan (and List),

Sorry for bothering you again, and I am requesting your further guidance on this subject. Below are the steps that I have followed. But I got an error message.

1. The content of myR.bat is as follows:

       C:\R\bin\i386\R\R-2.10.0\bin\Rterm.exe --no-restore --no-save < %1 > %1.Rout 2>&1

2. I have saved that .bat file in the subdirectory C:\R\bin\i386\R\R-2.10.0\bin.

3. At the R prompt, I have issued the following:

       setwd("E:/R")

4. Then I have issued the following:

       myR dateR.R
       Error: unexpected symbol in "myR dateR.R"

What am I doing wrong? Please help resolve the issue.

Thanks,
Pradip

Pradip K. Muhuri

-----Original Message-----
From: Jonathan Daily [mailto:biomathjda...@gmail.com]
Sent: Tuesday, May 24, 2011 1:18 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R-help@r-project.org
Subject: Re: [R] R in batch mode

Save it anywhere that is on your search path, which can be seen by typing path into the command line.

-- 
Jon Daily
Technician
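The "unexpected symbol" error in step 4 appears to come from typing the batch command at the R prompt, where it is parsed as R code; the .bat file is meant to be run from the Windows command prompt. If the goal is to start a batch run from inside an R session instead, something like the following sketch may help. It uses Rscript rather than the Rterm batch file from the thread, and the temporary script is a hypothetical stand-in for dateR.R.

```r
# Create a small example script (a stand-in for dateR.R)
script <- tempfile(fileext = ".R")
writeLines('cat("Today is", format(Sys.Date()), "\\n")', script)

# Run it in a separate, non-interactive R process
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, c("--vanilla", shQuote(script)),
               stdout = TRUE, stderr = TRUE)

# Save the captured output next to the script, similar to the .Rout file
rout <- paste0(script, "out")
writeLines(out, rout)
out
```

For a script that only needs to run inside the current session, source("dateR.R") after setwd() is the simpler route; the separate process is useful when the run should start from a clean workspace.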