Re: [R] big data?
The read.table.ffdf function in the ff package can read in delimited files and store them to disk as individual columns. The ffbase package provides additional data management and analytic functionality. I have used these packages on 15 Gb files of 18 million rows and 250 columns. On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote: On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote: What tools do you like for working with tab delimited text files up to 1.5 GB (under Windows 7 with 8 GB RAM)? ?data.table::fread Standard tools for smaller data sometimes grab all the available RAM, after which CPU usage drops to 3% ;-) The bigmemory project won the 2010 John Chambers Award but is not available (for R version 3.1.0). findFn(big data, 999) downloaded 961 links in 437 packages. That contains tools for data PostgreSQL and other formats, but I couldn't find anything for large tab delimited text files. Absent a better idea, I plan to write a function getField to extract a specific field from the data, then use that to split the data into 4 smaller files, which I think should be small enough that I can do what I want. There is the colbycol package with which I have no experience, but I understand it is designed to partition data into column sized objects. #--- from its help file- cbc.get.col {colbycol}R Documentation Reads a single column from the original file into memory Description Function cbc.read.table reads a file, stores it column by column in disk file and creates a colbycol object. Functioncbc.get.col queries this object and returns a single column. Thanks, Spencer __ r-h...@r-project.org javascript: mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ r-h...@r-project.org javascript: mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple rms summary plots in a single device
I would like to incorporate multiple summary plots from the rms package into a single device and to control the titles, and also to open a new device when I reach a specified number of plots. Currently I am only getting a single plot(summary( graph in the upper left- hand corner of each successive device. However, in the rms documention I see instances of a loop being used with par(mfrow( for multiple plots in a single device(e.g. residuals.lrm), and these examples work on my system. Please advise regarding options that must be specified to plot(summary(, or in the construction of my loop. Below are sample code and my sessionInfo(). Please note that I am using data.table to facilitate my real analysis, but I can replicate the issue with tData as a data.frame (using seg - subset(tData, groups == segment) logic), but I included the data.table logic in case it may be having some influence. Thank you! Mike tData - data.frame(groups=as.factor(1:8), low=as.factor(1:4) ,high=as.factor(seq(100, 400, 100)), rand=runif(400)) tData - data.table(tData) setkeyv(tData, 'groups') dd - datadist(tData) options(datadist = 'dd') doSumPlot - function(segment){ seg - tData[groups == segment,] plot(summary(rand ~ + low + high ,data = seg ), main=paste('Group:', segment)) } for(i in 1:length(levels(tData$groups))){ cat('Group: ', i, '\n') if(i == 1 ){ dev.new() par(mfrow=c(2,2)) } if(i/5 == round(i/5, 0)){ dev.new() par(mfrow=c(2,2)) } # dev.new() doSumPlot(levels(tData$groups)[i]) } sessionInfo() R version 2.15.0 (2012-03-30) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rms_3.5-0Hmisc_3.9-3 survival_2.36-14 data.table_1.8.0 loaded via a namespace (and not attached): [1] cluster_1.14.2 grid_2.15.0lattice_0.20-6 tools_2.15.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glmmPQL and predict
Is the labeling/naming of levels in the documentation for the predict.glmmPQL function backwards? The documentation states Level values increase from outermost to innermost grouping, with level zero corresponding to the population predictions. Taking the sample in the documentation: fit - glmmPQL(y ~ trt + I(week 2), random = ~1 | ID, family = binomial, data = bacteria) head(predict(fit, bacteria, level = 0, type=response)) [1] 0.9680779 0.9680779 0.8587270 0.8587270 0.9344832 0.9344832 head(predict(fit, bacteria, level = 1, type=response)) X01 X01 X01 X01 X02 X02 0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782 head(predict(fit, bacteria, type=response)) ## population prediction X01 X01 X01 X01 X02 X02 0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782 The returned values for level=1 and level=unspecified match, which is not what I expected based upon the documentation. Exponentiating the intercept coefficients from the fitted regression, the level=0 values match when the random effect intercept is included 1/(1+exp(-3.412014)) ## only the fixed effect [1] 0.9680779 1/(1+exp(-1*(3.412014+0.63614382))) ## fixed and random effect intercepts [1] 0.9828449 Thanks! Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Dropping columns from data frame
How does R do it, and should I ever be worried? I always remove columns by index, and it works exactly as I would naively expect - but HOW? The second illustration, which deletes non contiguous columns, represents what I do all the time and have some trepidation about because I don't know the mechanics (e.g. why doesn't the column formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector removal from a df/list invoke a loop in C?). Can I delete a named list of columns, which are examples 4 and 5 and which generate the unary error' mesages, without resorting to orig.df$num1.10 - NULL? Thanks! orig.df - data.frame(cbind( 1:10 ,11:20 ,letters[1:10] ,letters[11:20] ,LETTERS[1:10] ,LETTERS[11:20] )) names(orig.df) - c( 'num1.10' ,'num11.20' ,'lc1.10' ,'lc11.20' ,'uc1.10' ,'uc11.20' ) # Illustration 1: contiguous columns at beginning of data frame head(orig.df[,-c(1:3)]) # Illustration 2: non-contiguous columns head(orig.df[,-c(1,3,5)]) # Illustration 3: contiguous columns at end of data frame head(orig.df[,-c(4:6)]) ## as expected # Illustrations 4-5: unary errors head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))]) head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')]) Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dropping columns from data frame
Thank you, David. I was merely using head to limit the code/ output. My question remains, because a created data frame has the same columns as was output from head: head(orig.df,3) num1.10 num11.20 lc1.10 lc11.20 uc1.10 uc11.20 1 1 11 a k A K 2 2 12 b l B L 3 3 13 c m C M # Illustration 1: contiguous columns at beginning of data frame head(orig.df[,-c(1:3)],2) lc11.20 uc1.10 uc11.20 1 k A K 2 l B L new.df - orig.df[,-c(1:3)] head(new.df,2) lc11.20 uc1.10 uc11.20 1 k A K 2 l B L # Illustration 2: non-contiguous columns head(orig.df[,-c(1,3,5)],2) num11.20 lc11.20 uc11.20 1 11 k K 2 12 l L new.df - orig.df[,-c(1,3,5)] head(new.df,2) num11.20 lc11.20 uc11.20 1 11 k K 2 12 l L On Jan 6, 9:49 am, David Winsemius dwinsem...@comcast.net wrote: On Jan 6, 2012, at 10:00 AM, Mike Harwood wrote: How does R do it, and should I ever be worried? I always remove columns by index, and it works exactly as I would naively expect - but HOW? The second illustration, which deletes non contiguous columns, represents what I do all the time and have some trepidation about because I don't know the mechanics (e.g. why doesn't the column formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector removal from a df/list invoke a loop in C?). You are NOT removing columns. You are returning (to `head` and then to `print`) an extract from the dataframe, but that does not change the original dataframe. To effect a change you would need to assign the value back to the same name as the original daatframe. -- David Can I delete a named list of columns, which are examples 4 and 5 and which generate the unary error' mesages, without resorting to orig.df$num1.10 - NULL? Thanks! orig.df - data.frame(cbind( 1:10 ,11:20 ,letters[1:10] ,letters[11:20] ,LETTERS[1:10] ,LETTERS[11:20] )) names(orig.df) - c( 'num1.10' ,'num11.20' ,'lc1.10' ,'lc11.20' ,'uc1.10' ,'uc11.20' ) # Illustration 1: contiguous columns at beginning of data frame head(orig.df[,-c(1:3)]) # Illustration 2: non-contiguous columns head(orig.df[,-c(1,3,5)]) # Illustration 3: contiguous columns at end of data frame head(orig.df[,-c(4:6)]) ## as expected # Illustrations 4-5: unary errors head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))]) head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')]) Mike __ r-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: tcplot documentation in evd package
This issue occurs only when both the evd and ismev packages are loaded. Please retract this posting, if possible. Thank you in advance! Mike -- Forwarded message -- From: Mike Harwood harwood...@gmail.com Date: Mon, Dec 12, 2011 at 7:47 PM Subject: tcplot documentation in evd package To: r-help@r-project.org Hello, and please advise regarding any errors/omissions on my part. However, the documentation for R's tcplot function in the evd package appears to contain an error. I am using evd version 2.2-4 in R 2.14.0 with Windows XP. data(portpirie) mrlplot(portpirie) ## No Error tlim - c(3.6, 4.2) tcplot(portpirie, tlim) ## Line from documentation Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the threshold limits Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still naming the SeaLevel vector Please advise. Thanks! Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] tcplot documentation in evd package
Hello, and please advise regarding any errors/omissions on my part. However, the documentation for R's tcplot function in the evd package appears to contain an error. I am using evd version 2.2-4 in R 2.14.0 with Windows XP. data(portpirie) mrlplot(portpirie) ## No Error tlim - c(3.6, 4.2) tcplot(portpirie, tlim) ## Line from documentation Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the threshold limits Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still naming the SeaLevel vector Please advise. Thanks! Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] survexp with large dataframes
Hello, and thank you in advance. I would like to capture the expected survival from a coxph model for subjects in an observational study with recurrent events, but the survexp statement is failing due to memory. I am using R version 2.13.1 (2011-07-08) on Windows XP. My objective is to plot the fitted survival with the Kaplan-Meier plot. Below is the code with output and [unfortunately] errors. Is there something wrong in my use of cluster in generating the proportional hazards model, or is there some syntax to pass it into survexp? Mike dim(dev) [1] 899876 25 mod1 - coxph(Surv(begin.cp, end.cp, event) + ~ age.sex + + plan_type + + uw_load + + cluster(mbr_key) + ,data=dev + ) summary(mod1) Call: coxph(formula = Surv(begin.cp, end.cp, event) ~ age.sex + plan_type + uw_load + cluster(mbr_key), data = dev) n= 899876, number of events= 753324 coef exp(coef) se(coef) robust se z Pr(|z|) age.sex19-34_MALE -0.821944 0.439576 0.005529 0.023298 -35.280 2e-16 *** age.sex35-49_FEMALE 0.058776 1.060537 0.004201 0.018477 3.181 0.00147 ** age.sex35-49_MALE -0.515590 0.597148 0.004634 0.019986 -25.798 2e-16 *** age.sex50-64_FEMALE 0.190940 1.210386 0.004350 0.020415 9.353 2e-16 *** age.sex50-64_MALE -0.127514 0.880281 0.004487 0.021431 -5.950 2.68e-09 *** age.sexCHILD_CHILD -0.327522 0.720707 0.004238 0.017066 -19.192 2e-16 *** plan_typeLOW-0.165735 0.847270 0.002443 0.011080 -14.958 2e-16 *** uw_load1-50 0.215122 1.240014 0.006437 0.029189 7.370 1.71e-13 *** uw_load101-250 0.551042 1.735060 0.003993 0.018779 29.344 2e-16 *** uw_load251+ 0.981660 2.668884 0.003172 0.017490 56.126 2e-16 *** uw_load51-1000.413464 1.512046 0.006216 0.027877 14.832 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 age.sex19-34_MALE 0.4396 2.27490.42000.4601 age.sex35-49_FEMALE1.0605 0.94291.02281.0996 age.sex35-49_MALE 0.5971 1.67460.57420.6210 age.sex50-64_FEMALE1.2104 0.82621.16291.2598 age.sex50-64_MALE 0.8803 1.13600.84410.9180 age.sexCHILD_CHILD 0.7207 1.38750.69700.7452 plan_typeLOW 0.8473 1.18030.82910.8659 uw_load1-501.2400 0.80641.17111.3130 uw_load101-250 1.7351 0.57631.67241.8001 uw_load251+2.6689 0.37472.57892.7620 uw_load51-100 1.5120 0.66141.43161.5970 Concordance= 0.643 (se = 0 ) Rsquare= 0.205 (max possible= 1 ) Likelihood ratio test= 206724 on 11 df, p=0 Wald test= 9207 on 11 df, p=0 Score (logrank) test = 246358 on 11 df, p=0, Robust = 4574 p=0 (Note: the likelihood ratio and score tests assume independence of observations within a cluster, the Wald and robust score tests do not). dev.fit - survexp( ~ 1, ratetable=mod1, data=dev) Error in survexp.cfit(cbind(as.numeric(X), R), Y, conditional, FALSE, : cannot allocate memory block of size 15.2 Gb __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] predict with eha package
Hello list, and thank you in advance. I'm unable to generate predicted values when specifying newdata using phreg and aftreg models in the eha package, but I do not have the same problem with a proportional hazards model from coxph. Without the newdata argument the predicted values are returned, but with newdata=model.dataframe coxph is fine but both aftreg and phreg models return an Error in predict.coxph(f.ph.eha, newdata = mort, type = lp) : Data is not the same size as it was in the original fit message. Since I ultimately want a parametric model and the real data is left truncated and right censored, I think the aftreg function in the eha package is what I must use. Following is my sample code, without the output. #~ All models generated successfully - f.ph - coxph(Surv(enter, exit, event) ~ ses, data = mort) f.ph.eha - phreg(Surv(enter, exit, event) ~ ses, data = mort) f.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort) #~ All fits generated successfully --- f.ph.fit - predict(f.ph, type='lp') f.ph.eha.fit - predict(f.ph.eha, type='lp') f.aft.fit - predict(f.aft, type='lp') #~ First fit generated successfully, others output error f.ph.fit - predict(f.ph, newdata=mort, type='lp') f.ph.eha.fit - predict(f.ph.eha, newdata=mort, type='lp') f.aft.fit - predict(f.aft, newdata=mort, type='lp') Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] predict 'expected' with eha package
I am unsure what is being returned, and what is supposed to be returned, when using 'predict' with type='expected' for an aftreg survival model. The code below first generates a weibull model, then uses predict to create a vector of the linear predictors, then attempts to create the 'expected' vector, which is empty. The final two steps in the code generate a lognormal model with the same data, and the same empty 'expected' vector. My expectation had been that 'expected' would generate the same transformed dependent variable output as predict with a survreg model using type='response'. Since my 'real' data is left-truncated and right-censored I cannot use survreg, and I wanted to investigate the output from eha. Thanks in advance! Mike data(mort) aftreg(Surv(enter, exit, event) ~ ses, data = mort) Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort) Covariate W.mean Coef Exp(Coef) se(Coef)Wald p ses lower0.416 0 1 (reference) upper0.584-0.348 0.706 0.089 0.000 log(scale)3.60336.698 0.065 0.000 log(shape)0.331 1.392 0.058 0.000 Events276 Total time at risk 17038 Max. log. likelihood -1391.3 LR test statistic 16.1 Degrees of freedom1 Overall p-value 5.91578e-05 m1 - aftreg(Surv(enter, exit, event) ~ ses, data = mort) head(predict(m1, type='lp')) ## produces output 1 2 3 4 5 6 -0.347853 0.00 -0.347853 0.00 0.00 0.00 head(predict(m1, type='expected')) ## is this correct? numeric(0) m2 - aftreg(Surv(enter, exit, event) ~ ses, dist='lognormal', data = mort) head(predict(m2, type='expected')) ## is this correct? numeric(0) from eha (the survival and rms packages are not an option for my 'real' question, since I have left-truncated right-censored data __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Syntax for iter.max in rms
Hello, I would like to increase the number of iterations for running a Buckley-James regression model in the rms package, but something is apparently syntactically wrong. The following code initially is exactly as it appears in the help page (which runs successfully), then a failure to converge message (resulting from specifying an 'identity' link argument, the error message by adding both the iter.max control and 'identity' link arguments, and finally, the error message when testing just an iter.max argument. This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04. Thank you in advance, and all insights and criticisms are appreciated. Mike library(rms) f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital,x=TRUE, y=TRUE) f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',x=TRUE, y=TRUE) No convergence in 50 steps Failure in bj.fit f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', iter.max=200, x=TRUE, y=TRUE) Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link = identity, : unused argument(s) (iter.max = 200) f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE, y=TRUE) Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max = 200, : unused argument(s) (iter.max = 200) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ID parameter in model
Thank you, Goran. Please see the package details below: packageDescription('eha') Encoding: UTF-8 Package: eha Version: 1.3-2 Date: 2011-03-01 Title: Event History Analysis Description: A package for survival and event history analysis License: GPL (= 3) Author: Göran Broström Depends: R (= 2.2.0), survival, graphics Maintainer: Göran Broström g...@stat.umu.se Packaged: 2011-03-01 14:56:12 UTC; gb Repository: CRAN Date/Publication: 2011-03-01 15:50:52 Built: R 2.13.0; i386-pc-mingw32; 2011-04-15 08:22:36 UTC; windows Mike On Mon, May 2, 2011 at 10:38 AM, Mike Harwood harwood...@gmail.com wrote: Hello, I am apparently confused about the use of an id parameter for an event history/survival model, and why the EHA documentation for aftreg does not specify one. All assistance and insights are appreciated. Attempting to specifiy an id variable with the documentation example generates an overlapping intervals error, so I sorted the original mort dataframe and set subsequent entry times an id to the previous exit time + 0.0001. This allowed me to see the affect of the id parameter on the coefficients and significance tests, and prompted my question. The code I used is shown below, with the results at the bottom. Thanks in advance! Mike head(mort) ## data clearly contains multiple entries for some of the dataframe ids no.id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort) ## Inital model id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id) ## overlapping intervals error mort.sort - ## ensure records ordered mort[ order(mort$id, mort$enter),] ## remove overlap for (i in 2:nrow(mort.sort)){ if (mort.sort[i,'id'] == mort.sort[i-1,'id']) mort.sort[i,'enter'] - mort.sort[i-1, 'exit'] + 0.0001 } no.id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, data = mort.sort) ## initial model on modified df id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, id=id, data = mort.sort) ## with id parameter #=== output ===# no.id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort) Covariate W.mean Coef Exp(Coef) se(Coef)Wald p ses lower0.416 0 1 (reference) upper0.584-0.347 0.707 0.089 0.000 log(scale)3.60336.704 0.065 0.000 log(shape)0.331 1.393 0.058 0.000 Events276 Total time at risk 17045 Max. log. likelihood -1391.4 LR test statistic 16.1 Degrees of freedom1 Overall p-value 6.04394e-05 id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort, id = id) Covariate W.mean Coef Exp(Coef) se(Coef)Wald p ses lower0.416 0 1 (reference) upper0.584-0.364 0.695 0.090 0.000 log(scale)3.58836.171 0.065 0.000 log(shape)0.338 1.402 0.058 0.000 Events276 Total time at risk 17045 Max. log. likelihood -1390.8 LR test statistic 17.2 Degrees of freedom1 Overall p-value 3.3091e-05 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ID parameter in model
Hello, I am apparently confused about the use of an id parameter for an event history/survival model, and why the EHA documentation for aftreg does not specify one. All assistance and insights are appreciated. Attempting to specifiy an id variable with the documentation example generates an overlapping intervals error, so I sorted the original mort dataframe and set subsequent entry times an id to the previous exit time + 0.0001. This allowed me to see the affect of the id parameter on the coefficients and significance tests, and prompted my question. The code I used is shown below, with the results at the bottom. Thanks in advance! Mike head(mort) ## data clearly contains multiple entries for some of the dataframe ids no.id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort) ## Inital model id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id) ## overlapping intervals error mort.sort - ## ensure records ordered mort[ order(mort$id, mort$enter),] ## remove overlap for (i in 2:nrow(mort.sort)){ if (mort.sort[i,'id'] == mort.sort[i-1,'id']) mort.sort[i,'enter'] - mort.sort[i-1, 'exit'] + 0.0001 } no.id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, data = mort.sort) ## initial model on modified df id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, id=id, data = mort.sort) ## with id parameter #=== output ===# no.id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort) Covariate W.mean Coef Exp(Coef) se(Coef)Wald p ses lower0.416 0 1 (reference) upper0.584-0.347 0.707 0.089 0.000 log(scale)3.60336.704 0.065 0.000 log(shape)0.331 1.393 0.058 0.000 Events276 Total time at risk 17045 Max. log. likelihood -1391.4 LR test statistic 16.1 Degrees of freedom1 Overall p-value 6.04394e-05 id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort, id = id) Covariate W.mean Coef Exp(Coef) se(Coef)Wald p ses lower0.416 0 1 (reference) upper0.584-0.364 0.695 0.090 0.000 log(scale)3.58836.171 0.065 0.000 log(shape)0.338 1.402 0.058 0.000 Events276 Total time at risk 17045 Max. log. likelihood -1390.8 LR test statistic 17.2 Degrees of freedom1 Overall p-value 3.3091e-05 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survexp with weights
The system details follow below. Also, I have attempted specifying the variables in a second ratetable statement, but the same missing object error occurs in creating a survexp object/list. R.version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor 12.2 year 2011 month 02 day25 svn rev54585 language R version.string R version 2.12.2 (2011-02-25) package ‘survival’ version 2.36-5 On Apr 20, 11:03 am, Mike Harwood harwood...@gmail.com wrote: Hello, I probably have a syntax error in trying to generate an expected survival curve from a weighted cox model, but I can't see it. I used the help sample code to generate a weighted model, with the addition of a weights=albumin argument (I only chose albumin because it had no missing values, not because of any real relevance). Below are my code with the resulting error messages. Thanks in advance! pfit - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age + + + platelet, data=pbc + ) pfit Call: coxph(formula = Surv(time, status 0) ~ trt + log(bili) + log(protime) + age + +platelet, data = pbc) coef exp(coef) se(coef) z p trt -0.000624 0.999 0.17304 -0.00360 1. log(bili) 0.985497 2.679 0.08949 11.01262 0. log(protime) 2.794001 16.346 0.95289 2.93215 0.0034 age 0.020590 1.021 0.00805 2.55666 0.0110 platelet -0.001699 0.998 0.00085 -2.00130 0.0450 Likelihood ratio test=164 on 5 df, p=0 n= 308, number of events= 143 (110 observations deleted due to missingness) plot(survfit(Surv(time, status0) ~ trt, data=pbc)) lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple') pfit.wtd - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age + + + platelet, weights=albumin, data=pbc + ) pfit.wtd Call: coxph(formula = Surv(time, status 0) ~ trt + log(bili) + log(protime) + age + +platelet, data = pbc, weights = albumin) coef exp(coef) se(coef) z p trt -0.01354 0.987 0.094204 -0.144 8.9e-01 log(bili) 0.99282 2.699 0.048690 20.391 0.0e+00 log(protime) 2.54136 12.697 0.525797 4.833 1.3e-06 age 0.01913 1.019 0.004398 4.350 1.4e-05 platelet -0.00165 0.998 0.000462 -3.578 3.5e-04 Likelihood ratio test=535 on 5 df, p=0 n= 308, number of events= 143 (110 observations deleted due to missingness) plot(survfit(Surv(time, status0) ~ trt, data=pbc)) lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found In addition: Warning message: In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data = pbc) : Weights ignored lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found In addition: Warning message: In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data = pbc) : Weights ignored __ r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] survexp with weights
Hello, I probably have a syntax error in trying to generate an expected survival curve from a weighted cox model, but I can't see it. I used the help sample code to generate a weighted model, with the addition of a weights=albumin argument (I only chose albumin because it had no missing values, not because of any real relevance). Below are my code with the resulting error messages. Thanks in advance! pfit - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age + + + platelet, data=pbc + ) pfit Call: coxph(formula = Surv(time, status 0) ~ trt + log(bili) + log(protime) + age + +platelet, data = pbc) coef exp(coef) se(coef)z p trt -0.000624 0.999 0.17304 -0.00360 1. log(bili) 0.985497 2.679 0.08949 11.01262 0. log(protime) 2.79400116.346 0.95289 2.93215 0.0034 age 0.020590 1.021 0.00805 2.55666 0.0110 platelet -0.001699 0.998 0.00085 -2.00130 0.0450 Likelihood ratio test=164 on 5 df, p=0 n= 308, number of events= 143 (110 observations deleted due to missingness) plot(survfit(Surv(time, status0) ~ trt, data=pbc)) lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple') pfit.wtd - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age + + + platelet, weights=albumin, data=pbc + ) pfit.wtd Call: coxph(formula = Surv(time, status 0) ~ trt + log(bili) + log(protime) + age + +platelet, data = pbc, weights = albumin) coef exp(coef) se(coef) z p trt -0.01354 0.987 0.094204 -0.144 8.9e-01 log(bili) 0.99282 2.699 0.048690 20.391 0.0e+00 log(protime) 2.5413612.697 0.525797 4.833 1.3e-06 age 0.01913 1.019 0.004398 4.350 1.4e-05 platelet -0.00165 0.998 0.000462 -3.578 3.5e-04 Likelihood ratio test=535 on 5 df, p=0 n= 308, number of events= 143 (110 observations deleted due to missingness) plot(survfit(Surv(time, status0) ~ trt, data=pbc)) lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found In addition: Warning message: In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data = pbc) : Weights ignored lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), col='purple') Error in eval(expr, envir, enclos) : object 'albumin' not found In addition: Warning message: In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data = pbc) : Weights ignored __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] debug biglm response error on bigglm model
Thank you, Greg. The issue was in the simulation logic, where one of the values was not changing correctly for some iterations... On Jan 10, 3:20 pm, Greg Snow greg.s...@imail.org wrote: Not sure, but one possible candidate problem is that in your simulations one iteration ended up with fewer levels of a factor than the overall dataset and that caused the error. There is no recode function in the default packages, there are at least 6 recode functions in other packages, we cannot tell which you were using from the code below. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Mike Harwood Sent: Monday, January 10, 2011 6:29 AM To: r-h...@r-project.org Subject: [R] debug biglm response error on bigglm model G'morning What does the error message Error in x %*% coef(object) : non- conformable arguments indicate when calculating the response values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the loop in the code that follows. After the code I have included the output, which shows that the simulations are changing the response and input values, and that there are not any atypical values for the factors in the seventh iteration. At the end of the output is the aforementioned error message. Finally, I have included the model from biglm. Thanks in advance! Code: === iter - nrow(nov.2010) predict.nov.2011 - vector(mode='numeric', length=iter) for (i in 1:iter) { iter.df - nov.2010 ##-- Update values of dynamic variables -- iter.df$age - iter.df$age + 12 iter.df$pct_utilize - iter.df$pct_utilize + mc.util.delta[i] iter.df$updated_varname1 - ceiling(iter.df$updated_varname1 + mc.varname1.delta[i]) if(iter.df$state==WI) iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i] if(iter.df$state==MN) iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i] if(iter.df$state==IL) iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i] if(iter.df$state==US) iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i] ##--- Bin Variables -- iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1, 300:499 = '300 - 499'; 500:549 = '500 - 549'; 550:599 = '550 - 599'; 600:649 = '600 - 649'; 650:699 = '650 - 699'; 700:749 = '700 - 749'; 750:799 = '750 - 799'; 800:849 = 'GE 800'; else = 'missing'; )) iter.df$bin_age - as.factor(recode(iter.df$age, 0:23 = ' 24mo.'; 24:72 = '24 - 72mo.'; 72:300 = '72 - 300mo'; else = 'missing'; )) iter.df$bin_util - as.factor(recode(iter.df$pct_utilize, 0.0:0.2 = ' 0 - 20%'; 0.2:0.4 = ' 20 - 40%'; 0.4:0.6 = ' 40 - 60%'; 0.6:0.8 = ' 60 - 80%'; 0.8:1.0 = ' 80 - 100%'; 1.0:1.2 = '100 - 120%'; else = 'missing'; )) iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop, 0:70 = ' 70%'; 70:85 = ' 70 - 85%'; 85:95 = ' 85 - 95%'; 95:110 = '95 - 110%'; else = 'missing'; )) iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing') iter.df$bin_age - relevel(iter.df$bin_age, 'missing') iter.df$bin_util - relevel(iter.df$bin_util, 'missing') iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing') #~ print(head(iter.df)) if (i=6 i=8){ print('-') browser() print(i) print(table(iter.df$bin_varname1)) print(table(iter.df$bin_age)) print(table(iter.df$bin_util)) print(table(iter.df$bin_varname2)) #~ debug(predict.nov.2011[i] - #~ sum(predict(logModel.1, newdata=iter.df, type='response'))) } predict.nov.2011[i] - sum(predict(logModel.1, newdata=iter.df, type='response')) print(predict.nov.2011[i]) } Output == [1] 36.56073 [1] 561.4516 [1] 4.83483 [1] 5.01398 [1] 7.984146 [1] - Called from: top level Browse[1] [1] 6 missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799 GE 800 842 283 690 1094 1695 3404 6659 18374 21562 missing 24mo. 24 - 72mo. 72 - 300mo 16 2997 19709 31881 missing 0 - 20% 20 - 40% 40 - 60% 60 - 80% 80 - 100% 100 - 120% 17906
[R] debug biglm response error on bigglm model
G'morning What does the error message Error in x %*% coef(object) : non- conformable arguments indicate when calculating the response values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the loop in the code that follows. After the code I have included the output, which shows that the simulations are changing the response and input values, and that there are not any atypical values for the factors in the seventh iteration. At the end of the output is the aforementioned error message. Finally, I have included the model from biglm. Thanks in advance! Code: === iter - nrow(nov.2010) predict.nov.2011 - vector(mode='numeric', length=iter) for (i in 1:iter) { iter.df - nov.2010 ##-- Update values of dynamic variables -- iter.df$age - iter.df$age + 12 iter.df$pct_utilize - iter.df$pct_utilize + mc.util.delta[i] iter.df$updated_varname1 - ceiling(iter.df$updated_varname1 + mc.varname1.delta[i]) if(iter.df$state==WI) iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i] if(iter.df$state==MN) iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i] if(iter.df$state==IL) iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i] if(iter.df$state==US) iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i] ##--- Bin Variables -- iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1, 300:499 = '300 - 499'; 500:549 = '500 - 549'; 550:599 = '550 - 599'; 600:649 = '600 - 649'; 650:699 = '650 - 699'; 700:749 = '700 - 749'; 750:799 = '750 - 799'; 800:849 = 'GE 800'; else= 'missing'; )) iter.df$bin_age - as.factor(recode(iter.df$age, 0:23 = ' 24mo.'; 24:72 = '24 - 72mo.'; 72:300 = '72 - 300mo'; else = 'missing'; )) iter.df$bin_util - as.factor(recode(iter.df$pct_utilize, 0.0:0.2 = ' 0 - 20%'; 0.2:0.4 = ' 20 - 40%'; 0.4:0.6 = ' 40 - 60%'; 0.6:0.8 = ' 60 - 80%'; 0.8:1.0 = ' 80 - 100%'; 1.0:1.2 = '100 - 120%'; else= 'missing'; )) iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop, 0:70 = ' 70%'; 70:85 = ' 70 - 85%'; 85:95 = ' 85 - 95%'; 95:110 = '95 - 110%'; else = 'missing'; )) iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing') iter.df$bin_age - relevel(iter.df$bin_age, 'missing') iter.df$bin_util - relevel(iter.df$bin_util, 'missing') iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing') #~ print(head(iter.df)) if (i=6 i=8){ print('-') browser() print(i) print(table(iter.df$bin_varname1)) print(table(iter.df$bin_age)) print(table(iter.df$bin_util)) print(table(iter.df$bin_varname2)) #~ debug(predict.nov.2011[i] - #~ sum(predict(logModel.1, newdata=iter.df, type='response'))) } predict.nov.2011[i] - sum(predict(logModel.1, newdata=iter.df, type='response')) print(predict.nov.2011[i]) } Output == [1] 36.56073 [1] 561.4516 [1] 4.83483 [1] 5.01398 [1] 7.984146 [1] - Called from: top level Browse[1] [1] 6 missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799GE 800 842 283 690 1094 1695 3404 6659 18374 21562 missing 24mo. 24 - 72mo. 72 - 300mo 16 2997 19709 31881 missing0 - 20% 20 - 40% 40 - 60% 60 - 80% 80 - 100% 100 - 120% 17906 4832 4599 5154 7205 14865 42 missing 70% 70 - 85% 85 - 95% 95 - 110% 10423 19429 10568 8350 5833 [1] 11.04090 [1] - Called from: top level Browse[1] [1] 7 missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799 847 909 1059 1586 3214 6304 16349 24335 missing 24mo. 24 - 72mo. 72 - 300mo 16 2997 19709 31881 missing0 - 20% 20 - 40% 40 - 60% 60 - 80% 80 - 100% 100 - 120% 17145 4972 4617 5020 6634 16139 76 missing 70% 70 - 85% 85 - 95% 95 - 110% 10423 19429 10568 8350 5833 Error in x %*% coef(object) : non-conformable arguments Model === Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 + bin_age + bin_util + state + varname3 + varname3:state, family = binomial(link = logit), data = dev.data, maxit = 75, sandwich = FALSE) Sample size = 1372250 __ R-help@r-project.org mailing list
[R] Error in x %*% coef(object) : non-conformable arguments
Hello, and thanks in advance! What does the error message Error in x %*% coef(object) : non- conformable arguments when indicate when predicting values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the somewhat interesting loop which follows. After the code I have included the output, which shows that a) the simulations are changing the values, and b) there are not any atypical values for the factors in the seventh iteration. At the end of the output is the aforementioned error message. Code: === iter - nrow(nov.2010) predict.nov.2011 - vector(mode='numeric', length=iter) for (i in 1:iter) { iter.df - nov.2010 ##-- Update values of dynamic variables -- iter.df$age - iter.df$age + 12 iter.df$pct_utilize - iter.df$pct_utilize + mc.util.delta[i] iter.df$updated_varname1 - ceiling(iter.df$updated_varname1 + mc.varname1.delta[i]) if(iter.df$state==WI) iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i] if(iter.df$state==MN) iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i] if(iter.df$state==IL) iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i] if(iter.df$state==US) iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i] ##--- Bin Variables -- iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1, 300:499 = '300 - 499'; 500:549 = '500 - 549'; 550:599 = '550 - 599'; 600:649 = '600 - 649'; 650:699 = '650 - 699'; 700:749 = '700 - 749'; 750:799 = '750 - 799'; 800:849 = 'GE 800'; else= 'missing'; )) iter.df$bin_age - as.factor(recode(iter.df$age, 0:23 = ' 24mo.'; 24:72 = '24 - 72mo.'; 72:300 = '72 - 300mo'; else = 'missing'; )) iter.df$bin_util - as.factor(recode(iter.df$pct_utilize, 0.0:0.2 = ' 0 - 20%'; 0.2:0.4 = ' 20 - 40%'; 0.4:0.6 = ' 40 - 60%'; 0.6:0.8 = ' 60 - 80%'; 0.8:1.0 = ' 80 - 100%'; 1.0:1.2 = '100 - 120%'; else= 'missing'; )) iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop, 0:70 = ' 70%'; 70:85 = ' 70 - 85%'; 85:95 = ' 85 - 95%'; 95:110 = '95 - 110%'; else = 'missing'; )) iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing') iter.df$bin_age - relevel(iter.df$bin_age, 'missing') iter.df$bin_util - relevel(iter.df$bin_util, 'missing') iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing') #~ print(head(iter.df)) if (i=6 i=8){ print('-') browser() print(i) print(table(iter.df$bin_varname1)) print(table(iter.df$bin_age)) print(table(iter.df$bin_util)) print(table(iter.df$bin_varname2)) #~ debug(predict.nov.2011[i] - #~ sum(predict(logModel.1, newdata=iter.df, type='response'))) } predict.nov.2011[i] - sum(predict(logModel.1, newdata=iter.df, type='response')) print(predict.nov.2011[i]) } Output == [1] 36.56073 [1] 561.4516 [1] 4.83483 [1] 5.01398 [1] 7.984146 [1] - Called from: top level Browse[1] [1] 6 missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799GE 800 842 283 690 1094 1695 3404 6659 18374 21562 missing 24mo. 24 - 72mo. 72 - 300mo 16 2997 19709 31881 missing0 - 20% 20 - 40% 40 - 60% 60 - 80% 80 - 100% 100 - 120% 17906 4832 4599 5154 7205 14865 42 missing 70% 70 - 85% 85 - 95% 95 - 110% 10423 19429 10568 8350 5833 [1] 11.04090 [1] - Called from: top level Browse[1] [1] 7 missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799 847 909 1059 1586 3214 6304 16349 24335 missing 24mo. 24 - 72mo. 72 - 300mo 16 2997 19709 31881 missing0 - 20% 20 - 40% 40 - 60% 60 - 80% 80 - 100% 100 - 120% 17145 4972 4617 5020 6634 16139 76 missing 70% 70 - 85% 85 - 95% 95 - 110% 10423 19429 10568 8350 5833 Error in x %*% coef(object) : non-conformable arguments Model === Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 + bin_age + bin_util + state + varname3 + varname3:state, family = binomial(link = logit), data = dev.data, maxit = 75, sandwich = FALSE) Sample size = 1372250 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
[R] Set axis limits in mixtools plot
Hello, Can the x and y axis limits be specified in a density plot with the mixtools package for a finite mixture model? Uncommenting the xlim2/ ylim2 lines in the plot command below generates 'not a graphical parameter' warnings (and does not change the axis settings), and uncommenting the xlim/ylim lines generates a 'formal argument ylim matched by multiple actual arguments' error. I want to use a consistent set of axis setting for four different sets of data. Thanks! test.data - c(0.0289,0.0342,0.0379,0.0405,0.0421,0.0429,0.0430,0.0426,0.0423 , 0.0425,0.0435,0.0451,0.0466,0.0477,0.0480,0.0477,0.0473,0.0473,0.0479 , 0.0492,0.0507,0.0519,0.0526,0.0523,0.0507,0.0482,0.0452,0.0428,0.0414 , 0.0409,0.0404,0.0388,0.0358,0.0319,0.0283,0.0263,0.0269,0.0298,0.0339 , 0.0378,0.0404,0.0417,0.0425,0.0436,0.0457,0.0489,0.0522,0.0544,0.0546 , 0.0529,0.0501,0.0474,0.0457,0.0454,0.0460,0.0467,0.0466,0.0450,0.0423 , 0.0395,0.0378,0.0377,0.0388,0.0401,0.0404,0.0394,0.0378,0.0366,0.0371 , 0.0395,0.0433,0.0474,0.0512,0.0546,0.0581,0.0622,0.0673,0.0729,0.0782 , 0.0824,0.0854,0.0877,0.0902,0.0934,0.0973,0.1009,0.1032,0.1038,0.1029 , 0.1017,0.1017,0.1034,0.1065,0.1099,0.1126,0.1137,0.1133,0.1123,0.1118 , 0.1124,0.1140,0.1159,0.1175,0.1182,0.1179,0.1170,0.1160,0.1153,0.1153 , 0.1159,0.1164,0.1161,0.1146,0.1118,0.1080,0.1038,0.1002,0.0976,0.0961 , 0.0954,0.0948,0.0940,0.0929,0.0920,0.0916,0.0921,0.0934,0.0951,0.0964 , 0.0964,0.0951,0.0925,0.0894,0.0864,0.0841,0.0827,0.0821,0.0819,0.0817 , 0.0814,0.0808,0.0797,0.0783,0.0768,0.0756,0.0749,0.0749,0.0752,0.0752 , 0.0743,0.0724,0.0694,0.0662,0.0635,0.0616,0.0607,0.0603,0.0600,0.0592 , 0.0580,0.0567,0.0553,0.0539,0.0522,0.0499,0.0470,0.0435,0.0394,0.0346 , 0.0290,0.0227,0.0160,0.0098,0.0045,0.0009,-0.0012,-0.0021,-0.0025,-0.0029 ,-0.0038,-0.0052,-0.0074,-0.0103,-0.0141,-0.0189,-0.0246,-0.0308,-0.0370 ,-0.0432,-0.0495,-0.0561,-0.0627,-0.0681,-0.0716,-0.0732,-0.0736,-0.0740 ,-0.0750,-0.0760,-0.0758,-0.0732,-0.0677,-0.0603,-0.0531,-0.0481,-0.0459 ,-0.0448,-0.0421,-0.0364,-0.0285,-0.0219,-0.0202,-0.0245,-0.0321,-0.0386 ,-0.0399,-0.0352,-0.0272,-0.0206,-0.0187,-0.0220,-0.0280,-0.0330 ) library(mixtools) fit - normalmixEM(test.data) plot(fit ,whichplots=2 #,xlim2=c(-0.1, 0.15) #,ylim2=c(0, 20) #,xlim=c(-0.1, 0.15) #,ylim=c(0, 20) ) grid() __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reconcile Random Samples
Is there a way to generate identical random samples using R's runif function and SAS's ranuni function? I have assigned the same seed values in both software packages, but the following results show different results. Thanks! R === set.seed(6) random - runif(10) random [1] 0.6062683 0.9376420 0.2643521 0.3800939 0.8074834 0.9780757 0.9579337 [8] 0.7627319 0.5096485 0.0644768 SAS (the log file) === 15 data _null_; 16 do i=1 to 10; 17 random = ranuni(6); 18 put i= random=; 19 end; 20 run; i=1 random=0.1097754189 i=2 random=0.8205322939 i=3 random=0.3989458365 i=4 random=0.5563918723 i=5 random=0.5296154672 i=6 random=0.8156640985 i=7 random=0.2578750389 i=8 random=0.1901503369 i=9 random=0.2987641572 i=10 random=0.3993993096 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] pairs and mfrow
Is there an alternative to par(mfrow=c(2,1)) to get stacked scatterplot matrixes generated with pairs? I am using version 2.11.1 on Windows XP. The logic I am using follows, and the second pairs plot replaces the first plot in the current graphics device, which is not what I expected (or desired). par(mfrow=c(2,1)) pairs(b2007, main=6/2000 - 12/2006) pairs(a2007, main=1/2007 - 06/2009) Thanks in advance! Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Assign Name of Data Frame
Marc, Thank you for your help. I had tried assign, but I did not enclose the table name in quotes in my function call. I needed to see what someone else had written before I ever would have noticed it! On Fri, Feb 12, 2010 at 10:00 AM, Marc Schwartz marc_schwa...@me.comwrote: On Feb 12, 2010, at 8:19 AM, mah wrote: Hello R Experts, How can I assign the name of a data frame with the argument of a function? Specifically I am using RODBC to build local dataframes from SAS datasets on a remote server. I would like the local dataframe have the same name as the source SAS dataset, and the function below is what I am developing. However, the substitute(table) on the left side of the assignment generates the error Error in substitute(table) - sqlQuery(sears, sql.stmt) : could not find function substitute-. Thanks in advance MakeDF - function(table) # # Function makes dataframe from UNIX SAS datasets # { st.time - Sys.time() print(substitute(table)) sql.stmt - paste(select * from swprod., substitute(table), sep=) print(sql.stmt) substitute(table) - sqlQuery(sears, sql.stmt) # deparse(substitute(table)) - sqlQuery(sears, sql.stmt) end.time print(end.time - st.time) } MakeDF(sku_series) My recommendation would be something like this: MakeDF - function(table) { DF - sqlQuery(channel, paste(select * from swprod., table, sep = )) assign(table, DF, envir = parent.frame()) } Then use would be: MakeDF(sku_series) The result would be a data frame called 'sku_series' in the calling environment. You could substitute globalenv() for parent.frame() if you wanted to create the data frame in the global environment. See ?assign HTH, Marc Schwartz [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.