Re: [R] survfit number of variables != number of variable names
I can't reproduce the problem. Tell us which version of R and which version of the survival package you are using, and create a reproducible example. I don't know whether some variables are numeric and some are factors, how or where the surv object was defined, etc.

Terry Therneau

On 11/17/2012 05:00 AM, r-help-requ...@r-project.org wrote:

This works ok:

    cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data)
    fit = survfit(cox, newdata=data[1:100,])

but using strata leads to problems:

    cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity), data = data)
    fit.s = survfit(cox.s, newdata=data[1:100,])
    Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :
      number of variables != number of variable names

Note that the following gives rise to the same error:

    fit.s = survfit(cox.s, newdata=data)
    Error in model.frame.default(data = data, formula = ~bucket + today + :
      number of variables != number of variable names

but if I use data implicitly, all is working fine:

    fit.s = survfit(cox.s)

Any idea on how I could solve this?

Best, and thank you,
ge

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
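For anyone following along, the details requested in this message can be collected in a few lines. This is only a sketch: `aml` (a dataset shipped with survival) stands in for the poster's data, which is not available here, and the names `report` and `var_classes` are made up for illustration.

```r
library(survival)

# Version details a bug report would normally include
report <- list(
  r_version        = R.version.string,
  survival_version = as.character(packageVersion("survival"))
)
print(report)

# Column types of the data used in the model.
# `aml` is a stand-in (assumption); substitute the real data frame.
str(aml)
var_classes <- vapply(aml, function(col) class(col)[1], character(1))
print(var_classes)
```

Pasting this output next to the failing call answers the "numeric or factor?" question without anyone having to guess.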
Re: [R] survfit number of variables != number of variable names
Hi Terry,

I attached a small data set to this email. This is what I get (I restricted the formula to avoid NAs):

    surv = with(small, Surv(time=absence, event=(censored==FALSE)))
    (cox.s = coxph(surv ~ bucket*(today) + strata(activity), data = small))
    Call:
    coxph(formula = surv ~ bucket * (today) + strata(activity), data = small)

                           coef exp(coef) se(coef)      z    p
    bucket575            0.4526     1.572    0.740  0.612 0.54
    todayTRUE           -0.0886     0.915    0.676 -0.131 0.90
    bucket575:todayTRUE -0.1670     0.846    0.794 -0.210 0.83

    Likelihood ratio test=2.32 on 3 df, p=0.509  n= 100, number of events= 100

    fit = survfit(cox.s, newdata=small[1:50,])
    Error in model.frame.default(data = small[1:50, ], formula = ~bucket + :
      number of variables != number of variable names

Also:

    R.version
                   _
    platform       x86_64-redhat-linux-gnu
    arch           x86_64
    os             linux-gnu
    system         x86_64, linux-gnu
    status
    major          2
    minor          15.1
    year           2012
    month          06
    day            22
    svn rev        59600
    language       R
    version.string R version 2.15.1 (2012-06-22)
    nickname       Roasted Marshmallows

    package ‘survival’ is version 2.36-14

Finally, the variable absence is numeric, bucket and activity are factors, and all other variables are logical. I tested the same formula without 'strata' and had no problem.

Best and thank you,
ge

On 11/19/2012 09:01 AM, Terry Therneau wrote:

I can't reproduce the problem. Tell us what version of R and what version of the survival package. Create a reproducible example. I don't know if some variables are numeric and some are factors, how/where the surv object was defined, etc.

Terry Therneau

small.csv.gz.sig
Description: PGP signature
Re: [R] survfit number of variables != number of variable names
Hi - I've seen a similar issue with survfit when using strata in the model, although I get a different error message from ge. If it helps to track down the problem (rather than confusing things further), here is some code that should reproduce the issue I've seen. I'm running R 2.15.2, with survival version 2.36-14.

Cheers,
Carina

    ##
    library(survival)

    # use aml dataset from survival
    # create 3 imaginary possible strata - strat3 has 3 levels, strat4 has 4 levels, strat5 has 5 levels
    aml$strat3 <- as.factor(rep(c(1:3), length.out = nrow(aml)))
    aml$strat4 <- as.factor(rep(c(1:4), length.out = nrow(aml)))
    aml$strat5 <- as.factor(rep(c(1:5), length.out = nrow(aml)))

    # create a counting process format dataset from aml - call this aml2
    aml2 <- survSplit(aml, cut = c(5, 10, 50), end = "time", start = "start", event = "status", episode = "i", id = "indiv")

    # create a dataset of 4 'new' observations - call this aml.new
    aml.new <- aml[1:4, ]
    aml.new$time <- c(30, 50, 70, 100)
    aml.new$status <- 1
    aml.new$x[1:4] <- c(rep("Maintained", 2), rep("Nonmaintained", 2))
    aml.new$strat3[1:4] <- 1
    aml.new$strat4[1:4] <- 1
    aml.new$strat5[1:4] <- 1

    # create a counting process format dataset from aml.new - call this aml2.new
    aml2.new <- survSplit(aml.new, cut = c(5, 10, 50), end = "time", start = "start", event = "status", episode = "i", id = "indiv")

    # First a model using no strata - survfit works fine on new dataset
    myModel <- coxph(Surv(start, time, status) ~ x, data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # Now a model using strata = strat4 (which has 4 levels) - survfit again works fine
    # on new dataset (which has 4 new individuals)
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat4), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # Now a model using strata = strat3 (which has 3 levels) - survfit works here too
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat3), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # Now a model using strata = strat5 (which has 5 levels) - survfit now does not work, with error saying
    # Error in survfitcoxph.fit(y, x, wt, x2, risk, newrisk, strata, se.fit, :
    #   'names' attribute [5] must be the same length as the vector [4]
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat5), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # Now recreate aml.new but with 3 rather than 4 'new' observations
    rm(aml.new)
    aml.new <- aml[1:3, ]
    aml.new$time <- c(30, 50, 70)
    aml.new$status <- 1
    aml.new$x[1:3] <- c(rep("Maintained", 2), rep("Nonmaintained", 1))
    aml.new$strat3[1:3] <- 1
    aml.new$strat4[1:3] <- 1
    aml.new$strat5[1:3] <- 1

    # create a counting process format dataset from aml.new
    aml2.new <- survSplit(aml.new, cut = c(5, 10, 50), end = "time", start = "start", event = "status", episode = "i", id = "indiv")

    # survfit on model using strat3 still works
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat3), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # But survfit on model using strat4 doesn't work now
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat4), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

    # survfit on strat5 model doesn't work either
    myModel <- coxph(Surv(start, time, status) ~ x + strata(strat5), data = aml2)
    plot(survfit(myModel, aml2.new, id = indiv))

On 19 November 2012 17:01, Terry Therneau thern...@mayo.edu wrote:

I can't reproduce the problem. Tell us what version of R and what version of the survival package. Create a reproducible example. I don't know if some variables are numeric and some are factors, how/where the surv object was defined, etc.
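The pattern in the six runs reported in this message can be checked mechanically. The outcomes below are transcribed from the message itself; the data frame `reported` and the column `rule` are names invented for this sketch. The rule (survfit succeeded whenever the number of strata levels did not exceed the number of new individuals) is only an observation that happens to fit these runs, not a diagnosis of the underlying cause.

```r
# Outcomes as described in the message: 4 new individuals, then 3
reported <- data.frame(
  n_new    = c(4, 4, 4, 3, 3, 3),   # individuals in the new data
  n_levels = c(3, 4, 5, 3, 4, 5),   # levels of the strata factor
  worked   = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Candidate rule (assumption): success iff levels <= new individuals
reported$rule <- reported$n_levels <= reported$n_new

print(reported)
all(reported$rule == reported$worked)  # TRUE
```

That the failure tracks a count comparison rather than the data values fits the wording of the error, which complains about a length mismatch ([5] versus [4]).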
Re: [R] survfit number of variables != number of variable names
Hi!

In answer to:

I noticed that you were using what might be called an externally created Surv object. I have a memory that Terry Therneau has criticized that practice. I cannot remember if it was in exactly this situation but I might ask if setting up the model as:

    cox = coxph(Surv(stime, event) ~ bucket*(today + accor + both) + activity, data = data)

... might give the survival machinery a better handle on where everything might be found.

I tried to create the Surv object internally but I face the same issue:

    (cox.s = coxph(Surv(time=absence, event=(censored==FALSE)) ~ bucket*(today) + strata(activity), data = small))
    Call:
    coxph(formula = Surv(time = absence, event = (censored == FALSE)) ~
        bucket * (today) + strata(activity), data = small)

                           coef exp(coef) se(coef)      z    p
    bucket575            0.4526     1.572    0.740  0.612 0.54
    todayTRUE           -0.0886     0.915    0.676 -0.131 0.90
    bucket575:todayTRUE -0.1670     0.846    0.794 -0.210 0.83

    Likelihood ratio test=2.32 on 3 df, p=0.509  n= 100, number of events= 100

    fit = survfit(cox.s, newdata=small[1:50,])
    Error in model.frame.default(data = small[1:50, ], formula = ~bucket + :
      number of variables != number of variable names

Best, and thank you for the suggestion.
ge
Re: [R] survfit number of variables != number of variable names
On Nov 19, 2012, at 5:33 PM, Georges Dupret wrote:

Hi David,

Sorry for the signature files... this is automatic. I should disable that. Please find in attachment a copy of small.csv.gz

I found it, but I suspect nobody else will. I think Terry Therneau already got a copy when you attached it earlier, but the rest of Rhelp did not, since .gz files get scrubbed by the list-serv.

Best,
ge

On 11/19/2012 02:37 PM, David Winsemius wrote:
On Nov 19, 2012, at 2:23 PM, David Winsemius wrote:
On Nov 19, 2012, at 11:07 AM, Georges Dupret wrote:

Hi! In answer to:

I noticed that you were using what might be called an externally created Surv object. I have a memory that Terry Therneau has criticized that practice. I cannot remember if it was in exactly this situation but I might ask if setting up the model as:

    cox = coxph(Surv(stime, event) ~ bucket*(today + accor + both) + activity, data = data)

... might give the survival machinery a better handle on where everything might be found.

I tried to create the Surv object internally but I face the same issue:

    (cox.s = coxph(Surv(time=absence, event=(censored==FALSE)) ~ bucket*(today) + strata(activity), data = small))
    Call:
    coxph(formula = Surv(time = absence, event = (censored == FALSE)) ~
        bucket * (today) + strata(activity), data = small)

All of your 'censored' were FALSE so all of your events were TRUE. My guess is that you are having problems because you end up with different model designs in the different strata:

    with(small, table(activity, today))
                    today
    activity         FALSE TRUE
      (100,121]          1   13
      (121,149]          2    8
      (149,196]          0    4
      (196,1.33e+03]     1    8
      (30,42]            1    8
      (42,55]            4   12
      (55,68]            2    9
      (68,83]            2    9
      (83,100]           2    6
      [11,30]            0    8

I do not think it matters that your levels for the factor variable will not be in the expected order:

    table(small$activity)

         (100,121]      (121,149]      (149,196] (196,1.33e+03]        (30,42]        (42,55]        (55,68]        (68,83]
                14             10              4              9              9             16             11             11
          (83,100]        [11,30]
                 8              8

But I do also wonder if the small numbers in each stratum might be causing problems. Is it really necessary to stratify so finely?

-- David.

                           coef exp(coef) se(coef)      z    p
    bucket575            0.4526     1.572    0.740  0.612 0.54
    todayTRUE           -0.0886     0.915    0.676 -0.131 0.90
    bucket575:todayTRUE -0.1670     0.846    0.794 -0.210 0.83

    Likelihood ratio test=2.32 on 3 df, p=0.509  n= 100, number of events= 100

    fit = survfit(cox.s, newdata=small[1:50,])
    Error in model.frame.default(data = small[1:50, ], formula = ~bucket + :
      number of variables != number of variable names

OK. Thanks for doing that. You might want to know that the only attachment that made it through to the mailing list was a file named small.csv.gz.sig. That's not a format that my system knows how to decompress (I tried downloading GnuPG and compiling it but

(hit sent button too soon.)

was unable to figure out how to decompress with GnuPG either). (It's hard to imagine this needed to be encrypted.)

small.csv.gz

David Winsemius, MD
Alameda, CA, USA
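A rough sketch of the two checks raised in this message: looking for strata in which a covariate level never occurs, and merging strata so that each one is less sparse. The data frame `toy` is simulated and purely hypothetical (the poster's `small` data never reached the list), and the level names and the choice of which strata to merge are assumptions made only for illustration.

```r
# Simulated stand-in for the poster's data (assumption, not the real `small`)
set.seed(1)
toy <- data.frame(
  activity = factor(sample(c("low", "mid", "high"), 60, replace = TRUE)),
  today    = sample(c(TRUE, FALSE), 60, replace = TRUE, prob = c(0.85, 0.15))
)

# Cross-tabulate strata against the covariate, as done in the message
tab <- with(toy, table(activity, today))
print(tab)

# Strata in which at least one level of `today` has zero rows
sparse <- rownames(tab)[apply(tab == 0, 1, any)]
print(sparse)

# Coarsen the stratification by merging two levels into one
toy$activity_coarse <- toy$activity
levels(toy$activity_coarse) <- list(low_or_mid = c("low", "mid"), high = "high")
print(with(toy, table(activity_coarse, today)))
```

Whether coarser strata would make the reported error go away is not established anywhere in this thread; the sketch only shows how to inspect and reduce sparsity.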
Re: [R] survfit number of variables != number of variable names
Hi!

It seems the data file wasn't transmitted. Please find a copy in attachment.

Best,
ge

On 11/19/2012 09:02 AM, Terry Therneau-2 [via R] wrote:

I can't reproduce the problem. Tell us what version of R and what version of the survival package. Create a reproducible example. I don't know if some variables are numeric and some are factors, how/where the surv object was defined, etc.

Terry Therneau

small.csv.gz (1K) http://r.789695.n4.nabble.com/attachment/4650122/0/small.csv.gz
Re: [R] survfit number of variables != number of variable names
On Nov 16, 2012, at 6:05 PM, Georges Dupret wrote:

This works ok:

    cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data)
    fit = survfit(cox, newdata=data[1:100,])

but using strata leads to problems:

    cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity), data = data)
    fit.s = survfit(cox.s, newdata=data[1:100,])
    Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :
      number of variables != number of variable names

Note that the following gives rise to the same error:

    fit.s = survfit(cox.s, newdata=data)
    Error in model.frame.default(data = data, formula = ~bucket + today + :
      number of variables != number of variable names

but if I use data implicitly, all is working fine:

    fit.s = survfit(cox.s)

Any idea on how I could solve this?

I noticed that you were using what might be called an externally created Surv object. I have a memory that Terry Therneau has criticized that practice. I cannot remember if it was in exactly this situation but I might ask if setting up the model as:

    cox = coxph(Surv(stime, event) ~ bucket*(today + accor + both) + activity, data = data)

... might give the survival machinery a better handle on where everything might be found.

-- David.

David Winsemius, MD
Alameda, CA, USA
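A minimal sketch of the inline-Surv form suggested in this message, using the `lung` data that ships with survival instead of the poster's data. The covariate `age`, the strata variable `sex`, and the names `fit`, `new_rows` and `curves` are stand-ins chosen for illustration. It is written against a recent survival release and says nothing about how 2.36-14 behaves; whether this form avoids the error reported in the thread is untested.

```r
library(survival)

# Build the response inside the formula so every variable is looked up in `data`.
# In `lung`, status is coded 1 = censored, 2 = dead.
fit <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# newdata carries every right-hand-side variable, including the strata variable
new_rows <- lung[1:3, c("age", "sex")]

# Predicted survival curves for the new rows
curves <- survfit(fit, newdata = new_rows)
print(curves)
```

Keeping the Surv() call in the formula means the fitted model records its response in terms of column names, which is what later calls that re-evaluate the formula against new data have to work with.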