Re: [R] complex contrasts and logistic regression
Hi, Sorry to take so long to reply, I was travelling last week. Thanks for your suggestions. Actually in this case contrast and predict gave the same result, and what I was looking at was the correct odds from the model. What is still confusing me is the 1st part of my question, looking for a trend in odds ratios. From what I understand testing the interaction: fit1-glmD(survived ~ as.numeric(Covariate)+Therapy + confounder,myDat,X=TRUE, Y=TRUE, family=binomial()) fit2-glmD(survived ~ as.numeric(Covariate)*Therapy + confounder,myDat,X=TRUE, Y=TRUE, family=binomial()) lrtest(fit1,fit2) Would be effectively testing for a trend in odds ratios? Do I have to fiddle with contrasts to make sure I am testing the correct parameter? Thanks Nicholas On Sat, 16 Jun 2007 11:14:12 -0500, Frank E Harrell Jr [EMAIL PROTECTED] said: Nicholas Lewin-Koh wrote: Hi, I am doing a retrospective analysis on a cohort from a designed trial, and I am fitting the model fit-glmD(survived ~ Covariate*Therapy + confounder,myDat,X=TRUE, Y=TRUE, family=binomial()) For logistic regression you can also use Design's lrm function which gives you more options. My covariate has three levels (A,B and C) and therapy has two (treated and control), confounder is a continuous variable. Also patients were randomized to treatment in the trial, but Covariate is something that is measured posthoc and can vary in the population. If by posthoc you mean that the covariate is measured after baseline, it is difficult to get an interpretable analysis. I am trying to wrap my head around how to calculate a few quantities from the model and get reasonable confidence intervals for them, namely I would like to test H0: gamma=0, where gamma is the regression coefficient of the odds ratios of surviving under treatment vs control at each level of Covariate (adjusted for the confounder) You mean regression coefficient on the log odds ratio scale. This is easy to do with the contrast( ) function in Design. Do ?contrast.Design for details and examples. and I would like to get the odds of surviving at each level of Covariate under treatment and control for each level of covariate adjusted for the confounder. I have looked at contrast in the Design library but I don't think it gives me the right quantity, for instance contrast(fit,list(covariate=A, Therapy=Treated, confounder=median(myDat$confounder), X=TRUE) ( A is the baseline level of Covariate) gives me beta0 + beta_Treated + beta_confounder*68 Is this correctly interpreted as the conditional odds of dying? As to the 1st contrast I am not sure how to get it, would it be using type = 'average' with some weights in contrast? The answers are probably staring me in the face, i am just not seeing them today. contrast( ) is for contrasts (differences). Sounds like you want predicted values. Do ?predict ?predict.lrm ?predict.Design. Also do ?gendata which will generate a data frame for getting predictors, with unspecified predictors set to reference values such as medians. Frank Nicholas -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] complex contrasts and logistic regression
Hi, I am doing a retrospective analysis on a cohort from a designed trial, and I am fitting the model fit-glmD(survived ~ Covariate*Therapy + confounder,myDat,X=TRUE, Y=TRUE, family=binomial()) My covariate has three levels (A,B and C) and therapy has two (treated and control), confounder is a continuous variable. Also patients were randomized to treatment in the trial, but Covariate is something that is measured posthoc and can vary in the population. I am trying to wrap my head around how to calculate a few quantities from the model and get reasonable confidence intervals for them, namely I would like to test H0: gamma=0, where gamma is the regression coefficient of the odds ratios of surviving under treatment vs control at each level of Covariate (adjusted for the confounder) and I would like to get the odds of surviving at each level of Covariate under treatment and control for each level of covariate adjusted for the confounder. I have looked at contrast in the Design library but I don't think it gives me the right quantity, for instance contrast(fit,list(covariate=A, Therapy=Treated, confounder=median(myDat$confounder), X=TRUE) ( A is the baseline level of Covariate) gives me beta0 + beta_Treated + beta_confounder*68 Is this correctly interpreted as the conditional odds of dying? As to the 1st contrast I am not sure how to get it, would it be using type = 'average' with some weights in contrast? The answers are probably staring me in the face, i am just not seeing them today. Nicholas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R vs. Splus in Pharma/Devices Industry
Hi, I just saw this thread. This issue, and the larger scale issue of open source in industry is being addressed. One has to realize that the behemoth that is the clinical aperatus of the pharma industry is very conservative and very slow to change. In many cases switching to R would meean changing a great many processes all based on legacy code. One of the big issues is that the industry demands consistency not necessarily correctness. All that said there are a great many areas where R could be used that would not impact regulatory submission, data security etc. Many in pharma are quietly working on this, but steps are small and incremental. Development takes time in industry because of the amount of documentation necessary. All this impacts the free nature of R and cost and risk (From the industries perspective) need to be justified. Probably when the statistical community is using Z big pharma will be ready to use R. %P Nicholas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] clustering: Multivariate t mixtures
Hi, Before I write code to do it does anyone know of code for fitting mixtures of multivariate-t distributions. I can't use McLachan's EMMIX code because the license is For non commercial use only. I checked, mclust and flexmix but both only do Gaussian. Thanks Nicholas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] clustering: Multivariate t mixtures
Hi, Actually that was my plan was to implement a new flexmix class. Thanks for the pointer to the jss paper, that will be helpful. Nicholas On Thu, 8 Sep 2005 23:07:13 +0200, Achim Zeileis [EMAIL PROTECTED] said: On Thu, 08 Sep 2005 15:38:55 -0500 Nicholas Lewin-Koh wrote: Hi, Before I write code to do it does anyone know of code for fitting mixtures of multivariate-t distributions. I can't use McLachan's EMMIX code because the license is For non commercial use only. I checked, mclust and flexmix but both only do Gaussian. The Gaussian case is available in a pre-packaged function FLXmclust(), but the flexmix framework is not limited to that case. There is a paper which appeared in the Journal of Statistical Software (http://www.jstatsoft.org/) that explains how to write new M-steps for flexmix. It is also contained in the package as vignette(flexmix-intro) Best, Z Thanks Nicholas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] tcltk, X11 protocol error: Bug?
Hi, I am having trouble debugging this one. The code is attached below, but it seems to be a problem at the C-tk interface. If I run this 1 time there are no problems if I run it more than once I start to get warnings that increase in multiples of 11 everytime I run it. Here is a sample session source(clrramp2.r) Loading required package: tcltk Loading Tcl/Tk interface ... done clrRamp() tt-clrRamp() tt function (n) { x - ramp(seq(0, 1, length = n)) rgb(x[, 1], x[, 2], x[, 3], max = 255) } environment: 0x8b8674c image(matrix(1:10),col=tt(10)) tt-clrRamp() There were 22 warnings (use warnings() to see them) image(matrix(1:10),col=tt(10)) There were 11 warnings (use warnings() to see them) warnings() Warning messages: 1: X11 protocol error: BadWindow (invalid Window parameter) 2: X11 protocol error: BadWindow (invalid Window parameter) 3: X11 protocol error: BadWindow (invalid Window parameter) 4: X11 protocol error: BadWindow (invalid Window parameter) 5: X11 protocol error: BadWindow (invalid Window parameter) 6: X11 protocol error: BadWindow (invalid Window parameter) 7: X11 protocol error: BadWindow (invalid Window parameter) 8: X11 protocol error: BadWindow (invalid Window parameter) 9: X11 protocol error: BadWindow (invalid Window parameter) 10: X11 protocol error: BadWindow (invalid Window parameter) 11: X11 protocol error: BadWindow (invalid Window parameter) I am running R-2.1.1 on ubuntu linux 5.04, compiled from source (not the deb package). My version of tcl/tk is 8.4. The code is below. If anyone sees something I am doing foolish let me know, otherwise I will file a bug report. Nicholas # File clrramp2.r ## require(tcltk) clrRamp - function(n.col, b.color=NULL,e.color=NULL){ B.ChangeColor - function() { b.color - tclvalue(tkcmd(tk_chooseColor,initialcolor=e.color, title=Choose a color)) if (nchar(b.color)0){ tkconfigure(canvas.b,bg=b.color) Rmp.Draw() } } E.ChangeColor - function() { e.color - tclvalue(tkcmd(tk_chooseColor,initialcolor=e.color, title=Choose a color)) ##cat(e.color) if (nchar(e.color)0){ tkconfigure(canvas.e,bg=e.color) Rmp.Draw() } } Rmp.Draw -function(){ cr-colorRampPalette(c(b.color,e.color),space=Lab,interpolate=spline) rmpcol - cr(n.col) #rmpcol-rgb( rmpcol[,1],rmpcol[,2],rmpcol[,3]) inc - 300/n.col xl - 0 for(i in 1:n.col){ ##item - tkitemconfigure(canvas.r,barlst[[i]],fill=rmpcol[i],outline=rmpcol[i]) #xl - xl+inc } } save.ramp - function(){ cr-colorRampPalette(c(b.color,e.color),space=Lab,interpolate=spline) tkdestroy(tt) ##invisible(cr) } tt - tktoplevel() tkwm.title(tt,Color Ramp Tool) frame - tkframe(tt) bframe - tkframe(frame,relief=groove,borderwidth=1) if(is.null(b.color)) b.color - blue if(is.null(e.color)) e.color - yellow if(missing(n.col)) n.col - 100 canvas.b - tkcanvas(bframe,width=50,height=25,bg=b.color) canvas.e - tkcanvas(bframe,width=50,height=25,bg=e.color) canvas.r - tkcanvas(tt,width=300,height=50,bg=white) BColor.button - tkbutton(bframe,text=Begin Color,command=B.ChangeColor) ##tkgrid(canvas.b,BColor.button) EColor.button - tkbutton(bframe,text=End Color,command=E.ChangeColor) killbutton - tkbutton(bframe,text=Save,command=save.ramp) tkgrid(canvas.b,BColor.button,canvas.e,EColor.button) tkgrid(bframe) tkgrid(frame) tkgrid(canvas.r) tkgrid(killbutton) cr-colorRampPalette(c(b.color,e.color),space=Lab,interpolate=spline) ##rmpcol - hex(mixcolor(alpha,bc,ec,where=LUV)) rmpcol - cr(n.col) inc - 300/n.col xl - 0 #barlst - vector(length=n.col,mode=list) barlst - tclArray() for(i in 1:n.col){ item-tkcreate(canvas.r,rect,xl,0,xl+inc,50, fill=rmpcol[i],outline=rmpcol[i]) ##tkaddtag(canvas.r, point, withtag, item) barlst[[i]]-item xl - xl+inc } tkgrab.set(tt) tkwait.window(tt) ##tkdestroy(tt) invisible(cr) } __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Correct variance for prediction intervals from nlme and gnls objects
Hi, An approximate prediction interval for f(x,b) from Rupert and Carrol (1988) on pg 53 define q^2=g^2((f(x0,b),x0,a) + 1/N f'b(x0,b)^T S^-1 f'b(x0,b) where g is the variance function and f'b is the derivative in terms of b evaluted at x0 and S is S=(1/N) Sum(1,N) f'b(xi,b)f'b(xi,b)^T/g^2((f(xi,b),a) and the interval is defined at f(x0,b)+/- t(N-p,alpha/2) sqrt(sigma^2 q^2) My question is if I were to fit the model fit-gnls(y~SSlogis(x,Asymp,xmid,scal),data=foo, weights=varPower) is attr(fit$parAssign,varBetaFact) the correct quantity to use for S^-1? or is (fit$dim$N/fit$sigma^2)*fit$VarBeta. I looked in Pinero and Bates chapter 7.5 but i could not figure out exactly how varBeta is estimated in gnls. A pointer to a reference or some guidance would be very helpful. Thanks Nicholas PS Sorry for the sloppy notation, email is not latex On Tue, 19 Jul 2005 11:48:18 -0500, Douglas Bates [EMAIL PROTECTED] said: On 7/19/05, Nicholas Lewin-Koh [EMAIL PROTECTED] wrote: Hello, I am writing a set of functions to do prediction and calibration intervals for at least the subset of selfstarting models if not some more general ones. I need to be able to extract the varFunction from a fit object and evaluate it at a predicted point. Are there any examples around? Also in a self start model, say SSlogis, how would I get the gradient at a point? I think that the output of example(SSlogis) should answer that question for you. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Matrix: Help with syntax and comparison with SparseM
Hi, I am writing some basic smoothers in R for cleaning some spectral data. I wanted to see if I could get close to matlab for speed, so I was trying to compare SparseM with Matrix to see which could do the choleski decomposition the fastest. Here is the function using SparseM difsm - function(y, lambda, d){ # Smoothing with a finite difference penalty # y: signal to be smoothed # lambda: smoothing parameter # d: order of differences in penalty (generally 2) # based on code by Paul Eilers 2002 require(SparseM) m - length(y) E - as(m,matrix.diag.csr) D - diff(E,differences=d) B - E + (lambda * t(D)%*%D) z - solve(B,y) z } To do this in Matrix, I am not sure how to get an Identity matrix of dimension m. From looking at the documentation I would think that E-new(cscMatrix, nrow=m, i=integer(m),x=rep(1,m),p=0:(m-1))) Should do what I want, but it fails? m-10 E-new(cscMatrix, nrow=m, i=integer(m),x=rep(1,m),p=0:(m-1)) Error in initialize(value, ...) : Invalid names for slots of class cscMatrix: nrow E-new(cscMatrix, i=integer(m),x=rep(1,m),p=0:(m-1)) Error in validObject(.Object) : Invalid cscMatrix object: last element of slot p must match length of slots i and x Granted I am not very fascile with the S4 classes, so I may be doing something stupid. Also to do the next step there is no diff method for matrices in Matrix, that would be nice :) I guess the last step would be easy z - solve((E + (lambda * crossprod(D))),y) But I can't get the Identity matrix??? Thanks for the help, it is probably obvious, but I am missing it. Nicholas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RSPerl: getting test.pl to run
Hi, I am trying to install and get RSPerl running so that we can use it locally in some of our scripts. I have followed the directions in the documentation and fixed some of the problems I found in the archives like the bootstrap problem. I am using R 1.9.0 and RSPerl 0.5-7 Here is the problem: cd $R_HOME/library/RSPerl/examples/ setenv LD_LIBRARY_PATH $R_HOME/bin:$R_HOME/library/RSPerl/libs/ perl -I../share/blib/arch -I../share/blib/lib -I../scripts/ test.pl 1..1 ok 1 perl: relocation error: ../share/blib/arch/auto/R/R.so: undefined symbol: tryEval Since tryEval is one of the functions in RSPerl.so, I assume it is not being linked properly has anyone run into this or know how to fix it? Thanks Nicholas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RE: Kernel Density Estimator for 2D Binned Data
Hi James, You can try the hexbin package at www.bioconductor.org. Do the following bin-hexbin(x,y) ## This will give you hexagonal bins of the data binsm-smooth.hexbin(bin) plot(binsm) This is an approximation to what you want. The other way is to use a 2d bspline on the bin center of masses of the hexagons and use the bin counts as weights. Nicholas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] RE: Savitzky-Golay smoothing -- an R implementation
Hi, Savitzky and Golay were indeed pioneers of local least squares methods. However the SG smoother is hard to implement in practice because of missing values and problems at the boundary. Paul Eilers at Leiden has presented a very nice method for smoothing series based on penalized least squares known as Whittaker smoothing, develeoped in 1923 for life tables. Look at Analytical Chemistry (2003) 75, 3299-3304. Here is an R implementation that requires the SparseM package. The smoothing parameter lambda, controls the amount of smoothing, and good values can be found by cross validation. difsm - function(y, lambda, d){ # Smoothing with a finite difference penalty # y: signal to be smoothed # lambda: smoothing parameter # d: order of differences in penalty (generally 2) # Paul Eilers, 2002, ported from matlab by Nicholas Lewin-Koh require(SparseM) m - length(y) E - as(m,matrix.diag.csr) D - diff(E,differences=d) B - E + (lambda * t(D)%*%D) z - solve(B,y) z } __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Memory explosion, plotting nmle grouped data object
Hi I am using R 1.7.1 on RH linux 9.0 sum(unlist(lapply(ls(),function(x)object.size(get(x)/1024^2 [1] 2.424263 so I am not using much memory (I have a gig of ram on my machine) now in nlme gtest-groupedData(log(X8)~Time|sub,all[,c(names(all)[1:9],X8)],outer=~A*B) object.size(gtest)/1024 [1] 59.98438 plot(gtest,outer=~Dose*chem,key=FALSE,asp=.5) Plotting takes forever and from top while it is running 5663 wf00223 15 0 1151M 546M 1780 R 6.1 54.2 3:19 0 R.bin Any Ideas why R is using so much memory? Thanks Nicholas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Re: [Rd]Comparing fp numbers, was Bug in %in% (match)
Hi, Thanks, I should have thought of that. If I do tst-(10*seq(100,125,by=.2))%in%(10*seq(0,800,by=.1)) sum(tst) [1] 122 tst-(seq(100,125,by=.2))%in%(seq(0,800,by=.1)) sum(tst) [1] 76 The problem is corrected. However in my code, where I am comparing much longer sequences it doesn't always work, even with rounding and multiplying to an integer. Is there a more robust way to do this? Here is the function where things are going wrong spectots-function(t,x,begin=min(t,na.rm=TRUE),end=max(t,na.rm=TRUE), by.inter=.5,precis=1){ #browser() j-!is.na(t) k-!((tend) | (tbegin)) j-jk t-t[j] x-x[j] get.mult-function(pre)return(10^pre) new.t-seq(begin,end,by=by.inter) new.x-rep(NA,length(new.t)) #new.n-length(new.t) t.err-rep(NA,length(new.t)) # Here is the matching tst-(get.mult(precis)*new.t)%in%(get.mult(precis)*round(t,precis)) check-sum(tst) if(check!=length(t))warning(not all values matched!) new.x[tst]-x t.err[tst]-t-new.t[tst] x.impute-ifelse(is.na(new.x),.5*min(new.x,na.rm=TRUE),new.x) ts(cbind(x.impute,new.x,t.err),start=begin,end=end,freq=1/by.inter) } On Sat, 2003-04-05 at 03:09, Peter Dalgaard BSA wrote: Nicholas Lewin-Koh [EMAIL PROTECTED] writes: Hi, Am I hitting some limit in match? Consider the following example: tst-seq(100,125,by=.2)%in%seq(0,800,by=.1) sum(tst) [1] 76 Gives the correct answer. Did I miss something? The fact that 1/5 and 1/10 are not represented exactly in a binary computer? sum(tst-seq(100,125,by=.2)%in%seq(0,800,by=.1)) [1] 76 sum(tst-seq(100,125,by=.2)%in%round(seq(0,800,by=.1),1)) [1] 126 And, just to be sure: sum(round(seq(100,125,by=.2),1)%in%round(seq(0,800,by=.1),1)) [1] 126 -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Re: [Rd]Comparing fp numbers, was Bug in %in% (match)
Hi, I am dense some days, the following worked; tst-as.integer(get.mult(precis)*new.t)%in%as.integer(get.mult(precis)*round(t,precis)) I didn't catch this part of the documentation Factors are converted to character vectors, and then `x' and `table' are coerced to a common type (the later of the two types in R's ordering, logical integer numeric complex character) before matching. Nicholas On Mon, 2003-04-07 at 11:10, Nicholas Lewin-Koh wrote: Hi, Thanks, I should have thought of that. If I do tst-(10*seq(100,125,by=.2))%in%(10*seq(0,800,by=.1)) sum(tst) [1] 122 tst-(seq(100,125,by=.2))%in%(seq(0,800,by=.1)) sum(tst) [1] 76 The problem is corrected. However in my code, where I am comparing much longer sequences it doesn't always work, even with rounding and multiplying to an integer. Is there a more robust way to do this? Here is the function where things are going wrong spectots-function(t,x,begin=min(t,na.rm=TRUE),end=max(t,na.rm=TRUE), by.inter=.5,precis=1){ #browser() j-!is.na(t) k-!((tend) | (tbegin)) j-jk t-t[j] x-x[j] get.mult-function(pre)return(10^pre) new.t-seq(begin,end,by=by.inter) new.x-rep(NA,length(new.t)) #new.n-length(new.t) t.err-rep(NA,length(new.t)) # Here is the matching tst-(get.mult(precis)*new.t)%in%(get.mult(precis)*round(t,precis)) check-sum(tst) if(check!=length(t))warning(not all values matched!) new.x[tst]-x t.err[tst]-t-new.t[tst] x.impute-ifelse(is.na(new.x),.5*min(new.x,na.rm=TRUE),new.x) ts(cbind(x.impute,new.x,t.err),start=begin,end=end,freq=1/by.inter) } On Sat, 2003-04-05 at 03:09, Peter Dalgaard BSA wrote: Nicholas Lewin-Koh [EMAIL PROTECTED] writes: Hi, Am I hitting some limit in match? Consider the following example: tst-seq(100,125,by=.2)%in%seq(0,800,by=.1) sum(tst) [1] 76 Gives the correct answer. Did I miss something? The fact that 1/5 and 1/10 are not represented exactly in a binary computer? sum(tst-seq(100,125,by=.2)%in%seq(0,800,by=.1)) [1] 76 sum(tst-seq(100,125,by=.2)%in%round(seq(0,800,by=.1),1)) [1] 126 And, just to be sure: sum(round(seq(100,125,by=.2),1)%in%round(seq(0,800,by=.1),1)) [1] 126 -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help