[R] Help with confidence intervals for gam model using mgcv
Hi, I would be very grateful for advice on getting confidence intervals for the ordinary (non smoothed) parameter estimates from a gam. Motivation I am studying hospital outcomes in a large data set. The outcomes of interest to me are all binary variables. The one in the example here, Dead30d, is death within 30 days of admission. Sexf is gender (M or F), Age is age in years at the start of the admission. The standard glm is a logistic regression :- glmDead.AS - glm(Dead30d~Sexf+Age, data=HIPE,family=binomial(link=logit)) The corresponding GAM, with a smooth for age, is :- gamDead.AS - gam(Dead30d~Sexf+s(Age), data=HIPE,family=binomial(link=logit)) For my work, age is a nuisance. We already know exactly the effect of age (which has an odd shape). I have no interest in this parameter, nor in CIs for it. The GAM fits notably better than the GLM. The substantive interest is in the effects of the other variables, Sexf, and many more. For the GLM, the confidence intervals are simple matter of confint(glmDead.AS). For my discipline CI's are required, and the profile CI's that confint produces are ideal. There doesn't seem to be an analogous function for mgcv. The advice most commonly given is to use predict.gam with se.fit=TRUE. This does not seem to produce CI's for the non-smoothed parameters, which is what I need to calculate. CIs for the smooth, which are the focus of interest in many other cases are not of interest to me. Any suggestions? Am I missing something very obvious? Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems, School of Nursing and Human Sciences, DCU, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 http://astaines.eu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Advice on recoding a variable depending on another which contains NAs
Dear colleagues, I would be very grateful for your help with the following. I have banged my head off this question several times in the past, and repeatedly over the last week. I have looked in the usual places and found no obvious solution. I fear that this just means I didn't recognize it, but I'd be very grateful for your help. I am scoring 8000 psychometric tests - the SCQ, if you have heard of it. On this test the scoring rules depends on one variable SCQ1 - if this is answered yes, the final score is a function of 39 variables, and if no, of 31 variables. I've calculated both of these scores (SCQScore1 and SCQScore2)for all the children in my study, and I wish to create a final score, which is SCQScore1 when SCQ1 is 1, and SCQScore2 when SCQ1 is 2. There are also missing values for SCQ1, and I have chosen, for the moment, to set the final score to SCQScore1 for these. [[This is a debatable choice, but I am not asking your advice on that choice!]] d$SCQScore - 99 ##Distinct value for any other values I've missed d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1] ## Talks using phrases/sentences, so sum S2CQ:SCQ40 d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2] ## Can't do this, so sum SCQ8:SCQ40 d$SCQScore[is.na(d$SCQ1)] - d$SCQScore1 [is.na(d$SCQ1)] ## SCQ1 is missing This fails on line 2 (d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]) with the error message NAs are not allowed in subscripted assignments, presumably because SCQ1 does indeed contain missing values. This can be fixed, got around, or otherwise bypassed, by creating a new variable SCQ1, with no missing values, as shown :- SCQ1 - d$SCQ1 SCQ1[is.na(SCQ1)] - 3 d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1] ## Talks using phrases/sentences so sum S2CQ:SCQ40 d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2] ## Can't do this, so sum SCQ8:SCQ40 d$SCQScore[SCQ1 == 3] - d$SCQScore1[SCQ1 == 3] ## We don't know if he/she can talk, so guess - sum S2:S40 This type of thing is a common problem in my little world. Is there a better/less klutzy/smarter way of solving it than creating a new variable each time? Please bear in mind that it is critical, for later analysis, to keep the missing values in SCQ1. Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems, School of Nursing and Human Sciences, DCU, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 http://astaines.eu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Advice on recoding a variable depending on another which contains NAs
Ah, so that's how ifelse gets used... Presumably if I had more than 2 non-missing values in the control variable, I could use it several times. Thank you very much, for a really useful answer, and thanks for getting back so quickly! All the best, Anthony Staines On 11/19/11 23:55, David Winsemius wrote: On Nov 19, 2011, at 6:31 PM, Anthony Staines wrote: This would seem to be an obvious task for ifelse() SCQScore - NA d$SCQScore - ifelse( SCQ1 == 1, d$SCQScore1, d$SCOScore2) (And don't use 99 for missing. Use NA. It will protect you better than 99.) I suppose you could enforce the two level testing with: d$SCQScore - ifelse( SCQ1 == 1, d$SCQScore1, ifelse(SCQ1 ==2, d$SCOScore2, NA)) d$SCQScore - 99 ##Distinct value for any other values I've missed d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1] ## Talks using phrases/sentences, so sum S2CQ:SCQ40 d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2] ## Can't do this, so sum SCQ8:SCQ40 d$SCQScore[is.na(d$SCQ1)] - d$SCQScore1 [is.na(d$SCQ1)] ## SCQ1 is missing This fails on line 2 (d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]) with the error message NAs are not allowed in subscripted assignments, presumably because SCQ1 does indeed contain missing values. This can be fixed, got around, or otherwise bypassed, by creating a new variable SCQ1, with no missing values, as shown :- SCQ1 - d$SCQ1 SCQ1[is.na(SCQ1)] - 3 d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1] ## Talks using phrases/sentences so sum S2CQ:SCQ40 d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2] ## Can't do this, so sum SCQ8:SCQ40 d$SCQScore[SCQ1 == 3] - d$SCQScore1[SCQ1 == 3] ## We don't know if he/she can talk, so guess - sum S2:S40 This type of thing is a common problem in my little world. Is there a better/less klutzy/smarter way of solving it than creating a new variable each time? Please bear in mind that it is critical, for later analysis, to keep the missing values in SCQ1. Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems, School of Nursing and Human Sciences, DCU, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 http://astaines.eu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT -- Anthony Staines, Professor of Health Systems, School of Nursing and Human Sciences, DCU, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 http://astaines.eu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Advice on obscuring unique IDs in R
Dear colleagues, This may be a question with a really obvious answer, but I can't find it. I have access to a large file with real medical record identifiers (mixed strings of characters and numbers) in it. These represent medical events for many thousands of people. It's important to be able to link events for the same people. It's much more important that the real record numbers are strongly obscured. I'm interested in some kind of strong one-way hash function to which I can feed the real numbers and get back unique codes for each record identifier fed in. I can do this on the health service system, and I have to do this before making further use of the data! There is the 'digest' function, in the digest package, but this seems to work on the whole vector of IDs, producing, in my case, a vector with 60,000 identical entries. H.Out$P_ID = digest(H.In$MRNr,serialize=FALSE, algo='md5') I could do this in Perl, but I'd have to do quite a bit of work to get it installed. Any quick suggestions? Anthony Staines -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Installing rJava fails on Gentoo (amd64) with Sun JDK - checking JNI data types... error
/bin/javah Java archive tool: /usr/bin/jar Java library path: $(JAVA_HOME)/lib/amd64/server:$(JAVA_HOME)/lib/amd64:$(JAVA_HOME)/../lib/amd64::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib JNI linker flags : -L$(JAVA_HOME)/lib/amd64/server -L$(JAVA_HOME)/lib/amd64 -L$(JAVA_HOME)/../lib/amd64 -L -L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib -L/usr/lib -ljvm JNI cpp flags: -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux Updating Java configuration in /usr/lib64/R Done. AND Kitchen ~ # env JAVA_HOME=/opt/sun-jdk-1.6.0.20/ R CMD INSTALL rJava_0.8-5.tar.gz * installing to library ‘/usr/lib64/R/library’ * installing *source* package ‘rJava’ ... checking for gcc... x86_64-pc-linux-gnu-gcc -std=gnu99 checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking Java support in R... present: interpreter : '/usr/bin/java' archiver: '/usr/bin/jar' compiler: '/home/astaines/.gentoo/java-config-2/current-user-vm/bin/javac' header prep.: '/usr/bin/javah' cpp flags : '-I/opt/sun-jdk-1.6.0.20/jre/../include -I/opt/sun-jdk-1.6.0.20/jre/../include/linux' java libs : '-L/opt/sun-jdk-1.6.0.20/jre/lib/amd64/server -L/opt/sun-jdk-1.6.0.20/jre/lib/amd64 -L/opt/sun-jdk-1.6.0.20/jre/../lib/amd64 -L -L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib -L/usr/lib -ljvm' checking whether JNI programs can be compiled... yes checking JNI data types... configure: error: One or more JNI types differ from the corresponding native type. You may need to use non-standard compiler flags or a different compiler in order to fix this. ERROR: configuration failed for package ‘rJava’ * removing ‘/usr/lib64/R/library/rJava’ All advice gratefully received -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Passing the name of a variable to a function
Dear colleagues, I have a problem which has bitten me occasionally. I often need to prepare graphs for many variables in a data set, but seldom for all. or for any large number of sequential or sequentially named variables. Often I need several graphs for different subsets of the dataset for a given variable. I run into similar problems with other needs besides graphing. What I would like to do is something like write a function which takes the *name* of a variable, presumably a s a character string, from a dataframe, as one argument, and the dataframe, as a second argument. For example, where y is to be the the name of a variable in a given dataframe d, and the other variables needed, T, M and so on, are to be found in the same dataframe :- pf - function (y,data,...) { p1 - xyplot(y~x|T,data) p2 - xyplot(y~x|T,subset(data,M == 2)) p3 - xyplot(y~x|T,subset(data,M == 4)) #print(p1,p2,p3) } pf(Score1,data) pf(Score2,data) This fails, because, of course, Score 1, Score 2 etc.. are not defined, or if you pass them as pf(data$Score2,data), then when you subset the data, data$Score2 is now the wrong shape. I've come up with various inelegant hacks, (often with for loops), for getting around this over the last few years, but I can't help feeling that I'm missing something obvious, which I've been too dim to spot. Please note the answer to my requirements is *not* a clever use of lattice. This question goes beyond graphs. All suggestions gratefully received! Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9, Ireland. Tel:- +353 1 7007807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Passing the name of a variable to a function
Thanks to Joshua and Erik for very helpful suggestions. This clarifies what I had found to be a tricky area of the documentation! Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9, Ireland. Tel:- +353 1 7007807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Maptools runs out of memory installing the help files for spCbind-methods
Hi, I'd be grateful for help with the following:- Running R version 2.9.2 (2009-08-24) on Gentoo Linux, on an x86 PC. I am trying to install maptools, (via CRAN from maptools_0.7-29.tar.gz'), for the first time. All runs smoothly until the installation gets to *** installing help indices for spCbind-methods, about two thirds of the way through the help files, at which point the installation hangs until R runs out of memory. The last few lines of the output are :- sp2tmaptexthtmllatex example spCbind-methodstexthtmllatex example Out of memory! and it freezes. Memory available as reported by gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 133838 3.6 35 9.4 35 9.4 Vcells 81771 0.7 786432 6.0 406077 3.1 top, in another shell, reports that I am using almost all the memory (2G) and almost all the swap (1G) the command perl /usr/lib/R/share/perl/build-help.pl --txt --html --latex is running as well as R. I tried this on an amd64 Linux system, and got exactly the same results. I've never seen this before, and I can't find any similar issues in the bug tracker, or on the forums. Any suggestions? Am I missing something that it needs? Anthony -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Prediction from BMA
Dear colleagues, I am doing a little bit of work using the BMA package to examine predictive models for types of pain - the outcome variable is a binary (Yes/No) for the type of pain experienced by a patient. Our observation is that BMA makes a lot of sense in this application, as the data suggests that there are several well-fitting possible logistic models, each with posterior probabilities close to 0.1, as well as a straggle of somewhat less likely models. I am unsure how best to produce predictions from the BMA output - i.e. the posterior means of model coefficients. There isn't a predict.BMA or similar, that I can see. What's the recommended way to produce predictions (i.e. fitted values, or estimates from new data) from these BMA models? Best wishes, Anthony Staines -- Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9,Ireland. Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.