[R] Help with confidence intervals for gam model using mgcv

2012-03-10 Thread Anthony Staines

Hi,

I would be very grateful for advice on getting confidence 
intervals for the ordinary (non smoothed) parameter 
estimates from a gam.


Motivation
I am studying hospital outcomes in a large data set. The 
outcomes of interest to me are all binary variables. The one 
in the example here, Dead30d, is death within 30 days of 
admission. Sexf is gender (M or F), Age is age in years at 
the start of the admission. The standard glm is a logistic 
regression :-


glmDead.AS - glm(Dead30d~Sexf+Age, 
data=HIPE,family=binomial(link=logit))


The corresponding GAM, with a smooth for age, is :-

gamDead.AS - gam(Dead30d~Sexf+s(Age), 
data=HIPE,family=binomial(link=logit))


For my work, age is a nuisance. We already know exactly the 
effect of age (which has an odd shape). I have no interest 
in this parameter, nor in CIs for it. The GAM fits notably 
better than the GLM. The substantive interest is in the 
effects of the other variables, Sexf, and many more.


For the GLM, the confidence intervals are simple matter of 
confint(glmDead.AS). For my discipline CI's are required, 
and the profile CI's that confint produces are ideal.


There doesn't seem to be an analogous function for mgcv. The 
advice most commonly given is to use predict.gam with 
se.fit=TRUE. This does not seem to produce CI's for the 
non-smoothed parameters, which is what I need to calculate. 
CIs for the smooth, which are the focus of interest in many 
other cases are not of interest to me.


Any suggestions? Am I missing something very obvious?

Best wishes,
Anthony Staines
--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Advice on recoding a variable depending on another which contains NAs

2011-11-19 Thread Anthony Staines

Dear colleagues,

I would be very grateful for your help with the following. I 
have banged my head off this question several times in the 
past, and repeatedly over the last week. I have looked in 
the usual places and found no obvious solution. I fear that 
this just means I didn't recognize it, but I'd be very 
grateful for your help.


I am scoring 8000 psychometric tests - the SCQ, if you have 
heard of it. On this test the scoring rules depends on one 
variable SCQ1 - if this is answered yes, the final score is 
a function of 39 variables, and if no, of 31 variables.


I've calculated both of these scores (SCQScore1 and 
SCQScore2)for all the children in my study, and I wish to 
create a final score, which is SCQScore1 when SCQ1 is 1, and 
SCQScore2 when SCQ1 is 2. There are also missing values for 
SCQ1, and I have chosen, for the moment, to set the final 
score to SCQScore1 for these. [[This is a debatable choice, 
but I am not asking your advice on that choice!]]


d$SCQScore - 99
##Distinct value for any other values I've missed

d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum S2CQ:SCQ40

d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40

d$SCQScore[is.na(d$SCQ1)] - d$SCQScore1 [is.na(d$SCQ1)]
## SCQ1 is missing

This fails on line 2
(d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1])
 with the error message
NAs are not allowed in subscripted assignments,
presumably because SCQ1 does indeed contain missing values.

This can be fixed, got around, or otherwise bypassed, by 
creating a new variable SCQ1, with no missing values, as 
shown :-


SCQ1 - d$SCQ1
SCQ1[is.na(SCQ1)] - 3

d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences so sum S2CQ:SCQ40
d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 3] - d$SCQScore1[SCQ1 == 3]
## We don't know if he/she can talk, so guess - sum S2:S40

This type of thing is a common problem in my little world. 
Is there a better/less klutzy/smarter way of solving it than 
creating a new variable each time? Please bear in mind that 
it is critical, for later analysis, to keep the missing 
values in SCQ1.


Best wishes,
Anthony Staines
--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Advice on recoding a variable depending on another which contains NAs

2011-11-19 Thread Anthony Staines

Ah,
so that's how ifelse gets used...

Presumably if I had more than 2 non-missing values in the 
control variable, I could use it several times.


Thank you very much, for a really useful answer, and thanks 
for getting back so quickly!


All the best,
Anthony Staines

On 11/19/11 23:55, David Winsemius wrote:


On Nov 19, 2011, at 6:31 PM, Anthony Staines wrote:


This would seem to be an obvious task for ifelse()

SCQScore - NA
d$SCQScore - ifelse( SCQ1 == 1, d$SCQScore1, d$SCOScore2)

(And don't use 99 for missing. Use NA. It will protect you
better than 99.)


I suppose you could enforce the two level testing with:

d$SCQScore - ifelse( SCQ1 == 1, d$SCQScore1,
ifelse(SCQ1 ==2, d$SCOScore2, NA))



d$SCQScore - 99
##Distinct value for any other values I've missed

d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum S2CQ:SCQ40

d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40

d$SCQScore[is.na(d$SCQ1)] - d$SCQScore1 [is.na(d$SCQ1)]
## SCQ1 is missing

This fails on line 2
(d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1])
with the error message
NAs are not allowed in subscripted assignments,
presumably because SCQ1 does indeed contain missing values.

This can be fixed, got around, or otherwise bypassed, by
creating a new variable SCQ1, with no missing values, as
shown :-

SCQ1 - d$SCQ1
SCQ1[is.na(SCQ1)] - 3

d$SCQScore[SCQ1 == 1] - d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences so sum S2CQ:SCQ40
d$SCQScore[SCQ1 == 2] - d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 3] - d$SCQScore1[SCQ1 == 3]
## We don't know if he/she can talk, so guess - sum S2:S40

This type of thing is a common problem in my little world.
Is there a better/less klutzy/smarter way of solving it
than creating a new variable each time? Please bear in
mind that it is critical, for later analysis, to keep the
missing values in SCQ1.

Best wishes,
Anthony Staines
--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,
reproducible code.


David Winsemius, MD
West Hartford, CT



--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Advice on obscuring unique IDs in R

2011-01-05 Thread Anthony Staines
Dear colleagues,

This may be a question with a really obvious answer, but I
can't find it. I have access to a large file with real
medical record identifiers (mixed strings of characters and
numbers) in it. These represent medical events for many
thousands of people. It's important to be able to link
events for the same people.

It's much more important that the real record numbers are
strongly obscured. I'm interested in some kind of strong
one-way hash function to which I can feed the real numbers
and get back unique codes for each record  identifier fed
in. I can do this on the health service system, and I have
to do this before making further use of the data!

There is the 'digest' function, in the digest package, but
this seems to work on the whole vector of IDs, producing, in
my case, a vector with 60,000 identical entries.

H.Out$P_ID = digest(H.In$MRNr,serialize=FALSE, algo='md5')

I could do this in Perl, but I'd have to do quite a bit of
work to get it installed.

Any quick suggestions?
Anthony Staines
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Installing rJava fails on Gentoo (amd64) with Sun JDK - checking JNI data types... error

2010-08-13 Thread Anthony Staines
/bin/javah
Java archive tool: /usr/bin/jar
Java library path:
$(JAVA_HOME)/lib/amd64/server:$(JAVA_HOME)/lib/amd64:$(JAVA_HOME)/../lib/amd64::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
JNI linker flags : -L$(JAVA_HOME)/lib/amd64/server
-L$(JAVA_HOME)/lib/amd64 -L$(JAVA_HOME)/../lib/amd64 -L
-L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib
-L/usr/lib -ljvm
JNI cpp flags: -I$(JAVA_HOME)/../include
-I$(JAVA_HOME)/../include/linux

Updating Java configuration in /usr/lib64/R
Done.


AND

Kitchen ~ # env JAVA_HOME=/opt/sun-jdk-1.6.0.20/ R CMD
INSTALL rJava_0.8-5.tar.gz
* installing to library ‘/usr/lib64/R/library’
* installing *source* package ‘rJava’ ...
checking for gcc... x86_64-pc-linux-gnu-gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no

checking Java support in R... present:
interpreter : '/usr/bin/java'
archiver: '/usr/bin/jar'
compiler:
'/home/astaines/.gentoo/java-config-2/current-user-vm/bin/javac'
header prep.: '/usr/bin/javah'
cpp flags   : '-I/opt/sun-jdk-1.6.0.20/jre/../include
-I/opt/sun-jdk-1.6.0.20/jre/../include/linux'
java libs   : '-L/opt/sun-jdk-1.6.0.20/jre/lib/amd64/server
-L/opt/sun-jdk-1.6.0.20/jre/lib/amd64
-L/opt/sun-jdk-1.6.0.20/jre/../lib/amd64 -L
-L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib
-L/usr/lib -ljvm'
checking whether JNI programs can be compiled... yes
checking JNI data types... configure: error: One or more JNI
types differ from the corresponding native type. You may
need to use non-standard compiler flags or a different
compiler in order to fix this.
ERROR: configuration failed for package ‘rJava’
* removing ‘/usr/lib64/R/library/rJava’


All advice gratefully received
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Passing the name of a variable to a function

2010-08-04 Thread Anthony Staines
Dear colleagues,

I have a problem which has bitten me occasionally. I often need to
prepare graphs for many variables in a data set, but seldom for all.
or for any large number of sequential or sequentially named variables.
Often I need several graphs for different subsets of the dataset
for a given variable.  I run into similar problems with other needs
besides graphing.

 What I would like to do is something like write a function which
takes the *name* of a variable, presumably a s a character string,
from a dataframe, as one argument, and the dataframe, as a second argument.

For example, where y is to be the the name of a variable in a given
dataframe d, and the other variables needed, T, M and so on, are
to be found in the same dataframe :-

pf - function (y,data,...) {
p1 - xyplot(y~x|T,data)
p2 - xyplot(y~x|T,subset(data,M == 2))
p3 - xyplot(y~x|T,subset(data,M == 4))
#print(p1,p2,p3)
}
 pf(Score1,data)
pf(Score2,data)


This fails, because, of course, Score 1, Score 2 etc.. are  not
defined, or if you pass them as pf(data$Score2,data), then when you
subset the
data, data$Score2 is now the wrong shape.  I've come up with various
inelegant hacks, (often with for loops), for getting around this over
the
last few years, but I can't help feeling that I'm missing something
obvious, which I've been too dim to spot.

Please note the answer to my requirements is *not* a clever use of
lattice. This question goes beyond graphs.
All suggestions gratefully received!

Best wishes,
Anthony Staines
-- 
Anthony Staines,  Professor of Health Systems Research,
School  of Nursing, Dublin City University, Dublin 9, Ireland.
Tel:- +353 1 7007807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Passing the name of a variable to a function

2010-08-04 Thread Anthony Staines
Thanks to Joshua and Erik for very helpful suggestions. This clarifies
what I had found to be a tricky area of the documentation!
Best wishes,
Anthony Staines

-- 
Anthony Staines,  Professor of Health Systems Research,
School  of Nursing, Dublin City University, Dublin 9, Ireland.
Tel:- +353 1 7007807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Maptools runs out of memory installing the help files for spCbind-methods

2010-01-26 Thread Anthony Staines
Hi,
I'd be grateful for help with the following:-

Running R version 2.9.2 (2009-08-24) on Gentoo Linux, on an
x86 PC. I am trying to install maptools, (via CRAN from
maptools_0.7-29.tar.gz'), for the first time.

All runs smoothly until the installation gets to
*** installing help indices for spCbind-methods, about two
thirds of the way through the help files, at which point the
installation hangs until R runs out of memory.
The last few lines of the output are :-
sp2tmaptexthtmllatex   example
spCbind-methodstexthtmllatex   example
Out of memory!
and it freezes.

Memory available as reported by
 gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 133838  3.6 35  9.4   35  9.4
Vcells  81771  0.7 786432  6.0   406077  3.1

top, in another shell, reports that I am using almost all
the memory (2G) and almost all the swap (1G)

the command
perl /usr/lib/R/share/perl/build-help.pl --txt --html
--latex is running as well as R.

I tried this on an amd64 Linux system, and got exactly the
same results.

I've never seen this before, and I can't find any similar
issues in the bug tracker, or on the forums.

Any suggestions? Am I missing something that it needs?
Anthony
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prediction from BMA

2009-10-07 Thread Anthony Staines
Dear colleagues,

I am doing a little bit of work using the BMA package to examine
predictive models for types of pain - the outcome variable is a binary
(Yes/No) for the type of pain experienced by a patient.
Our observation is that BMA makes a lot of sense in this application, as
the data suggests that there are several well-fitting possible logistic
models, each with posterior probabilities close to 0.1, as well as a
straggle of somewhat less likely models.

I am unsure how best to produce predictions from the BMA output - i.e.
the posterior means of model coefficients. There isn't a predict.BMA or
similar, that I can see. What's the recommended way to produce
predictions (i.e. fitted values, or estimates from new data) from these
BMA models?

Best wishes,
Anthony Staines
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.