Friends

Seems I've run into another snag. More of the nitty-gritty r-details I don't 
understand.

So, as I mentioned below, dataset[[var_sub]] seems to be understood well by the 
functions I previously used and I was able to run my loop successfully with the 
[[var_sub]] as a variable-substitution method. However, now I want to do the 
same with TukeyHSD, and this function does not play nice with this kind of 
syntax. So if I do


fac<-as.factor(dataset$factor)
res<-aov(dataset$var~dataset$factor)
tuk<-TukeyHSD(res,"fac")

things work fine. But if I try (similar to the script below which worked for 
ROCR functions):

fac<-as.factor(dataset$factor)
var_sub<-noquotes("var")
res<-aov(dataset[[var_sub]]~dataset$factor)
tuk<-TukeyHSD(res,"fac")

TukeyHSD craps out with an error, even though "res" is identical in both cases, 
apart from the formula syntax.

So, TukeyHSD seems to be picky about syntax. Is there any other way I can do 
variable substitution (so I can read variable names from my list) and get this 
loop to work for TukeyHSD?

Thanks

Jon


Friends

First, thanks to all for great feed-back. Open-source rocks! I have a workable 
solution to my question, attached below in case it might be of any use to 
anyone. I'm sure there are more elegant ways of doing this, so any further 
feedback is welcome!

Things I've learned (for other noobs like me to learn from):

1) dataset[[j]] seems equivalent to dataset$var if j<-var, though quotes can 
mess 
you up, hence j<-noquote(varlist[i]) in the script (it also makes a difference 
that variables in varlist be stored as a space-separated string. tab- or 
line-break-separated lists don't seem to work, though a different method might 
handle 
that)

dataset[["var"]] is "equivalent" to dataset$var given var does not contain any 
special characters. Otherwise j == "var" has to be TRUE.

2) Loops will abort if they encounter an error (like ROCR encountering a 
prediction that is singular). Error handling can be built in, but is a little 
tricky. I reduplicated the method with a function to test and advance the loop 
on failure. You can suppress error messages if you like)
Not tricky, just use try().


3) Some stats methods don't have NA handling built into them (eg: "prediction" 
in ROCR chokes if there are empty cells in the variables) hence it seems a good 
idea to 
strip these out before starting. The subsetting with na.omit does this

... given you know what you are doing (and omitting).


4) You reference pieces (slots) of results (S3/S4 objects) by using obj...@slot.
The @ operator is defined for slots of *S4* classes.


Best,
Uwe Ligges

> Hence, you pull out the the auc value in ROCR-"performance" by p...@y.value 
> in the script. you can see what slots are in an object by simply listing the 
> object contents at the command line>object.
Thanks again for all the help!

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
     University of Toronto Faculty of Medicine


...code




################################################################################
## R script for automating stats crunching in large datasets                  ##
## Needs space separated list of variable names matching dataset column names ##
## You have to tinker with the code to customize for your application         ##
##                                                                              
                                          ##
## Jon Erik Ween MD, MSc,  26 Feb 2010                                        ##
################################################################################

library(ROCR) # Load stats package to use if not standard
varslist<-scan("/Users/jween/Desktop/INCASvars.txt","list") # Read variable list
results<-as.data.frame(array(,c(3,length(varslist)))) # Initialize results
array, one type of stat at a time for now

for (i in 1:length(varslist)){ # Loop throught the variables you want to 
process. Determined by varslist
        j<-noquote(varslist[i])
        vars<-c(varslist[i],"Issue_class") # Variables to be analyzed
        temp<-na.omit(incas[vars]) # Have to subset to get rid of NA values
causing ROCR to choke
        n<-nrow(temp) # Record how many cases the analysis ios based on. Need 
to figure out how to calc cases/controls
        #.table<-table(temp$SubjClass)  # Maybe for later figure out 
cases/controls
        results[1,i]<-j # Name particular results column
        results[2,i]<-n # Number of subjects in analysis
        test<-try(aucval(i,j),silent=TRUE) # Error handling in case procedure 
craps oust so loop can continue. Supress annoying error messages
        if(class(test)=="try-error") next else # Run procedure only if OK, 
otherwise skip
        pred<-prediction(incas[[j]],incas$Issue_class); # Procedure
        perf<-performance(pred,"auc");
        results[3,i]<-as.numeric(p...@y.values) # Enter result into appropriate 
row
}
write.table(results,"/Users/jween/Desktop/IncasRres_ 
Issue_class.csv",sep=",",col.names=FALSE,row.names=FALSE) # Write results to 
table
rm(aucval,i,n,temp,vars,results,test,pred,perf,j,varslist) # Clean up

aucval<-function(i,j){ # Function to trap errors. Should be the same as real 
procedure above
        pred<-prediction(incas[[j]],incas$Issue_class) # Don't put any real
results here, they don't seem to be passed back
        perf<-performance(pred,"auc")
}





        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to