Hello everyone,
I am working on a public health project, and we have built a decision tree for
categorical variables using the rpart package. Our goal is to develop a model
(using the ROC tool) to predict the presence/absence of diabetes and to get a
better understanding of which factors are important in a particular Chilean
population. We have found some important variables. Now we want to apply this
model to a large dataset to estimate a likely outcome (the probability of
getting the disease), but for each person we only have the combination of
predictor variables.
We have created this code:
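library(rpart)
library(ROCR)  # provides prediction()
# Classification tree fitted on the data where the outcome is known
fit1 <- rpart(sickness ~ aetinghabit + gse + age + sex, method = "class", data = data)
# Predicted class probabilities for each row of the large dataset
prediccion <- predict(fit1, bigdatabase, type = "prob")
# Second column = probability of the second level of 'sickness'
predictionsyes <- prediccion[, 2]
pred <- prediction(predictionsyes, datos$sickness) # but this is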
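To make the question concrete, here is a minimal, self-contained sketch of the
step we are trying to do. It uses the kyphosis data that ships with rpart as a
stand-in for our own data (Kyphosis plays the role of sickness), and the data
frame new_people and its values are made up for illustration only.

```r
library(rpart)

# Stand-in data: 'kyphosis' ships with rpart. The outcome Kyphosis has the
# levels "absent" and "present" and plays the role of 'sickness' here.
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)

# Hypothetical new people: predictor values only, no outcome column.
# Column names must match the predictors used in the formula.
new_people <- data.frame(Age    = c(12, 80, 150),
                         Number = c(3, 5, 4),
                         Start  = c(15, 6, 10))

# One row per person, one column per class; columns are named by factor level.
prob <- predict(fit, newdata = new_people, type = "prob")

# Selecting the column by name avoids relying on the order of the levels.
p_present <- prob[, "present"]
print(p_present)
```

If this is the right approach, then for our data the probability for each
person would simply be the matching column of predict(fit1, bigdatabase,
type = "prob").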
My question is: how do I feed each person's conditions into this model to get
that person's probability of getting the disease? Is it possible to build a
ROC curve using only this bigdatabase? We do not know the outcome for these
people, that is, whether or not they developed the disease.
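As far as we understand, a ROC curve compares predicted probabilities against
known outcomes, so it seems it would have to be built on data where the outcome
is recorded, for example a held-out part of the original data. A minimal sketch
under that assumption, with the kyphosis data from rpart standing in for our
data and ROCR used for the curve; the every-third-row split is only for
illustration.

```r
library(rpart)
library(ROCR)

# Stand-in data with a known outcome: hold out every third row for testing.
test_idx <- seq(3, nrow(kyphosis), by = 3)
train <- kyphosis[-test_idx, ]
test  <- kyphosis[test_idx, ]

# Fit the tree on the training rows only.
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = train)

# Predicted probability of "present" for the held-out rows.
p_test <- predict(fit, newdata = test, type = "prob")[, "present"]

# ROCR pairs the predicted probabilities with the KNOWN held-out outcomes.
pred <- prediction(p_test, test$Kyphosis)
perf <- performance(pred, "tpr", "fpr")  # ROC curve: TPR against FPR
plot(perf)

# Area under the ROC curve.
auc <- performance(pred, "auc")@y.values[[1]]
print(auc)
```

A dataset without outcomes, like our bigdatabase, could then only be scored
with predict(), not used to draw the curve.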
It would be very helpful if someone could shed some light on this. Any web
resource on how to do it would be much appreciated.
Thanks in advance.
Best Regards,
José Bustos
Escuela de Enfermeria
Pontificia Universidad Católica de Chile
Proyecto FONIS 2010
Celular 95939144
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.