Dear R-friends,

How do you test the goodness of prediction of a model when you predict on a set of data DIFFERENT from the training set?
Let me explain: you train your model M (e.g. glm, gam, regression tree, brt) on a dataset A with a response variable Y. You then predict the value of that same response variable Y on a different dataset B (with predict.glm, predict.gam, and so on). Datasets A and B are different in the sense that they contain the same variable, for example temperature, measured at different sites or over a different interval (e.g. B is a subinterval of A for interpolation, or a disjoint interval for extrapolation).

If you also have the measured values of Y on the new dataset B, how do you measure how good the prediction is, that is, how well the model's predictions fit the observed Y on B?

In other words:

Y ~ T, data = A   for training
Y ~ T, data = B   for predicting

I have devised a couple of methods based on 1) the standard deviation and 2) R^2, but I am unhappy with them. (A small reproducible example of the set-up is in the P.S. below.)

Regards

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: +44 (0) 1904 328645, E-mail: ct...@york.ac.uk
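P.S. For concreteness, here is a minimal, self-contained sketch of the set-up, with simulated data and a plain glm() standing in for my real data and model (the variable names, the simulated relationship, and the interval chosen for B are invented purely for illustration):

## simulated training data A and extrapolation data B,
## both with the same predictor T (temperature) and response Y
set.seed(1)
A <- data.frame(T = runif(100, 0, 20))
A$Y <- 2 + 0.5 * A$T + rnorm(100)
B <- data.frame(T = runif(50, 20, 30))   # interval disjoint from A (extrapolation)
B$Y <- 2 + 0.5 * B$T + rnorm(50)

## fit on A, predict on B
m    <- glm(Y ~ T, data = A)
pred <- predict(m, newdata = B, type = "response")

## candidate measures of prediction quality on B
rmse <- sqrt(mean((B$Y - pred)^2))   # root mean squared prediction error
mae  <- mean(abs(B$Y - pred))        # mean absolute prediction error
r2   <- 1 - sum((B$Y - pred)^2) / sum((B$Y - mean(B$Y))^2)   # out-of-sample R^2 (can be negative)

The last line is the sort of R^2-based measure I have tried; it is one of the things I am unhappy with, since out of sample it can go negative when the model predicts worse than the mean of B.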