Dear R-friends,

How do you test the goodness of prediction of a model when you predict on a set of data DIFFERENT from the training set?
Let me explain: you train your model M (e.g. glm, gam, regression tree, brt) on a dataset A with a response variable Y. You then predict the value of that same response variable Y on a different dataset B (with predict.glm, predict.gam, and so on). Datasets A and B are different in the sense that they contain the same variable, for example temperature, measured at different sites or over a different interval (e.g. B is a subinterval of A for interpolation, or a disjoint interval for extrapolation).

If you also have the measured values of Y on the new dataset B, how do you measure how good the prediction is, that is, how well the model's predictions fit the observed Y on B?

In other words:

Y ~ T, data = A   for training
Y ~ T, data = B   for predicting

I have devised a couple of methods based on 1) the standard deviation and 2) R^2, but I am unhappy with them. (A small reproducible example of the set-up is in the P.S. below.)

Regards

--
Corrado Topi
Global Climate Change & Biodiversity Indicators
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: +44 (0) 1904 328645, E-mail: ct...@york.ac.uk
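P.S. For concreteness, here is a minimal, self-contained sketch of the set-up, with simulated data and a plain glm() standing in for my real data and model (the variable names, the simulated relationship, and the interval chosen for B are invented purely for illustration):

## simulated training data A and extrapolation data B,
## both with the same predictor T (temperature) and response Y
set.seed(1)
A <- data.frame(T = runif(100, 0, 20))
A$Y <- 2 + 0.5 * A$T + rnorm(100)
B <- data.frame(T = runif(50, 20, 30))   # interval disjoint from A (extrapolation)
B$Y <- 2 + 0.5 * B$T + rnorm(50)

## fit on A, predict on B
m    <- glm(Y ~ T, data = A)
pred <- predict(m, newdata = B, type = "response")

## candidate measures of prediction quality on B
rmse <- sqrt(mean((B$Y - pred)^2))   # root mean squared prediction error
mae  <- mean(abs(B$Y - pred))        # mean absolute prediction error
r2   <- 1 - sum((B$Y - pred)^2) / sum((B$Y - mean(B$Y))^2)   # out-of-sample R^2 (can be negative)

The last line is the sort of R^2-based measure I have tried; it is one of the things I am unhappy with, since out of sample it can go negative when the model predicts worse than the mean of B.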