Dear all,

I am a statistician doing research in QSAR, building regression models where the dependent variable is a numerical expression of some chemical activity and the input variables are chemical descriptors, e.g. molecular weight, number of carbon atoms, etc.
In this work I am confronted with a widely used technique called Y-RANDOMIZATION, for which I have difficulty finding references in the general statistical literature on regression analysis. I would be grateful if someone could point me to papers/literature in statistical regression analysis which give a scientific (statistical) foundation for using Y-RANDOMIZATION.

Y-RANDOMIZATION is widely used in the QSAR community to ensure the robustness of a QSPR (regression) model. It is applied after the "best" regression model has been selected, to make sure that the model is not the result of chance correlations.

Here is a short description. The dependent variable vector (Y-vector) is randomly shuffled and a new QSPR (regression) model is fitted using the original independent variable matrix. By repeating this a number of times, say 100 times, one gets one hundred values of R2 and q2 (leave-one-out cross-validated R2), one pair for each shuffled Y. It is expected that the resulting regression models should generally have low R2 and low q2 values. However, if the majority of the hundred regression models obtained in the Y-randomization have relatively high R2 and high q2, then it implies that an acceptable regression model cannot be obtained for the given data set by the current modelling method.

I cannot find any references to Y-randomization or Y-scrambling anywhere in the literature outside chemometrics/QSAR. Any links or references would be much appreciated.

Thanks in advance.
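To make the description concrete, the shuffling procedure can be sketched in Python as below. This is only an illustration under stated assumptions, not the implementation used in any QSAR package: the model is taken to be ordinary least squares with an intercept, the function names `r2_and_q2` and `y_randomization` are hypothetical, the data are simulated, and the empirical p-value in the last step is an addition borrowed from ordinary permutation testing rather than part of the description above. The sketch also refits a fixed set of descriptors and does not repeat any descriptor-selection step.

```python
import numpy as np


def r2_and_q2(X, y):
    """R2 and leave-one-out q2 for an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])       # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    # Leverages h_ii from a thin QR decomposition (assumes A has full column rank).
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)
    # For OLS the leave-one-out residual equals resid_i / (1 - h_ii),
    # so q2 can be computed without refitting the model n times.
    press = np.sum((resid / (1.0 - h)) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss, 1.0 - press / tss


def y_randomization(X, y, n_shuffles=100, seed=0):
    """Refit the model on shuffled copies of y, keeping X fixed."""
    rng = np.random.default_rng(seed)
    observed = np.array(r2_and_q2(X, y))
    shuffled = np.array(
        [r2_and_q2(X, rng.permutation(y)) for _ in range(n_shuffles)]
    )
    # Assumption (not in the description above): empirical p-values, i.e. the
    # share of shuffles that do at least as well as the real fit.
    p_values = (1 + np.sum(shuffled >= observed, axis=0)) / (n_shuffles + 1)
    return observed, shuffled, p_values


# Simulated data only (not real QSAR data): 40 compounds, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=40)

observed, shuffled, p_values = y_randomization(X, y)
print("observed R2, q2:     ", observed)
print("mean shuffled R2, q2:", shuffled.mean(axis=0))
print("empirical p-values:  ", p_values)
```

With data that carry a real relationship, as simulated here, the shuffled fits should give R2 and q2 values well below the observed ones. If they do not, that is the situation the description above flags as a sign of chance correlation.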
DK

Damjan Krstajic
Director
Research Centre for Cheminformatics
Belgrade, Serbia

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.