Dear all,

I am a statistician doing research in QSAR, building regression models where 
the dependent variable is a numerical expression of some chemical activity and 
input variables are chemical descriptors, e.g. molecular weight, number of 
carbon atoms, etc.

In this work I keep encountering a widely used technique called 
Y-RANDOMIZATION, for which I have difficulty finding references in the general 
statistical literature on regression analysis. I would be grateful if someone 
could point me to papers or other literature in statistical regression 
analysis that give a scientific (statistical) foundation for using 
Y-RANDOMIZATION.

Y-RANDOMIZATION is a widely used technique in the QSAR community to ensure the 
robustness of a QSPR (regression) model. It is applied after the "best" 
regression model has been selected, to make sure that the fit is not due to 
chance correlation. Here is a short description. The dependent variable vector 
(Y-vector) is randomly shuffled and a new QSPR (regression) model is fitted 
using the original independent variable matrix. By repeating this a number of 
times, say 100 times, one gets 100 values of R2 and q2 (leave-one-out 
cross-validated R2), one pair for each shuffled Y. The resulting regression 
models are expected to have generally low R2 and low q2 values. However, if 
the majority of the 100 regression models obtained in the Y-randomization have 
relatively high R2 and high q2, this implies that an acceptable regression 
model cannot be obtained for the given data set by the current modelling 
method.
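
To make the description concrete, the procedure as I understand it can be 
sketched as follows. This is only a minimal illustration in Python (standard 
library only), not a reference implementation. It assumes a single descriptor, 
simulated data, and q2 computed as 1 - PRESS/SS_tot; the function names are 
made up for illustration and do not come from any QSAR package.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r2(x, y):
    """Ordinary R2 of the fitted line on the training data."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def q2(x, y):
    """Leave-one-out cross-validated R2, here taken as 1 - PRESS / SS_tot."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot

def y_randomization(x, y, n_perm=100, seed=0):
    """Shuffle Y n_perm times, keep X fixed, and return (R2, q2) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_perm):
        y_perm = y[:]          # copy so the original Y is untouched
        rng.shuffle(y_perm)
        results.append((r2(x, y_perm), q2(x, y_perm)))
    return results

# Simulated example (assumption): one descriptor with a real linear effect.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]

r2_obs, q2_obs = r2(x, y), q2(x, y)
perm = y_randomization(x, y)

# Fraction of shuffled fits whose R2 matches or beats the original fit.
frac_as_good = sum(r >= r2_obs for r, _ in perm) / len(perm)
print(r2_obs, q2_obs, max(r for r, _ in perm), frac_as_good)
```

In this sketch the shuffled-Y fits should show much lower R2 and q2 than the 
original fit, which is the outcome the technique is looking for. With many 
descriptors or with variable selection, the shuffled fits would have to repeat 
the whole model-building procedure, which this sketch does not attempt.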

I cannot find any references to Y-randomization or Y-scrambling anywhere in the 
literature outside chemometrics/QSAR. Any links or references would be much 
appreciated.

Thanks in advance.

DK
----------------------------------------------
Damjan Krstajic
Director
Research Centre for Cheminformatics
Belgrade, Serbia

----------------------------------------------

                                          

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
