Hello, I am kind of confused about the use of the sample_weight parameter in the fit() method of RandomForestRegressor. Here is my problem:
I am trying to predict the binding affinity of small molecules to a protein. I have a training set of 709 molecules and a blind test set of 180 molecules. I want to find the features that are most important for correctly predicting the binding affinity of the 180 molecules in my blind test set. My rationale is that if I give more emphasis to the training-set molecules that are similar to the test molecules, the features with higher predictive ability for this specific blind test set of 180 molecules will get higher importances. To this end, I weighted the 709 training-set molecules by their maximum similarity to the 180 test molecules, selected only the features with high importance, and trained a new RF with all 709 molecules. I got some results, but I am not satisfied. Is this the right way to use sample_weight in RF? I would appreciate any advice or a suggested workflow.

--
Dr Thomas Evangelidis
Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049, 62500 Brno, Czech Republic
email: tev...@pharm.uoa.gr
       teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
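For reference, the three-step workflow described in the question (similarity-weighted fit, importance-based feature selection, refit on all 709 molecules) might look roughly like the sketch below. It is only an illustration under stated assumptions: the fingerprints and affinities are random placeholders, the similarity is assumed to be Tanimoto on binary fingerprints (the post does not say which measure was used), "high importance" is taken as "above the mean importance" (an arbitrary threshold), the second RF is assumed to be unweighted, and the helper `tanimoto_matrix` is hypothetical. Note that the keyword accepted by `fit()` is `sample_weight` (singular).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical placeholder data: 1024-bit binary fingerprints for 709 training
# and 180 blind-test molecules, plus random "affinities" for the training set.
n_bits = 1024
X_train = rng.integers(0, 2, size=(709, n_bits))
X_test = rng.integers(0, 2, size=(180, n_bits))
y_train = rng.normal(size=709)


def tanimoto_matrix(A, B):
    """Hypothetical helper: pairwise Tanimoto similarity between rows of binary A and B."""
    A = A.astype(float)
    B = B.astype(float)
    inter = A @ B.T                                   # bits set in both fingerprints
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Define similarity as 0 when both fingerprints are all-zero (union == 0).
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


# Weight each training molecule by its maximum similarity to any test molecule.
weights = tanimoto_matrix(X_train, X_test).max(axis=1)

# Step 1: similarity-weighted fit (keyword is `sample_weight`, singular).
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train, sample_weight=weights)

# Step 2: keep features whose impurity-based importance exceeds the mean
# (the threshold is an assumption; the post only says "high importance").
importances = rf.feature_importances_
selected = np.flatnonzero(importances > importances.mean())

# Step 3: retrain on all 709 molecules using only the selected features
# (assumed unweighted here), then predict the blind test set.
rf_final = RandomForestRegressor(n_estimators=200, random_state=0)
rf_final.fit(X_train[:, selected], y_train)
y_pred = rf_final.predict(X_test[:, selected])
```

One design caveat worth keeping in mind: `feature_importances_` is impurity-based, so the weights change it only through the weighted impurity reductions inside each tree; a permutation-based importance computed on held-out data may be a useful cross-check.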
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn