Greetings, I am exploring alternative ways of removing invariant scalar features from my feature vectors before training MLPs. So far I have tried removing columns with zero variance and columns with Pearson's R = 1.0 or R = -1.0. If I also remove columns with |R| < 1.0, the performance drops. However, R only measures linear correlation. Now I am considering removing columns with high Mutual Information, but first I need to normalize it. I found the function "mutual_info_regression" in the documentation under "Univariate Feature Selection".
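For context, here is a minimal sketch of the filtering I describe above (a toy matrix of my own; the `np.isclose` tolerance is my choice, to absorb floating-point noise in the correlation matrix):

```python
import numpy as np

X = np.array([
    [1.0, 2.0, 4.0, 7.0],
    [1.0, 3.0, 6.0, 1.0],
    [1.0, 5.0, 10.0, 3.0],
])  # column 0 is constant, column 2 = 2 * column 1

# 1) remove zero-variance (invariant) columns
keep = X.var(axis=0) > 0
X = X[:, keep]

# 2) remove each column perfectly correlated (|R| == 1.0)
#    with an earlier, retained column
R = np.corrcoef(X, rowvar=False)
n = R.shape[0]
drop = set()
for i in range(n):
    for j in range(i + 1, n):
        if j not in drop and np.isclose(abs(R[i, j]), 1.0):
            drop.add(j)
X = X[:, [j for j in range(n) if j not in drop]]
print(X.shape)  # (3, 2): the constant and the duplicated column are gone
```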
https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection I used this function to measure the dependence between columns (features), but it sometimes returns values > 1.0. On the other hand, there is also this function https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html#sklearn.metrics.adjusted_mutual_info_score which is bounded above by 1.0, but it is meant for categorical data (cluster labelings). So my question is: is there a way to compute normalized Mutual Information for continuous variables, too?

Thanks in advance for any advice.

Thomas

--
======================================================================
Dr. Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague, Czech Republic
&
CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>, Brno, Czech Republic
email: teva...@gmail.com, Twitter: tevangelidis <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
website: https://sites.google.com/site/thomasevangelidishomepage/
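P.S. One workaround I have considered, sketched below: discretize the two continuous variables (equal-frequency binning is my own choice here, not something from the docs) and then apply sklearn.metrics.normalized_mutual_info_score, which is bounded in [0, 1], to the bin labels:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x ** 2 + 0.1 * rng.normal(size=500)  # nonlinear dependence, Pearson R near 0

def binned_nmi(a, b, bins=10):
    """Quantile-bin two continuous vectors and return their NMI in [0, 1]."""
    qa = np.quantile(a, np.linspace(0, 1, bins + 1))
    qb = np.quantile(b, np.linspace(0, 1, bins + 1))
    la = np.digitize(a, qa[1:-1])  # equal-frequency bin labels
    lb = np.digitize(b, qb[1:-1])
    return normalized_mutual_info_score(la, lb)

print(binned_nmi(x, y))  # clearly positive despite the nonlinearity
print(binned_nmi(x, x))  # 1.0, since identical labelings have maximal NMI
```

The obvious caveat is that the result depends on the number of bins, so I am not sure this is a principled normalization.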
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn