Greetings,

I am looking for alternative ways to remove invariant scalar features
from my feature vectors before training MLPs. So far I have tried removing
columns with zero variance and columns with Pearson's R = 1.0 or R = -1.0.
If I also remove columns with |R| < 1.0, the performance drops. However, R
only measures linear correlation, so now I am thinking of removing columns
with high Mutual Information, but first I need to normalize it. In the
documentation, under "Univariate Feature Selection", I found the function
"mutual_info_regression":

https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
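For reference, here is a minimal sketch of the filtering I described above
(zero-variance columns, then columns with |Pearson R| = 1 against an
already-kept column), assuming the features are rows x columns in a NumPy
array; the helper name and tolerance are just for illustration:

```python
import numpy as np

def drop_invariant_and_duplicate(X, tol=1e-12):
    """Drop zero-variance columns, then columns perfectly
    correlated (|Pearson R| == 1) with an earlier kept column."""
    X = np.asarray(X, dtype=float)
    keep = np.var(X, axis=0) > tol          # zero-variance filter
    X = X[:, keep]
    R = np.corrcoef(X, rowvar=False)        # feature-feature Pearson R
    selected = []
    for j in range(X.shape[1]):
        # keep column j only if it is not perfectly (anti)correlated
        # with any column already selected
        if all(abs(R[j, i]) < 1.0 - tol for i in selected):
            selected.append(j)
    return X[:, selected]
```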

I used this function to measure the dependence between pairs of columns
(features), but it sometimes returns values > 1.0. On the other hand,
there is also this function:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html#sklearn.metrics.adjusted_mutual_info_score

which is upper-bounded by 1.0, but it is intended for categorical data
(cluster labels). So my question is: is there a way to compute normalized
Mutual Information for continuous variables, too?
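One workaround I am considering, in case it helps frame the question:
discretize the continuous columns first (e.g. with KBinsDiscretizer) and
then apply the bounded normalized_mutual_info_score to the bin labels.
This is only a heuristic sketch (the bin count is arbitrary and the result
depends on it), not a proper continuous NMI:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.metrics import normalized_mutual_info_score

def nmi_continuous(x, y, n_bins=10):
    """Heuristic normalized MI between two continuous 1-D arrays:
    bin each variable into equal-frequency bins, then score the
    bin labels with normalized_mutual_info_score (bounded in [0, 1])."""
    disc = KBinsDiscretizer(n_bins=n_bins, encode='ordinal',
                            strategy='quantile')
    xb = disc.fit_transform(np.asarray(x, float).reshape(-1, 1)).ravel()
    yb = disc.fit_transform(np.asarray(y, float).reshape(-1, 1)).ravel()
    return normalized_mutual_info_score(xb, yb)
```

With this, identical columns score exactly 1.0, while the score for
unrelated columns stays within [0, 1], which is the normalization I am
after; but I would prefer an estimator that avoids the binning step.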

Thanks in advance for any advice.
Thomas


-- 

======================================================================

Dr. Thomas Evangelidis

Research Scientist

IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague,
Czech Republic
  &
CEITEC - Central European Institute of Technology
<https://www.ceitec.eu/>, Brno,
Czech Republic

email: teva...@gmail.com, Twitter: tevangelidis
<https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
<https://www.linkedin.com/in/thomas-evangelidis-495b45125/>

website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
