For estimating mutual information on continuous variables, have a look at the
mutual-info package: https://pypi.org/project/mutual-info/
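
If you would rather stay within scikit-learn, a rough sketch along these lines may help. It is not taken from the package linked above. It relies on sklearn.feature_selection.mutual_info_regression, which estimates MI between continuous variables with a nearest-neighbor method instead of binning. The helper name knn_mutual_info, the sample size, and the seed are my own choices for illustration. The data are generated the same way as in your example (correlation 0.6), so the estimate can be compared with the closed-form value for a bivariate Gaussian, -0.5 * log(1 - rho^2).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def knn_mutual_info(x, y, n_neighbors=3, random_state=0):
    """Estimate mutual information (in nats) between two continuous 1-D samples.

    Hypothetical helper: a thin wrapper around scikit-learn's
    nearest-neighbor based estimator.
    """
    # scikit-learn expects a 2-D feature matrix and a 1-D target.
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    target = np.asarray(y, dtype=float)
    mi = mutual_info_regression(
        X,
        target,
        discrete_features=False,  # treat the feature as continuous
        n_neighbors=n_neighbors,
        random_state=random_state,
    )
    return float(mi[0])


# Correlated Gaussian data, generated as in the question (rho = 0.6).
rho = 0.6
rng = np.random.default_rng(0)
L = np.linalg.cholesky([[1.0, rho], [rho, 1.0]])
x, y = L @ rng.standard_normal((2, 2000))

mi_knn = knn_mutual_info(x, y)
# Closed-form MI of a bivariate Gaussian with correlation rho, in nats.
mi_exact = -0.5 * np.log(1.0 - rho**2)

print(f"kNN estimate: {mi_knn:.3f} nats, Gaussian closed form: {mi_exact:.3f} nats")
```

Unlike the histogram approach, this has no bin count to tune, although the result still depends on n_neighbors and on the sample size.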

G

On Wed, Feb 01, 2023 at 02:32:03PM +0100, m m wrote:
> Hello,

> I have two continuous variables (heart rate samples over a period of time),
> and would like to compute mutual information between them as a measure of
> similarity.

> I've read some posts suggesting to use the mutual_info_score from scikit-learn
> but will this work for continuous variables? One stackoverflow answer
> suggested converting the data into probabilities with np.histogram2d() and
> passing the contingency table to the mutual_info_score.

> import numpy as np
> from sklearn.metrics import mutual_info_score

> def calc_MI(x, y, bins):
>     c_xy = np.histogram2d(x, y, bins)[0]
>     mi = mutual_info_score(None, None, contingency=c_xy)
>     return mi

> # generate data
> L = np.linalg.cholesky([[1.0, 0.60], [0.60, 1.0]])
> uncorrelated = np.random.standard_normal((2, 300))
> correlated = np.dot(L, uncorrelated)
> A = correlated[0]
> B = correlated[1]
> x = (A - np.mean(A)) / np.std(A)
> y = (B - np.mean(B)) / np.std(B)

> # calculate MI
> mi = calc_MI(x, y, 50)

> Is calc_MI a valid approach? I'm asking because I also read that when
> variables are continuous, then the sums in the formula for discrete data
> become integrals, but I'm not sure if this procedure is implemented in
> scikit-learn?

> Thanks!

> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
    Gael Varoquaux
    Research Director, INRIA
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux