Hello,

I have two continuous variables (heart rate samples over a period of time),
and would like to compute mutual information between them as a measure of
similarity.

I've read some posts suggesting mutual_info_score from scikit-learn, but
will this work for continuous variables? One Stack Overflow answer suggested
binning the data into a 2D histogram of counts with np.histogram2d() and
passing that contingency table to mutual_info_score.

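import numpy as np
from sklearn.metrics import mutual_info_score

def calc_MI(x, y, bins):
    # 2D histogram of joint counts; [0] drops the returned bin edges
    c_xy = np.histogram2d(x, y, bins)[0]
    # labels are ignored when a contingency table is supplied
    mi = mutual_info_score(None, None, contingency=c_xy)
    return mi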

# generate two standard normal series with correlation 0.60
L = np.linalg.cholesky([[1.0, 0.60], [0.60, 1.0]])
uncorrelated = np.random.standard_normal((2, 300))
correlated = np.dot(L, uncorrelated)
A = correlated[0]
B = correlated[1]
x = (A - np.mean(A)) / np.std(A)
y = (B - np.mean(B)) / np.std(B)

# calculate MI
mi = calc_MI(x, y, 50)
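
As a sanity check on the binning approach: for a bivariate Gaussian with
correlation rho, the true mutual information is -0.5 * ln(1 - rho^2) nats, so
the histogram estimate can be compared against a known value. The sketch below
does that for several bin counts. The seed and the list of bin counts are my
own illustrative choices, and mutual_info_score reports its result in nats.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def calc_MI(x, y, bins):
    # Histogram-based MI estimate (in nats) from binned joint counts
    c_xy = np.histogram2d(x, y, bins)[0]
    return mutual_info_score(None, None, contingency=c_xy)

rho = 0.60                       # same correlation as in the example data
rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
L = np.linalg.cholesky([[1.0, rho], [rho, 1.0]])
x, y = L @ rng.standard_normal((2, 300))

# Closed-form MI of a bivariate Gaussian with correlation rho (nats)
mi_true = -0.5 * np.log(1.0 - rho**2)

# Histogram estimates for a few illustrative bin counts
estimates = {bins: calc_MI(x, y, bins) for bins in (5, 10, 20, 50)}

print(f"analytic MI: {mi_true:.4f} nats")
for bins, mi in estimates.items():
    print(f"bins={bins:3d}  estimated MI: {mi:.4f} nats")
```

With only 300 samples the estimate depends strongly on the number of bins,
and with 50 bins per axis most cells are empty, so the estimate tends to come
out well above the analytic value.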

Is calc_MI a valid approach? I'm asking because I've also read that for
continuous variables the sums in the discrete formula become integrals, and
I'm not sure whether scikit-learn implements that case.
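
One candidate that avoids binning is mutual_info_regression from
sklearn.feature_selection, which according to its documentation estimates MI
from k-nearest-neighbor distances rather than from a histogram. Below is a
minimal sketch, assuming it is reasonable to treat one series as a single
feature and the other as the target. The sample size, seed, and n_neighbors
value are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rho = 0.60
rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
L = np.linalg.cholesky([[1.0, rho], [rho, 1.0]])
# larger sample than in the example above, to make the comparison less noisy
x, y = L @ rng.standard_normal((2, 2000))

# X must be 2D (n_samples, n_features); here there is a single feature
mi_knn = mutual_info_regression(
    x.reshape(-1, 1), y, n_neighbors=3, random_state=0
)[0]

# Closed-form MI of a bivariate Gaussian with correlation rho (nats)
mi_true = -0.5 * np.log(1.0 - rho**2)

print(f"kNN estimate: {mi_knn:.4f} nats, analytic: {mi_true:.4f} nats")
```

Is this the more appropriate estimator for two continuous signals?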

Thanks!
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
