Hi,
I am running into a problem with the "BayesianGaussianMixture" class, but I do not know whether it comes from my limited knowledge of this type of statistics or from something in the algorithm itself. I have a set of around 1000 to 4000 observations (each observation is a spectrum of around 200 points), so in the end I have n_samples = ~1000 and n_features = ~20. The good news is that I get the same results as KMeans; however, "predict_proba" returns values of only 0 or 1.
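For context, this is roughly how I compare with KMeans and how I judge that the probabilities are saturated. The data below are made up with the same shape as my real set, and the +/-0.5 shift and the 0.999 threshold are just arbitrary choices for the sketch:
##############################################################################
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import BayesianGaussianMixture
from sklearn.metrics import adjusted_rand_score

# Made-up data with roughly the shape of my real problem (n_samples ~ 1000,
# n_features ~ 20): two Gaussian blobs shifted by +/-0.5 in every feature.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(500, 20) + 0.5,
               rng.randn(500, 20) - 0.5])

bgm = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)
km = KMeans(n_clusters=2, random_state=0).fit(X)

# Agreement between the two clusterings, independent of label permutation.
print("ARI vs KMeans:", adjusted_rand_score(bgm.predict(X), km.labels_))

# Fraction of samples whose largest responsibility is essentially 1.
proba = bgm.predict_proba(X)
print("fraction with max proba > 0.999:", np.mean(proba.max(axis=1) > 0.999))
##############################################################################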
I have written a small function, reported below, that simulates my problem with random data. The first half of the array contains points with a positive slope along the features, while the second half has a negative slope, so the two groups cross in the middle. What I have seen is that for a small number of features I obtain reasonable probabilities, but if the number of features increases (say to 50) the probabilities become only 0 or 1. Can someone help me interpret this result? Here is the code; I generally run it with ncomponent=2 and nfeatures=5, 10, 50 or 100. It is not very thoroughly tested, so I am not sure it works in every case. I have also attached it as a file!
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt


def test_bgm(ncomponent, nfeatures):
    # First group: 500 samples with a positive slope across the features.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0 / nfeatures)
    # Second group: 400 samples with a negative slope (crosses the first in the middle).
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(n_components=ncomponent,
                                  degrees_of_freedom_prior=nfeatures * 2).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard labels for all 900 samples, shown as a 30x30 image.
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower', interpolation='none')
    plt.colorbar()

    # Posterior probability of each component, one figure per component.
    for i in range(ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower', interpolation='none')
        plt.colorbar()
    plt.show()
##############################################################################
Thank you in advance
Tommaso
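P.S. My own tentative interpretation, which I would be glad to have confirmed or corrected: with many features the per-sample difference in log-likelihood between the two components adds up over the features, so it grows roughly linearly with nfeatures, and after normalisation the smaller responsibility underflows towards 0. That would explain why the probabilities look reasonable for nfeatures=5 but binary for 50 or 100. A small sketch of just that effect, with a made-up constant separation of +/-0.5 per feature instead of the slopes used in my test function:
##############################################################################
import numpy as np

rng = np.random.RandomState(0)
for nfeatures in (5, 10, 50, 100):
    # Assumed component means for this sketch: +0.5 and -0.5 in every feature.
    mu_a = np.full(nfeatures, 0.5)
    mu_b = -mu_a
    # One sample drawn from component A (unit variance in every feature).
    x = rng.normal(mu_a, 1.0)
    # For identity covariances the log-density gap reduces to half the
    # difference of squared distances (the normalisation constants cancel).
    log_gap = 0.5 * (np.sum((x - mu_b) ** 2) - np.sum((x - mu_a) ** 2))
    # Responsibility of the "wrong" component (equal mixing weights assumed).
    resp_b = 1.0 / (1.0 + np.exp(log_gap))
    print(nfeatures, round(log_gap, 1), resp_b)
##############################################################################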