Dear Andreas, thank you so much for your answer; now I can see my mistake. What I am trying to do is convince myself that I am getting probabilities of only 0 and 1 when I analyze my data because the data are well separated, so I was trying to make some synthetic data where the probabilities are different from 0 or 1, but I did it the wrong way. Does it sound correct if I make 300 samples of random numbers centered at 0 with STD 1, another 300 centered at 0.5, and then add some samples in between these two Gaussian distributions (say between 0.15 and 0.35)? In this case I think I should expect probabilities different from 0 or 1 for the two components (when using 2 components).
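To make the question concrete, here is a minimal sketch of the experiment I have in mind (the count of 50 in-between samples and drawing them uniformly are just guesses on my part; I use plain GaussianMixture with a fixed random seed so it is reproducible):

##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# 300 samples centered at 0 and 300 centered at 0.5, both with STD 1,
# plus some extra samples placed between the two means.
a = rng.randn(300, 1)                            # mean 0, STD 1
b = rng.randn(300, 1) + 0.5                      # mean 0.5, STD 1
between = rng.uniform(0.15, 0.35, size=(50, 1))  # guessed count and spread
X = np.vstack((a, b, between))

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gm.predict_proba(X)

# fraction of samples whose membership is genuinely uncertain,
# i.e. both component probabilities strictly between 0.05 and 0.95
print(((proba > 0.05) & (proba < 0.95)).all(axis=1).mean())
##########################################################################

If this is right, I would expect intermediate probabilities here even without the extra in-between points, since the means are only 0.5 apart while the noise has STD 1, so the two Gaussians overlap heavily.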
Thank you in advance,
Tommaso

On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3k...@gmail.com> wrote:

> Hi Tommaso.
>
> So what's the issue? The distributions are very distinct, so there is
> no confusion. The higher the dimensionality, the further apart the
> points are (compare the distance between (-1, 1) and (1, -1) to the one
> between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
>
> I'm not sure what you mean by "the cross in the middle". You create two
> fixed points, one at np.arange(-1, 1, 2.0/nfeatures) and one at
> np.arange(1, -1, -2.0/nfeatures). In high dimensions, these points are
> very far apart. Then you add standard normal noise to them, so this
> data is two perfect Gaussians. In low dimensions they are "close
> together", so there is some confusion; in high dimensions they are "far
> apart", so there is less confusion.
>
> Hth,
> Andy
>
> On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
>
>> Hi Jacob,
>>
>> I have just changed my code from BayesianGaussianMixture to
>> GaussianMixture, and the result is the same. I have attached the
>> picture of the first component from running the code with 5, 10, and
>> 50 nfeatures and 2 components. In my short test function I expect to
>> have points that can belong to one component as well as the other, as
>> is visible for a small number of nfeatures, but only 0 or 1 for
>> nfeatures of 50 or more does not sound correct. It seems to be related
>> just to the size of the model, and in particular to the number of
>> features. With BayesianGaussianMixture I have seen that it is slightly
>> better to increase the degrees of freedom to 2*nfeatures instead of
>> the default nfeatures. However, this does not change the result when
>> nfeatures is 50 or more.
>>
>> Thank you in advance
>> Tommaso
>>
>> 2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:
>>
>>> Typically this means that the model is so confident in its
>>> predictions that it does not believe it possible for the sample to
>>> come from the other component. Do you get the same results with a
>>> regular GaussianMixture?
>>>
>>> On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
>>> <tommaso.costanz...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing some problems with the "BayesianGaussianMixture"
>>>> function, but I do not know if it is because of my poor knowledge of
>>>> this type of statistics or if it is something related to the
>>>> algorithm. I have a set of data of around 1000 to 4000 observations
>>>> (every feature is a spectrum of around 200 points), so in the end I
>>>> have n_samples = ~1000 and n_features = ~20. The good thing is that
>>>> I am getting the same results as KMeans; however, "predict_proba"
>>>> has values of only 0 or 1.
>>>>
>>>> I have written a small function to simulate my problem with random
>>>> data; it is reported below. The first half of the array has points
>>>> with a positive slope while the second half has a negative slope, so
>>>> they cross in the middle. What I have seen is that for a small
>>>> number of features I obtain good probabilities, but if the number of
>>>> features increases (say 50) then the probabilities become only 0 or
>>>> 1. Can someone help me interpret this result?
>>>>
>>>> Here is the code I wrote to generate the random numbers; I generally
>>>> run it with ncomponent=2 and nfeatures=5, 10, 50, or 100. I am not
>>>> sure it will work in every case, as it is not very highly tested. I
>>>> have also attached it as a file!
>>>> ######################################################################
>>>> import numpy as np
>>>> from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>>>> import matplotlib.pyplot as plt
>>>>
>>>> def test_bgm(ncomponent, nfeatures):
>>>>     # two clouds of standard-normal noise around two mirrored means
>>>>     temp = np.random.randn(500, nfeatures)
>>>>     temp = temp + np.arange(-1, 1, 2.0/nfeatures)
>>>>     temp1 = np.random.randn(400, nfeatures)
>>>>     temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
>>>>     X = np.vstack((temp, temp1))
>>>>
>>>>     bgm = BayesianGaussianMixture(
>>>>         n_components=ncomponent,
>>>>         degrees_of_freedom_prior=nfeatures*2).fit(X)
>>>>     bgm_proba = bgm.predict_proba(X)
>>>>     bgm_labels = bgm.predict(X)
>>>>
>>>>     # hard cluster labels
>>>>     plt.figure(-1)
>>>>     plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>>>>                interpolation='none')
>>>>     plt.colorbar()
>>>>
>>>>     # membership probability for each component
>>>>     for i in np.arange(0, ncomponent):
>>>>         plt.figure(i)
>>>>         plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
>>>>                    interpolation='none')
>>>>         plt.colorbar()
>>>>
>>>>     plt.show()
>>>> ######################################################################
>>>>
>>>> Thank you in advance
>>>> Tommaso
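PS: To convince myself of your distance argument, I made a quick sketch (only a back-of-the-envelope check, not part of the code above) that measures how far apart the two fixed mean points of test_bgm() are as nfeatures grows:

##########################################################################
import numpy as np

# Euclidean distance between the two mean vectors used in test_bgm().
# The noise around each mean is standard normal (scale ~1), so the two
# clusters stop overlapping once this distance is much larger than 1.
for nfeatures in (2, 5, 10, 50, 100):
    p = np.arange(-1, 1, 2.0 / nfeatures)
    q = -p  # mirror of p, equivalent to np.arange(1, -1, -2.0/nfeatures)
    print(nfeatures, np.linalg.norm(p - q))  # grows like sqrt(4*nfeatures/3)
##########################################################################

This matches what you describe: in 2 dimensions the distance is 2, while for 100 features it is about 11.5, far larger than the STD-1 noise, so there is no confusion left between the components.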
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn