Dear Andreas,
thank you so much for your answer; now I can see my mistake. What I am
trying to do is convince myself that the reason I get probabilities of
only 0 and 1 when I analyze my data is that the data are well
separated, so I was trying to make some synthetic data where the
probability is different from 0 or 1, but I did it in the wrong way.
Does it sound correct if I make 300 samples of random numbers centered
at 0 with STD 1, another 300 centered at 0.5, and then add some
samples in between these two Gaussian distributions (say between 0.15
and 0.35)? In that case I think I should expect probabilities
different from 0 or 1 in the two components (when using 2 components).
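A minimal sketch of that idea (the sizes and seed here are just illustrative, and I only use the two overlapping Gaussians without the extra in-between samples):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two heavily overlapping 1-D Gaussians: means 0 and 0.5, both with STD 1.
rng = np.random.RandomState(0)
X = np.concatenate([rng.randn(300) + 0.0,
                    rng.randn(300) + 0.5]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gm.predict_proba(X)

# With this much overlap, many samples should get soft probabilities
# strictly between 0 and 1 instead of hard 0/1 assignments.
frac_soft = np.mean((proba[:, 0] > 0.05) & (proba[:, 0] < 0.95))
print(frac_soft)
```

With means only half a standard deviation apart the two components overlap heavily, so the posterior probabilities vary smoothly instead of saturating at 0/1.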
Thank you in advance
Tommaso
On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3k...@gmail.com> wrote:
Hi Tommaso.
So what's the issue? The distributions are very distinct, so there
is no confusion.
The higher the dimensionality, the further apart the points are
(compare the distance between (-1, 1) and (1, -1) to the one
between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
I'm not sure what you mean by "the cross in the middle".
You create two fixed points, one at np.arange(-1, 1, 2.0/nfeatures)
and one at np.arange(1, -1, -2.0/nfeatures). In high dimensions,
these points are very far apart.
Then you add standard normal noise to them, so this data is two
perfect Gaussians. In low dimensions the means are "close together",
so there is some confusion; in high dimensions they are "far apart",
so there is less confusion.
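For example, a quick check of those two distances:

```python
import numpy as np

# 2-D: the two means are only sqrt(8) ~ 2.83 apart.
a2 = np.array([-1.0, 1.0])
b2 = np.array([1.0, -1.0])
print(np.linalg.norm(a2 - b2))

# 5-D: the same +-1 ramp gives sqrt(10) ~ 3.16, and the gap keeps
# growing like sqrt(nfeatures) while the per-axis noise STD stays 1,
# so the components become easier and easier to separate.
a5 = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b5 = np.array([1.0, 0.5, 0.0, -0.5, -1.0])
print(np.linalg.norm(a5 - b5))
```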
Hth,
Andy
On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
Hi Jacob,
I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the result is the same. I attached here the
picture of the first component when I ran the code with 5, 10,
and 50 nfeatures and 2 components. In my short test function I
expect to have points that could belong to one component as well as
the other, as is visible for a small number of nfeatures, but
probabilities of only 0 and 1 for nfeatures >= 50 do not sound
correct. It seems to be related just to the size of the model, and in
particular to the number of features. With the BayesianGaussianMixture
I have seen that it is slightly better to increase the degrees of
freedom to 2*nfeatures instead of the default nfeatures. However, this
does not change the result when nfeatures is 50 or more.
Thank you in advance
Tommaso
2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:
Typically this means that the model is so confident in its
predictions it does not believe it possible for the sample to
come from the other component. Do you get the same results
with a regular GaussianMixture?
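For illustration, a quick sketch of that effect with well-separated data (the numbers here are arbitrary):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated 1-D Gaussians: means 0 and 10, both with STD 1.
rng = np.random.RandomState(0)
X = np.concatenate([rng.randn(300),
                    rng.randn(300) + 10.0]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gm.predict_proba(X)

# The components are ~10 STDs apart, so every sample is assigned
# essentially probability 1 to one component and 0 to the other.
print(proba.max(axis=1).min())
```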
On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
<tommaso.costanz...@gmail.com> wrote:
Hi,
I am facing a problem with the "BayesianGaussianMixture"
function, but I do not know if it is because of my poor
knowledge of this type of statistics or because of something
related to the algorithm. I have a data set of around 1000 to
4000 observations (every feature is a spectrum of around 200
points), so in the end I have n_samples = ~1000 and
n_features = ~20. The good thing is that I am getting the same
results as KMeans; however, "predict_proba" has values of only
0 or 1.
I have written a small function to simulate my problem with
random data, reported below. The first half of the array has
points with a positive slope while the second half has a
negative slope, so they cross in the middle. What I have seen
is that for a small number of features I obtain good
probabilities, but if the number of features increases (say
50) then the probabilities become only 0 or 1.
Can someone help me interpret this result?
Here is the code I wrote with generated random numbers; I
generally run it with ncomponent=2 and nfeatures=5, 10, 50,
or 100. I am not sure it will work in every case, as it is
not very well tested. I have also attached it as a file!
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt

def test_bgm(ncomponent, nfeatures):
    # First cluster: means ramp from -1 up toward 1 across the features.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0 / nfeatures)
    # Second cluster: means ramp from 1 down toward -1.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(
        n_components=ncomponent,
        degrees_of_freedom_prior=nfeatures * 2).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard assignments for all 900 samples, laid out as a 30-row image.
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()
    # Per-component posterior probabilities.
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()
    plt.show()
##############################################################################
Thank you in advance
Tommaso
--
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn