Hi Tommaso.
So what's the issue? The distributions are very distinct, so there is no
confusion.
The higher the dimensionality, the further apart the points are (compare
the distance between (-1, 1) and (1, -1), which is sqrt(8) ≈ 2.83, to the
one between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1), which is
sqrt(10) ≈ 3.16).
I'm not sure what you mean by "they cross in the middle".
You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures) and
one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions, these
points are very far apart.
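To make this concrete, here is a quick sketch (plain numpy, reusing the
np.arange means from your script) that prints the distance between the
two means as nfeatures grows; it scales roughly like 2 * sqrt(nfeatures / 3):
##########################################################################
import numpy as np

for nfeatures in (2, 5, 10, 50, 100):
    m1 = np.arange(-1, 1, 2.0 / nfeatures)    # mean of the first component
    m2 = np.arange(1, -1, -2.0 / nfeatures)   # mean of the second component
    # Separation between the means grows roughly like 2 * sqrt(nfeatures / 3).
    print(nfeatures, np.linalg.norm(m1 - m2))
##########################################################################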
Then you add standard normal noise to it, so this data is two perfect
Gaussians. In low dimensions their means are "close together" relative to
the noise, so there is some confusion; in high dimensions they are "far
apart", so there is almost none.
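You can see why predict_proba saturates without fitting anything. For two
unit-variance Gaussians with equal weights, the posterior probability of a
sample is a logistic function of the difference of its squared distances to
the two means, and that difference grows with the dimension. A minimal
sketch (assuming equal mixing weights and identity covariances, which
matches the simulated data):
##########################################################################
import numpy as np

rng = np.random.RandomState(0)
for nfeatures in (5, 10, 50):
    m1 = np.arange(-1, 1, 2.0 / nfeatures)
    m2 = np.arange(1, -1, -2.0 / nfeatures)
    x = m1 + rng.randn(nfeatures)  # one sample drawn from the first component
    # log p1(x) - log p2(x) for unit-variance Gaussians with equal weights.
    log_ratio = 0.5 * (np.sum((x - m2) ** 2) - np.sum((x - m1) ** 2))
    posterior = 1.0 / (1.0 + np.exp(-log_ratio))  # P(component 1 | x)
    print(nfeatures, posterior)
##########################################################################
For the low-dimensional cases the posterior is typically noticeably below
1; by nfeatures = 50 it is numerically indistinguishable from 1.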
Hth,
Andy
On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
Hi Jacob,
I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the result is the same. I attached here the
picture of the first component when I ran the code with 5, 10, and
50 nfeatures and 2 components. In my short test function I expect to
have points that could belong to one component as well as the other, as
is visible for a small number of nfeatures, but probabilities of only 0
or 1 for nfeatures >= 50 do not sound correct. It seems to be related
just to the size of the model, and in particular to the number of
features. With BayesianGaussianMixture I have seen that it is slightly
better to increase the degrees of freedom to 2*nfeatures instead of the
default nfeatures. However, this does not change the result when
nfeatures is 50 or more.
Thank you in advance
Tommaso
2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:
Typically this means that the model is so confident in its
predictions that it does not believe it possible for the sample to have
come from the other component. Do you get the same results with a
regular GaussianMixture?
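Something like this sketch (assuming 2 components and data generated the
same way as in your script below) makes the comparison explicit; if the
plain GaussianMixture also returns only 0s and 1s, the saturation comes
from the data itself rather than from the variational prior:
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

nfeatures = 10
# Two noisy components with crossing means, as in the quoted test function.
X = np.vstack((
    np.random.randn(500, nfeatures) + np.arange(-1, 1, 2.0 / nfeatures),
    np.random.randn(400, nfeatures) + np.arange(1, -1, -2.0 / nfeatures)))

gm = GaussianMixture(n_components=2).fit(X)
bgm = BayesianGaussianMixture(n_components=2).fit(X)
# Compare the two posteriors side by side.
print(gm.predict_proba(X).round(3))
print(bgm.predict_proba(X).round(3))
##########################################################################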
On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
<tommaso.costanz...@gmail.com> wrote:
Hi,
I am facing a problem with the "BayesianGaussianMixture"
function, but I do not know if it is because of my poor
knowledge of this type of statistics or if it is something
related to the algorithm. I have a set of around 1000 to 4000
observations (every feature is a spectrum of around 200
points), so in the end I have n_samples = ~1000 and n_features =
~20. The good thing is that I am getting the same results as
KMeans; however, "predict_proba" has values of only 0 or 1.

I have written a small function to simulate my problem with
random data; it is reported below. The first half of the array
has points with a positive slope while the second half has a
negative slope, so they cross in the middle. What I have seen
is that for a small number of features I obtain good
probabilities, but if the number of features increases (say to 50)
then the probabilities become only 0 or 1.

Can someone help me interpret this result?

Here is the code I wrote with the generated random numbers;
I'll generally run it with ncomponent=2 and nfeatures=5, 10,
50, or 100. I am not sure it will work in every case, as it is
not very thoroughly tested. I have also attached it as a file!
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt

def test_bgm(ncomponent, nfeatures):
    # First block: 500 samples around a mean with a positive slope.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0 / nfeatures)
    # Second block: 400 samples around a mean with a negative slope.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(
        n_components=ncomponent,
        degrees_of_freedom_prior=nfeatures * 2).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard assignments, reshaped into a 30-row image (900 samples total).
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()
    # One image per component with the membership probabilities.
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1),
                   origin='lower', interpolation='none')
        plt.colorbar()
    plt.show()
##########################################################################
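For reference, these are the calls I typically use (the exact values will
of course vary with the random data):
##########################################################################
test_bgm(2, 5)   # few features: intermediate probabilities near the crossing
test_bgm(2, 50)  # many features: predict_proba is essentially only 0 or 1
##########################################################################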
Thank you in advance
Tommaso
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn