Hi Tommaso.
So what's the issue? The distributions are very distinct, so there is no confusion. The higher the dimensionality, the farther apart the points are (compare the distance between (-1, 1) and (1, -1) to the distance between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
I'm not sure what you mean by "the cross in the middle".
You create two fixed points, one at np.arange(-1, 1, 2.0/nfeatures) and one at np.arange(1, -1, -2.0/nfeatures). In high dimensions, these points are very far apart. Then you add standard normal noise to them, so this data is two perfect Gaussians. In low dimensions, the two means are "close together", so there is some confusion;
in high dimensions, they are "far apart", so there is less confusion.
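
To make that concrete, here is a small sketch (not from the thread; it just reuses the two arange() mean vectors from the test script below). It prints the Euclidean distance between the two cluster means as nfeatures grows, while the noise around each mean stays standard normal per dimension:

import numpy as np

# Distance between the two cluster means used in test_bgm() below; the noise
# added to each mean is N(0, 1) per dimension, so a larger distance means the
# two Gaussians overlap less and the posteriors saturate towards 0 and 1.
for nfeatures in (2, 5, 10, 50, 100):
    mean_a = np.arange(-1, 1, 2.0 / nfeatures)    # mean with positive slope
    mean_b = np.arange(1, -1, -2.0 / nfeatures)   # mean with negative slope
    print(nfeatures, np.linalg.norm(mean_a - mean_b))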

Hth,
Andy

On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
Hi Jacob,

I have just changed my code from BayesianGaussianMixture to GaussianMixture, and the result is the same. I attached a picture of the first component when I ran the code with 5, 10, and 50 nfeatures and 2 components. In my short test function I expect some points that could belong to one component as well as the other, as is visible for a small number of nfeatures, but probabilities of only 0 or 1 for nfeatures >= 50 do not sound correct. It seems to be related only to the size of the model, and in particular to the number of features. With BayesianGaussianMixture I have seen that it is slightly better to increase the degrees of freedom to 2*nfeatures instead of the default nfeatures; however, this does not change the result when nfeatures is 50 or more.

Thank you in advance
Tommaso

2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:

    Typically this means that the model is so confident in its
    predictions it does not believe it possible for the sample to come
    from the other component. Do you get the same results with a
    regular GaussianMixture?
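
    (A minimal sketch of that effect, assuming two unit-variance Gaussians
    with the same means as in the test script below; it is an illustration,
    not code from this thread. The posterior for a component is a softmax
    over the per-component log-densities, and the log-density gap grows
    with the number of features, so the posterior saturates to 0 or 1.)

    import numpy as np
    from scipy.stats import multivariate_normal

    for nfeatures in (5, 50):
        mean_a = np.arange(-1, 1, 2.0 / nfeatures)
        mean_b = np.arange(1, -1, -2.0 / nfeatures)
        x = mean_a + np.random.randn(nfeatures)   # one sample from component A
        log_pa = multivariate_normal.logpdf(x, mean=mean_a)  # identity covariance
        log_pb = multivariate_normal.logpdf(x, mean=mean_b)
        # Posterior P(A | x) with equal mixing weights; as the log-density gap
        # grows with dimension, this saturates to 1.0 (and P(B | x) to 0.0).
        post_a = 1.0 / (1.0 + np.exp(log_pb - log_pa))
        print(nfeatures, log_pa - log_pb, post_a)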

    On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
    <tommaso.costanz...@gmail.com> wrote:

        Hi,

        I am facing a problem with the "BayesianGaussianMixture"
        function, but I do not know if it is because of my poor
        knowledge of this type of statistics or if it is something
        related to the algorithm. I have a set of data of around 1000
        to 4000 observations (every feature is a spectrum of around
        200 points), so in the end I have n_samples = ~1000 and
        n_features = ~20. The good thing is that I am getting the same
        results as KMeans; however, "predict_proba" has values of only
        0 or 1.

        I have written a small function to simulate my problem with
        random data; it is reported below. The first half of the array
        has points with a positive slope while the second half has a
        negative slope, so they cross in the middle. What I have seen
        is that for a small number of features I obtain good
        probabilities, but if the number of features increases (say 50)
        then the probabilities become only 0 or 1.
        Can someone help me interpret this result?

        Here is the code I wrote with the generated random numbers;
        I generally run it with ncomponent=2 and nfeatures=5, 10, 50,
        or 100. I am not sure it will work in every case, as it is not
        very thoroughly tested. I have also attached it as a file!

        
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt

def test_bgm(ncomponent, nfeatures):
    # First cluster: 500 samples of standard normal noise around a mean
    # vector with a positive slope.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0/nfeatures)
    # Second cluster: 400 samples around a mean vector with a negative slope.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(
        n_components=ncomponent,
        degrees_of_freedom_prior=nfeatures*2).fit(X)

    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard cluster assignments, reshaped to a 30x30 image (900 samples).
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()

    # Posterior probability of each component for every sample.
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()

    plt.show()
##############################################################################
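
As a quick numeric check (an illustration, not part of the attached script), the same data construction with a plain GaussianMixture shows the saturation without the plots: it prints, for each nfeatures, how close any entry of predict_proba gets to an even 50/50 split.

import numpy as np
from sklearn.mixture import GaussianMixture

for nfeatures in (5, 10, 50):
    temp = np.random.randn(500, nfeatures) + np.arange(-1, 1, 2.0 / nfeatures)
    temp1 = np.random.randn(400, nfeatures) + np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((temp, temp1))
    proba = GaussianMixture(n_components=2).fit(X).predict_proba(X)
    # Smallest distance of any posterior from 0.5: near 0 means some samples
    # are genuinely ambiguous; near 0.5 means every posterior is already 0 or 1.
    print(nfeatures, np.abs(proba - 0.5).min())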

        Thank you in advance
        Tommaso









--
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html



_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
