Hi Tommaso.
So what's the issue? The distributions are very distinct, so there is no confusion. The higher the dimensionality, the farther apart the points are (compare the distance between (-1, 1) and (1, -1) to the distance between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
I'm not sure what you mean by "the cross in the middle".
You create two fixed points, one at np.arange(-1, 1, 2.0/nfeatures) and one at np.arange(1, -1, -2.0/nfeatures). In high dimensions, these points are very far apart. Then you add standard normal noise to them, so this data is two perfect Gaussians. In low dimensions, the two means are "close together", so there is some confusion;
in high dimensions, they are "far apart", so there is less confusion.
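
To make that concrete, here is a small sketch (not from the thread; it just reuses the two arange() mean vectors from the test script below). It prints the Euclidean distance between the two cluster means as nfeatures grows, while the noise around each mean stays standard normal per dimension:

import numpy as np

# Distance between the two cluster means used in test_bgm() below; the noise
# added to each mean is N(0, 1) per dimension, so a larger distance means the
# two Gaussians overlap less and the posteriors saturate towards 0 and 1.
for nfeatures in (2, 5, 10, 50, 100):
    mean_a = np.arange(-1, 1, 2.0 / nfeatures)    # mean with positive slope
    mean_b = np.arange(1, -1, -2.0 / nfeatures)   # mean with negative slope
    print(nfeatures, np.linalg.norm(mean_a - mean_b))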

Hth,
Andy

On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
Hi Jacob,

I have just changed my code from BayesianGaussianMixture to GaussianMixture, and the result is the same. I attached a picture of the first component when I ran the code with 5, 10, and 50 nfeatures and 2 components. In my short test function I expect some points that could belong to one component as well as the other, as is visible for a small number of nfeatures, but probabilities of only 0 or 1 for nfeatures >= 50 do not sound correct. It seems to be related only to the size of the model, and in particular to the number of features. With BayesianGaussianMixture I have seen that it is slightly better to increase the degrees of freedom to 2*nfeatures instead of the default nfeatures; however, this does not change the result when nfeatures is 50 or more.

Thank you in advance
Tommaso

2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:

    Typically this means that the model is so confident in its
    predictions it does not believe it possible for the sample to come
    from the other component. Do you get the same results with a
    regular GaussianMixture?
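
    (A minimal sketch of that effect, assuming two unit-variance Gaussians
    with the same means as in the test script below; it is an illustration,
    not code from this thread. The posterior for a component is a softmax
    over the per-component log-densities, and the log-density gap grows
    with the number of features, so the posterior saturates to 0 or 1.)

    import numpy as np
    from scipy.stats import multivariate_normal

    for nfeatures in (5, 50):
        mean_a = np.arange(-1, 1, 2.0 / nfeatures)
        mean_b = np.arange(1, -1, -2.0 / nfeatures)
        x = mean_a + np.random.randn(nfeatures)   # one sample from component A
        log_pa = multivariate_normal.logpdf(x, mean=mean_a)  # identity covariance
        log_pb = multivariate_normal.logpdf(x, mean=mean_b)
        # Posterior P(A | x) with equal mixing weights; as the log-density gap
        # grows with dimension, this saturates to 1.0 (and P(B | x) to 0.0).
        post_a = 1.0 / (1.0 + np.exp(log_pb - log_pa))
        print(nfeatures, log_pa - log_pb, post_a)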

    On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
    <tommaso.costanz...@gmail.com> wrote:

        Hi,

        I am facing a problem with the "BayesianGaussianMixture"
        function, but I do not know if it is because of my poor
        knowledge of this type of statistics or if it is something
        related to the algorithm. I have a set of data of around 1000
        to 4000 observations (every feature is a spectrum of around
        200 points), so in the end I have n_samples = ~1000 and
        n_features = ~20. The good thing is that I am getting the same
        results as KMeans; however, "predict_proba" has values of only
        0 or 1.

        I have written a small function to simulate my problem with
        random data; it is reported below. The first half of the array
        has points with a positive slope while the second half has a
        negative slope, so they cross in the middle. What I have seen
        is that for a small number of features I obtain good
        probabilities, but if the number of features increases (say 50)
        then the probabilities become only 0 or 1.
        Can someone help me interpret this result?

        Here is the code I wrote with the generated random numbers;
        I generally run it with ncomponent=2 and nfeatures=5, 10, 50,
        or 100. I am not sure it will work in every case, as it is not
        very thoroughly tested. I have also attached it as a file!

        
##########################################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt

def test_bgm(ncomponent, nfeatures):
    # First cluster: 500 samples of standard normal noise around a mean
    # vector with a positive slope.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0/nfeatures)
    # Second cluster: 400 samples around a mean vector with a negative slope.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(
        n_components=ncomponent,
        degrees_of_freedom_prior=nfeatures*2).fit(X)

    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard cluster assignments, reshaped to a 30x30 image (900 samples).
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()

    # Posterior probability of each component for every sample.
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()

    plt.show()
##############################################################################
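
As a quick numeric check (an illustration, not part of the attached script), the same data construction with a plain GaussianMixture shows the saturation without the plots: it prints, for each nfeatures, how close any entry of predict_proba gets to an even 50/50 split.

import numpy as np
from sklearn.mixture import GaussianMixture

for nfeatures in (5, 10, 50):
    temp = np.random.randn(500, nfeatures) + np.arange(-1, 1, 2.0 / nfeatures)
    temp1 = np.random.randn(400, nfeatures) + np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((temp, temp1))
    proba = GaussianMixture(n_components=2).fit(X).predict_proba(X)
    # Smallest distance of any posterior from 0.5: near 0 means some samples
    # are genuinely ambiguous; near 0.5 means every posterior is already 0 or 1.
    print(nfeatures, np.abs(proba - 0.5).min())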

        Thank you in advance
        Tommaso









--
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html



_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
