There are plenty of examples and plots on the scikit-learn website.

On 11/30/2016 12:17 PM, Tommaso Costanzo wrote:

Dear Andreas,

thank you so much for your answer; now I can see my mistake. What I am trying to do is convince myself that the reason I get probabilities of only 0 and 1 when I analyze my data is that the data are well separated, so I was trying to make some synthetic data where the probability is different from 0 or 1, but I did it in the wrong way. Does it sound correct if I make 300 samples of random numbers centered at 0 with STD 1 and another 300 centered at 0.5, and then add some samples in between these two Gaussian distributions (say between 0.15 and 0.35)? In this case I think I should expect probabilities different from 0 or 1 in the two components (when using 2 components).
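
For example, something like this (just a sketch with one feature and a plain GaussianMixture; the number of in-between samples is an arbitrary choice):

import numpy as np
from sklearn.mixture import GaussianMixture

# two 1-D Gaussians (STD 1) centred at 0 and 0.5, plus some points in between
x = np.concatenate([
    np.random.randn(300),               # centred at 0
    np.random.randn(300) + 0.5,         # centred at 0.5
    np.random.uniform(0.15, 0.35, 50),  # extra samples between the two centres
]).reshape(-1, 1)

gm = GaussianMixture(n_components=2).fit(x)
print(gm.predict_proba(x).round(2))     # should contain values other than 0 and 1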

Thank you in advance
Tommaso

On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3k...@gmail.com> wrote:

    Hi Tommaso.
    So what's the issue? The distributions are very distinct, so there
    is no confusion.
    The higher the dimensionality, the further apart the points are
    (compare the distance between (-1, 1) and (1, -1) to the distance
    between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
    I'm not sure what you mean by "the cross in the middle".
    You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
    and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
    these points are very far apart.
    Then you add standard normal noise to it, so this data is two
    perfect Gaussians. In low dimensions they are "close together", so
    there is some confusion; in high dimensions they are "far apart",
    so there is less confusion.
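
    For instance (a rough sketch using the same two centres as in your
    test function), the distance between the two fixed points grows
    with nfeatures:

    import numpy as np

    # Euclidean distance between the two centres for several dimensionalities
    for nfeatures in (5, 10, 50, 100):
        a = np.arange(-1, 1, 2.0 / nfeatures)
        b = np.arange(1, -1, -2.0 / nfeatures)
        print(nfeatures, np.linalg.norm(a - b))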

    Hth,
    Andy

    On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
    Hi Jacob,

    I have just changed my code from BayesianGaussianMixture to
    GaussianMixture, and the result is the same. I attached here the
    picture of the first component when I ran the code with 5, 10,
    and 50 nfeatures and 2 components. In my short test function I
    expect to have points that could be in one component as well as
    the other, as is visible for small numbers of nfeatures, but
    probabilities of only 0 or 1 for nfeatures > 50 does not sound
    correct. It seems to be related just to the size of the model, and
    in particular to the number of features. With the
    BayesianGaussianMixture I have seen that it is slightly better to
    increase the degrees of freedom to 2*nfeatures instead of the
    default nfeatures. However, this does not change the result when
    nfeatures is 50 or more.

    Thank you in advance
    Tommaso

    2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreibe...@gmail.com>:

        Typically this means that the model is so confident in its
        predictions that it does not believe it is possible for the
        sample to have come from the other component. Do you get the
        same results with a regular GaussianMixture?

        On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
        <tommaso.costanz...@gmail.com> wrote:

            Hi,

            I am facing some problems with the
            "BayesianGaussianMixture" function, but I do not know if
            it is because of my poor knowledge of this type of
            statistics or if it is something related to the
            algorithm. I have a set of data of around 1000 to 4000
            observations (every feature is a spectrum of around 200
            points), so in the end I have n_samples = ~1000 and
            n_features = ~20. The good thing is that I am getting
            the same results as KMeans; however, "predict_proba"
            has values of only 0 or 1.

            I have written a small function, reported below, to
            simulate my problem with random data. The first half of
            the array has points with a positive slope while the
            second half has a negative slope, so they cross in the
            middle. What I have seen is that for a small number of
            features I obtain good probabilities, but if the number
            of features increases (say to 50) then the probabilities
            become only 0 or 1.
            Can someone help me interpret this result?

            Here is the code I wrote with the generated random
            numbers; I'll generally run it with ncomponent=2 and
            nfeatures=5, 10, 50, or 100. I am not sure it will work
            in every case, as it is not very thoroughly tested. I
            have also attached it as a file!

            
            ##########################################################################
            import numpy as np
            from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
            import matplotlib.pyplot as plt

            def test_bgm(ncomponent, nfeatures):
                # two clusters of standard-normal noise around two mirrored centres
                temp = np.random.randn(500, nfeatures)
                temp = temp + np.arange(-1, 1, 2.0 / nfeatures)
                temp1 = np.random.randn(400, nfeatures)
                temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)
                X = np.vstack((temp, temp1))

                bgm = BayesianGaussianMixture(
                    n_components=ncomponent,
                    degrees_of_freedom_prior=nfeatures * 2).fit(X)

                bgm_proba = bgm.predict_proba(X)
                bgm_labels = bgm.predict(X)

                # hard cluster labels
                plt.figure(-1)
                plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
                           interpolation='none')
                plt.colorbar()

                # per-component membership probabilities
                for i in np.arange(0, ncomponent):
                    plt.figure(i)
                    plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                               interpolation='none')
                    plt.colorbar()

                plt.show()
            ##########################################################################
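
            For reference, the runs described above would then be, e.g.:

            test_bgm(2, 5)     # small nfeatures: probabilities between 0 and 1
            test_bgm(2, 50)    # large nfeatures: probabilities only 0 or 1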

            Thank you in advance
            Tommaso

