Hi,

I am having a problem with "BayesianGaussianMixture", but I do not know
whether it comes from my limited knowledge of this type of statistics or
from the algorithm itself. I have a data set of around 1000 to 4000
observations (each one is a spectrum of around 200 points), so in the end
I have n_samples = ~1000 and n_features = ~20. The good thing is that I
get the same results as with KMeans; however, "predict_proba" returns only
values of 0 or 1.

I have written a small function, reported below, that reproduces my problem
with random data. In the first half of the array the points have a positive
slope, while in the second half they have a negative slope, so the two
cross in the middle. What I have seen is that for a small number of
features I obtain reasonable probabilities, but when the number of features
increases (say 50), the probabilities become only 0 or 1.
Can someone help me interpret this result?
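
One way to check whether this saturation comes from the data rather than
from the algorithm is to compute the exact posterior from the known
generating means, without fitting any model. The following is only a
minimal sketch under stated assumptions: two equally weighted clusters with
identity covariance, equal cluster sizes, and means that mirror the ramps
described above. The helper names (true_posterior_first_cluster,
saturated_fraction) and the 0.01 threshold are illustrative choices, not
part of scikit-learn.

```python
import numpy as np


def true_posterior_first_cluster(nfeatures, n_per_cluster=500, seed=0):
    """Exact P(cluster A | x) for two unit-variance Gaussian clusters.

    Assumptions (illustrative): equal mixture weights, identity covariance,
    mean of A rising from -1 towards 1, mean of B being its mirror image.
    """
    rng = np.random.default_rng(seed)
    mean_a = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    mean_b = -mean_a
    X = np.vstack((
        rng.standard_normal((n_per_cluster, nfeatures)) + mean_a,
        rng.standard_normal((n_per_cluster, nfeatures)) + mean_b,
    ))
    # log N(x | mean_a, I) - log N(x | mean_b, I); the normalising
    # constants cancel because both covariances are the identity.
    log_odds = 0.5 * (((X - mean_b) ** 2).sum(axis=1)
                      - ((X - mean_a) ** 2).sum(axis=1))
    return 1.0 / (1.0 + np.exp(-log_odds))


def saturated_fraction(nfeatures, eps=0.01):
    """Share of samples whose exact posterior lies within eps of 0 or 1."""
    p = true_posterior_first_cluster(nfeatures)
    return float(np.mean((p < eps) | (p > 1.0 - eps)))


for n in (5, 10, 50, 100):
    print(n, saturated_fraction(n))
```

Under these assumptions the squared distance between the two means grows
roughly in proportion to nfeatures, so the log-odds of a typical sample
grow with it. If the printed fraction approaches 1 for large nfeatures,
probabilities of 0 or 1 would be expected even from a perfectly fitted
model.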

Here is the code I wrote, using generated random numbers. I usually run it
with ncomponent=2 and nfeatures=5, 10, 50 or 100. I am not sure it will
work in every case, as it has not been tested thoroughly. I have also
attached it as a file!

##########################################################################
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import BayesianGaussianMixture


def test_bgm(ncomponent, nfeatures):
    # First group: 500 samples whose mean rises from -1 towards 1.
    # linspace guarantees exactly nfeatures values (a float-step arange
    # may return one element too many).
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.linspace(-1, 1, nfeatures, endpoint=False)
    # Second group: 400 samples whose mean falls from 1 towards -1.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.linspace(1, -1, nfeatures, endpoint=False)
    X = np.vstack((temp, temp1))

    # n_components is passed by keyword, as recent scikit-learn requires.
    bgm = BayesianGaussianMixture(
        n_components=ncomponent, degrees_of_freedom_prior=nfeatures * 2
    ).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # The 900 samples are displayed as a 30 x 30 image.
    plt.figure()
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()

    # One figure per component with its posterior probabilities.
    for i in range(ncomponent):
        plt.figure()
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()

    plt.show()
##############################################################################

Thank you in advance
Tommaso


-- 
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html