To scikit-learn-general,
I am fitting a bimodal 1D distribution whose two Gaussian components
overlap strongly, using sklearn.mixture.GMM. The parameters returned by
GMM.fit are inconsistent with the parameters of the input distribution.
The code below demonstrates the issue.
When the two components are well separated (for example, mu1 = -1.5 in
the code), the fit recovers the correct parameters.
I would be grateful for any information on the limitations of GMM.fit
and on whether it is possible to obtain correct results when the
Gaussian components overlap strongly.
Sorry if this is not the appropriate mailing list for such questions.
Best regards,
Dmitry
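import numpy as np
from sklearn import mixture  # sklearn v0.13.1
np.random.seed(1)
g = mixture.GMM(n_components=2, covariance_type='full')
# Parameters of the input distribution
n = 10000    # total number of samples
frac2 = 0.1  # fraction of samples drawn from component 2
mu1 = -0.5
std1 = 0.5
mu2 = 0.0
std2 = 0.2
# Draw samples from both components and pool them into one 1D array
obs = np.concatenate((np.random.normal(mu1, std1, int(n * (1 - frac2))),
                      np.random.normal(mu2, std2, int(n * frac2))))
g.fit(obs)
print('fractions: ')
print(np.round(g.weights_, 2))
print('means: ')
print(np.round(g.means_, 2))
# In 1D each covariance is a variance, so the square root is the std
print('stds: ')
print(np.round(np.sqrt(g._get_covars()), 2))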
#output:
#fractions:
#[ 0.48 0.52]
#means:
#[[-0.74]
# [-0.18]]
#stds:
#[[[ 0.45]]
#
# [[ 0.4 ]]]
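One possible way to check whether the fitted parameters describe the data much worse than the input parameters is to compare the average log-likelihood of the samples under both parameter sets. The sketch below is only a diagnostic and uses NumPy alone. The helper name mixture_loglik is hypothetical (it is not part of scikit-learn), and the "fitted" values are the rounded numbers copied from the output above, so the comparison is approximate.

```python
import numpy as np

def mixture_loglik(x, weights, means, stds):
    """Average log-likelihood of 1D samples x under a Gaussian mixture.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    x = np.asarray(x, dtype=float)[:, np.newaxis]   # shape (n, 1)
    w = np.asarray(weights, dtype=float)            # shape (k,)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(stds, dtype=float)
    # log(weight) + log N(x | mu, sd) for every sample and component, shape (n, k)
    log_comp = (np.log(w) - np.log(sd) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((x - mu) / sd) ** 2)
    # Numerically stable log-sum-exp over the components
    m = log_comp.max(axis=1, keepdims=True)
    per_sample = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(np.mean(per_sample))

# Same sampling scheme as in the post
rng = np.random.RandomState(1)
n, frac2 = 10000, 0.1
obs = np.concatenate((rng.normal(-0.5, 0.5, int(n * (1 - frac2))),
                      rng.normal(0.0, 0.2, int(n * frac2))))

# Input parameters versus the rounded fitted parameters from the output above
ll_true = mixture_loglik(obs, [0.9, 0.1], [-0.5, 0.0], [0.5, 0.2])
ll_fit = mixture_loglik(obs, [0.48, 0.52], [-0.74, -0.18], [0.45, 0.40])
print('avg log-likelihood, input parameters:  %.4f' % ll_true)
print('avg log-likelihood, fitted parameters: %.4f' % ll_fit)
```

If the two values turn out to be nearly equal, that would suggest the likelihood is almost flat between the two solutions for this data; a clearly lower value for the fitted parameters would instead point to a poor local optimum.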