Hi sklearn community,

I'm new to this list, a Python user of many years, and perhaps an advanced
beginner with scikit-learn, which I used for a previous project. I'll jump
straight in with my question.

I'm trying to use sklearn.mixture.GMM to fit (fairly) bimodal scalar data. The
data values can theoretically vary between 0 and 1. They're represented as a
float32 NumPy array. Each scalar value is calculated from a combination of two
spectral bands (infrared remote sensing), and I'm trying to find the band
combination that produces an index that best separates the two modes.

I find that for some band combinations (and therefore histograms), GMM very 
nicely fits two Gaussians. Example plots of very good fits:
https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_216.png
https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_191_219.png

Examples of bad fits (that is, one Gaussian dominates with a weight of approx. 
99%, the other one is flat):
https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_192_216.png
https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_212.png

I'm calling the model as follows. The scalar index is called hfdi, and it lives 
on a 2D grid.

> from sklearn.mixture import GMM
> ...
> g = GMM(n_components=2)
> g.fit(hfdi.flatten())

g.converged_ is nearly always True.

I also tried adjusting some of the arguments:
> g = GMM(n_components=2, thresh=0.0001, n_init=5, n_iter=1000)

... but this brought no improvement. The only change is that if I reduce the
threshold too far, I get division-by-zero errors (I think).
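
To make clear what I think these arguments control, here is a minimal
pure-Python sketch of EM for a two-component 1D mixture. It is not sklearn's
implementation: the function name fit_two_gaussians, the quartile-based first
start, the variance floor min_var, and the synthetic data are all my own
assumptions for illustration. In it, tol plays the role I assume thresh has
(stop when the log-likelihood gain becomes small), n_init is the number of
restarts, and n_iter caps the EM iterations per restart.

```python
import math
import random


def fit_two_gaussians(x, n_init=5, n_iter=1000, tol=1e-4, min_var=1e-6, seed=0):
    """Fit a two-component 1D Gaussian mixture by EM (illustrative sketch).

    Returns (log_likelihood, weights, means, variances) for the best restart.
    """
    rng = random.Random(seed)
    xs = sorted(x)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((v - mean_all) ** 2 for v in xs) / n, min_var)
    best = None
    for restart in range(n_init):
        if restart == 0:
            # Deterministic start: lower and upper quartile as initial means.
            mu = [xs[n // 4], xs[(3 * n) // 4]]
        else:
            # Random start: two data points drawn without replacement.
            mu = rng.sample(xs, 2)
        var = [var_all, var_all]
        w = [0.5, 0.5]
        prev_ll = -math.inf
        ll = prev_ll
        for _ in range(n_iter):
            # E-step: responsibility of each component for each sample.
            # ll is the log-likelihood of the parameters entering this step.
            resp = []
            ll = 0.0
            for v in xs:
                p = [w[k] * math.exp(-0.5 * (v - mu[k]) ** 2 / var[k])
                     / math.sqrt(2.0 * math.pi * var[k]) for k in range(2)]
                total = p[0] + p[1] + 1e-300  # guard against log(0) and 0/0
                ll += math.log(total)
                resp.append((p[0] / total, p[1] / total))
            # M-step: re-estimate weights, means, and floored variances.
            for k in range(2):
                nk = sum(r[k] for r in resp) + 1e-12
                w[k] = nk / n
                mu[k] = sum(r[k] * v for r, v in zip(resp, xs)) / nk
                spread = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, xs)) / nk
                var[k] = max(spread, min_var)  # floor keeps 1/var finite
            # Stop once the log-likelihood gain falls below the tolerance.
            if abs(ll - prev_ll) < tol:
                break
            prev_ll = ll
        # Keep the restart with the highest log-likelihood.
        if best is None or ll > best[0]:
            best = (ll, list(w), list(mu), list(var))
    return best


# Synthetic stand-in for hfdi.flatten(): about 200 samples, two modes in [0, 1].
data_rng = random.Random(42)
samples = [data_rng.gauss(0.3, 0.05) for _ in range(100)]
samples += [data_rng.gauss(0.7, 0.05) for _ in range(100)]

ll, weights, means, variances = fit_two_gaussians(samples)
# A weight close to 1.0 for one component would indicate the degenerate fit.
print(weights, means, variances)
```

On this synthetic, well-separated data I would expect both weights to come out
near 0.5; a weight near 0.99 for one component would correspond to the bad fits
described above. In this sketch the variance floor is what keeps the E-step
from dividing by zero if a component collapses onto very few points.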

I only have about 200 samples. Maybe that's not enough. Any advice?

Thanks,

Chris Waigl

-- 
Chris Waigl - cwa...@alaska.edu -  +1-907-474-5483 - Skype: cwaigl_work
Geophysical Institute, UAF, 903 Koyukuk Drive, Fairbanks, AK 99775-7320, USA

