Hi Chris,

you should share a gist on gist.github.com with a .npy file containing
the data needed to reproduce the problem.
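
In case it helps, here is a minimal sketch of how such a file could be
written with NumPy. The random array is only a stand-in for your real
hfdi grid, and the file name hfdi.npy is an arbitrary choice:

```python
import numpy as np

# Stand-in for the real index array (assumption); replace with your own hfdi grid.
rng = np.random.default_rng(0)
hfdi = rng.random((10, 20), dtype=np.float32)

# Save the flattened samples, i.e. the same values that are passed to fit().
np.save("hfdi.npy", hfdi.flatten())

# Sanity check: reload the file and confirm the round trip is lossless.
loaded = np.load("hfdi.npy")
assert loaded.dtype == np.float32
assert np.array_equal(loaded, hfdi.flatten())
```

Anyone on the list can then read the file back with np.load and run the
same fit on identical values.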

Best,
Alex

On Thu, May 15, 2014 at 2:15 AM, Chris Waigl <cwa...@alaska.edu> wrote:
> Hi sklearn community,
>
> I'm new on this list, Python user of many years, and maybe an advanced 
> beginner with scikit-learn, which I've used for a previous project. I'll just 
> jump in with my question.
>
> I'm trying to use sklearn.mixture.GMM to fit (fairly) bimodal scalar data. 
> The data values can theoretically vary between 0 and 1. They're represented 
> as a float32 NumPy array. The scalar value is in fact calculated using a 
> combination of two spectral bands (infrared remote sensing), and I'm trying 
> to find the band combination that produces an index that best separates the 
> two modes.
>
> I find that for some band combinations (and therefore histograms), GMM very 
> nicely fits two Gaussians. Example plots of very good fits:
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_216.png
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_191_219.png
>
> Examples of bad fits (that is, one Gaussian dominates with a weight of 
> approx. 99%, the other one is flat):
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_192_216.png
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_212.png
>
> I'm calling the model as follows. The scalar index is called hfdi, and it 
> lives on a 2D grid.
>
>> from sklearn.mixture import GMM
>> ...
>> g = GMM(n_components=2)
>> g.fit(hfdi.flatten())
>
> g.converged_ nearly always returns True.
>
> I also tried to play with some of the arguments:
>> g = GMM(n_components=2, thresh=0.0001, n_init=5, n_iter=1000)
>
> ... but with no improvement. The only effect is that if I reduce the 
> threshold too much, I get division-by-zero errors (I think).
>
> I only have about 200 samples. Maybe that's not enough. Any advice?
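
For anyone who wants to experiment without the original data, the sketch
below fits a two-component mixture to about 200 synthetic bimodal samples
and prints the fitted weights, which is where a 99%/1% split would show
up. It assumes a recent scikit-learn release, where the class is called
GaussianMixture (the GMM class quoted here was removed in later versions)
and the arguments are tol and max_iter rather than thresh and n_iter. The
synthetic means and widths are made up for illustration and say nothing
about the real index values:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic bimodal data (assumption): two clusters of 100 samples in [0, 1].
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(0.3, 0.05, 100),
    rng.normal(0.7, 0.05, 100),
]).astype(np.float32)

# GaussianMixture expects a 2D array of shape (n_samples, n_features).
X = samples.reshape(-1, 1)

# n_init restarts the fit several times and keeps the best result.
g = GaussianMixture(n_components=2, n_init=5, tol=1e-4,
                    max_iter=1000, random_state=0)
g.fit(X)

# A weight close to 1.0 for one component signals the degenerate fit.
print("converged:", g.converged_)
print("weights:  ", g.weights_)
print("means:    ", g.means_.ravel())
print("std devs: ", np.sqrt(g.covariances_.ravel()))
```

Checking weights_ alongside converged_ matters because convergence only
means the likelihood stopped improving, not that both modes were found.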
>
> Thanks,
>
> Chris Waigl
>
> --
> Chris Waigl - cwa...@alaska.edu -  +1-907-474-5483 - Skype: cwaigl_work
> Geophysical Institute, UAF, 903 Koyukuk Drive, Fairbanks, AK 99775-7320, USA
>
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.
> Get unparalleled scalability from the best Selenium testing platform available
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
