Hi Chris,

You should share a gist on gist.github.com with a .npy file containing the data needed to reproduce the problem.
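
Something along these lines should be enough to produce the file. This is only a sketch: the random array below is a placeholder standing in for your real 2D hfdi grid, and the filename hfdi.npy is arbitrary.

```python
import numpy as np

# Placeholder standing in for the real 2D index grid (assumption: float32
# values between 0 and 1, as described in the question). Replace this with
# your own hfdi array.
rng = np.random.default_rng(0)
hfdi = rng.random((10, 20)).astype(np.float32)

# Write the array to a .npy file; the filename is arbitrary.
np.save("hfdi.npy", hfdi)

# Anyone who downloads the file can load it back with the same shape,
# dtype, and values.
loaded = np.load("hfdi.npy")
```

With the file attached, others can call np.load on it and run the same fit on exactly the same values.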
Best,
Alex

On Thu, May 15, 2014 at 2:15 AM, Chris Waigl <cwa...@alaska.edu> wrote:
> Hi sklearn community,
>
> I'm new on this list, a Python user of many years, and maybe an advanced
> beginner with scikit-learn, which I've used for a previous project. I'll just
> jump in with my question.
>
> I'm trying to use sklearn.mixture.GMM to fit (fairly) bimodal scalar data.
> The data values can theoretically vary between 0 and 1. They're represented
> as a float32 NumPy array. The scalar value is in fact calculated using a
> combination of two spectral bands (infrared remote sensing), and I'm trying
> to find the band combination that produces an index that best separates the
> two modes.
>
> I find that for some band combinations (and therefore histograms), GMM very
> nicely fits two Gaussians. Example plots of very good fits:
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_216.png
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_191_219.png
>
> Examples of bad fits (that is, one Gaussian dominates with a weight of
> approx. 99%, the other one is flat):
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_192_216.png
> https://dl.dropboxusercontent.com/u/372734/IMG/boundary_HFDI_GMM_193_212.png
>
> I'm calling the model as follows. The scalar index is called hfdi, and it
> lives on a 2D grid.
>
>> from sklearn.mixture import GMM
>> ...
>> g = GMM(n_components=2)
>> g.fit(hfdi.flatten())
>
> g.converged_ nearly always returns True.
>
> I also tried to play with some of the arguments:
>> g = GMM(n_components=2, thresh=0.0001, n_init=5, n_iter=1000)
>
> ... but with no improvement, except that if I reduce the threshold too much
> I get division-by-zero errors (I think).
>
> I only have about 200 samples. Maybe that's not enough. Any advice?
>
> Thanks,
>
> Chris Waigl
>
> --
> Chris Waigl - cwa...@alaska.edu - +1-907-474-5483 - Skype: cwaigl_work
> Geophysical Institute, UAF, 903 Koyukuk Drive, Fairbanks, AK 99775-7320, USA
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general