Hello everyone,

What's needed is not difficult, but it is far from trivial. I can be a co-champion if someone wants to work as a team.
Sadly, my day job already takes up a lot of my time... this would be more fun.

Cheers!
Demian

--
Demian Wassermann, PhD
[email protected]
LMI / PNL / SPL Labs
Harvard Medical School
Brigham and Women's Hospital
1249 Boylston, Boston, MA, USA

On Nov 29, 2011, at 7:24 PM, Olivier Grisel wrote:

> 2011/11/29 Kenneth C. Arnold <[email protected]>:
>> On Tue, Nov 29, 2011 at 4:53 PM, Olivier Grisel
>> <[email protected]> wrote:
>>> Now back to your problem: I think we should support fitting models with
>>> just one sample just for the sake of consistency / continuity even if
>>> there is no practical application of fitting models with a single
>>> sample: fitting models with 2 samples would be almost as stupid as
>>> fitting a model with only one sample and there is no principled or
>>> natural, pre-determined threshold I know of that would give us the
>>> minimum number of samples to provide to an estimator.
>>>
>>> IMHO this is a bug. GaussianProcess and other scikit-learn estimators
>>> should accept to fit with singleton training sets and provide
>>> predictions that are mathematically consistent even if useless in
>>> practice.
>>
>> I misspoke earlier: the MLE for a GP conditioned on a single point is
>> just the value at that point, just as the maximum likelihood predictor
>> for a Gaussian fit to one data point is that data point. (The variance
>> is indeed ill-posed, but the prediction is just the mean.)
>
> That makes sense. Fortunately we don't have an API to compute the
> expected variance of a prediction :)
>
>> https://github.com/scikit-learn/scikit-learn/pull/97 looks like
>> activity fizzled right as it was about ready to merge. What's the
>> status? [Yes, I'm cautiously expressing and gauging interest without
>> implicitly promising work.]
>
> Indeed this pull request got forgotten and needs a champion to revive
> it: upgrade it to the current status of the master and give a status
> of the pending points that were raised in the previous comments, make
> sure that the documentation is up to date and that the tests pass with
> good coverage.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
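
To illustrate the single-sample point discussed in the thread, here is a minimal NumPy sketch. It is not scikit-learn's GaussianProcess code: it assumes 1-D inputs, an RBF kernel with a fixed length scale, and a constant mean estimated by generalized least squares. The function names gaussian_mle and gp_predict_constant_mean are hypothetical. Under those assumptions, a single training point gives an estimated mean equal to that point's value, a zero residual, and therefore the same prediction everywhere, while the maximum likelihood variance of a Gaussian collapses to a degenerate zero.

```python
import numpy as np


def gaussian_mle(samples):
    """Maximum likelihood mean and variance of a 1-D Gaussian (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    # The MLE of the variance divides by n (ddof=0); with one sample it is 0.0,
    # which is the degenerate, ill-posed case mentioned in the thread.
    return samples.mean(), samples.var(ddof=0)


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gp_predict_constant_mean(x_train, y_train, x_query, length_scale=1.0):
    """Noise-free GP posterior mean with an estimated constant mean (assumed model)."""
    x_train = np.atleast_1d(np.asarray(x_train, dtype=float))
    y_train = np.atleast_1d(np.asarray(y_train, dtype=float))
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))

    K = rbf_kernel(x_train, x_train, length_scale)
    ones = np.ones_like(y_train)

    # Generalized least squares estimate of the constant mean.
    beta = (ones @ np.linalg.solve(K, y_train)) / (ones @ np.linalg.solve(K, ones))

    # Weights applied to the kernel between query and training points.
    alpha = np.linalg.solve(K, y_train - beta * ones)
    k_star = rbf_kernel(x_query, x_train, length_scale)
    return beta + k_star @ alpha


# One training point: the prediction is that point's value at every query.
single_point_prediction = gp_predict_constant_mean([0.5], [3.0], [-2.0, 0.5, 10.0])
```

With more than one training point the same function interpolates the training values, so the singleton case is just the continuous limit of the general one rather than a special case.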
