2012/4/30 Jacob VanderPlas <[email protected]>:
> Hi,
> I've been working on some modifications of methods in scikit-learn
> recently, and there's one deficiency of the interface that I'm having
> trouble with: errors in variables. I know that few (perhaps none?) of
> the scikit-learn routines take measurement error into account, but it's
> an important aspect of many analyses in the scientific domain. Has anybody
> thought about the best way to include these in the scikit-learn class
> interface?
>
> For concreteness, what I'm working on is fitting a GMM to data that has
> (correlated) Gaussian errors. Because everything is Gaussian, the results
> are analytic, and taking errors into account can be accomplished with a
> straightforward extension of log_multivariate_normal_density() in gmm.py.
>
> The error information can be represented in an array of size [n_samples,
> n_features] (for diagonal errors) or [n_samples, n_features, n_features]
> (for correlated errors). My current thought is to add this as a keyword
> 'sigmaX' in the fit function, which would default to None. Any thoughts
> on that?
> Jake
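For readers following along: a minimal sketch of the analytic idea in the quoted proposal. Because a Gaussian convolved with Gaussian noise is still Gaussian, the likelihood of an observed sample x_i under component (mu_k, Sigma_k) with per-sample error covariance S_i is simply N(x_i | mu_k, Sigma_k + S_i). The function name and signature below are hypothetical (not the actual scikit-learn API); it only illustrates how log_multivariate_normal_density() could be extended for the [n_samples, n_features, n_features] correlated-errors case.

```python
import numpy as np

def log_gaussian_density_with_errors(X, means, covars, errors):
    """Log-density of each sample under each Gaussian component,
    broadening each component covariance by the per-sample error
    covariance (errors has shape [n_samples, n_features, n_features]).
    Hypothetical sketch, not scikit-learn code."""
    n_samples, n_features = X.shape
    n_components = means.shape[0]
    log_prob = np.empty((n_samples, n_components))
    for k in range(n_components):
        for i in range(n_samples):
            # Convolve the component covariance with the measurement error.
            cov = covars[k] + errors[i]
            diff = X[i] - means[k]
            _, logdet = np.linalg.slogdet(cov)
            maha = diff @ np.linalg.solve(cov, diff)
            log_prob[i, k] = -0.5 * (n_features * np.log(2 * np.pi)
                                     + logdet + maha)
    return log_prob
```

With errors set to zero this reduces to the ordinary multivariate normal log-density, which is an easy sanity check for an implementation.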
That proposal sounds helpful. However, I am starting to think that using
single-letter variable names for the input data was a bad idea, and using a
Greek-letter name for the error data seems a bad idea too. I think I would
prefer a more explicit variable name such as `errors`.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
