Hi, I've been working on some modifications of methods in scikit-learn recently, and there's one deficiency of the interface that I'm having trouble with: errors in variables. I know that few (perhaps none?) of the scikit-learn routines take measurement error into account, but it's an important aspect of many analyses in the scientific domain. Has anybody thought about the best way to include these in the scikit-learn class interface?
For concreteness, what I'm working on is fitting a GMM to data that has (correlated) Gaussian errors. Because everything is Gaussian, the results are analytic, and taking errors into account can be accomplished with a straightforward extension of log_multivariate_normal_density() in gmm.py. The error information can be represented in an array of shape [n_samples, n_features] (for diagonal errors) or [n_samples, n_features, n_features] (for correlated errors).

My current thought is to add this as a keyword 'sigmaX' in the fit function, which would default to None. Any thoughts on that?

Jake

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
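To make the idea concrete, here is a minimal sketch of the analytic extension described above: because the convolution of two Gaussians is a Gaussian, a component density evaluated at a noisy sample is just the component density with covariance (component covariance + per-sample error covariance). The function name and the 'sigmaX' keyword mirror the proposal in the message; they are not part of the actual scikit-learn API, and this is an unvectorized illustration rather than an implementation suitable for gmm.py.

```python
import numpy as np
from scipy import linalg


def log_multivariate_normal_density_with_errors(X, means, covars, sigmaX):
    """Log-density of full-covariance Gaussians, with measurement errors.

    Parameters
    ----------
    X : array, shape (n_samples, n_features)
    means : array, shape (n_components, n_features)
    covars : array, shape (n_components, n_features, n_features)
        Full covariance of each mixture component.
    sigmaX : array, shape (n_samples, n_features) for diagonal errors,
        or (n_samples, n_features, n_features) for correlated errors.

    Returns
    -------
    log_prob : array, shape (n_samples, n_components)
    """
    n_samples, n_features = X.shape
    n_components = means.shape[0]

    # Promote diagonal errors to full per-sample covariance matrices.
    if sigmaX.ndim == 2:
        err = np.array([np.diag(s) for s in sigmaX])
    else:
        err = sigmaX

    log_prob = np.empty((n_samples, n_components))
    for k in range(n_components):
        for i in range(n_samples):
            # Effective covariance: component covariance convolved with
            # this sample's measurement-error covariance.
            cv = covars[k] + err[i]
            cv_chol = linalg.cholesky(cv, lower=True)
            cv_log_det = 2.0 * np.sum(np.log(np.diag(cv_chol)))
            sol = linalg.solve_triangular(cv_chol, X[i] - means[k],
                                          lower=True)
            log_prob[i, k] = -0.5 * (np.dot(sol, sol) + cv_log_det
                                     + n_features * np.log(2.0 * np.pi))
    return log_prob
```

With sigmaX=None one would simply fall back to the existing error-free code path, so the keyword stays backward compatible.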
