Hi,
I've been working on some modifications to methods in scikit-learn 
recently, and there's one deficiency of the interface that I'm having 
trouble with: errors in variables.  I know that few (perhaps none?) of 
the scikit-learn routines take measurement error into account, but it's 
an important aspect of many analyses in the scientific domain.  Has anybody 
thought about the best way to include these in the scikit-learn class 
interface?

For concreteness, what I'm working on is fitting a GMM to data that has 
(correlated) Gaussian errors.  Because everything is Gaussian, the results 
are analytic, and taking errors into account can be accomplished with a 
straightforward extension of log_multivariate_normal_density() in gmm.py.
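(For anyone curious, the analytic extension amounts to convolving each 
Gaussian component with the per-sample error covariance, i.e. evaluating 
N(x_i; mu, Sigma + S_i) instead of N(x_i; mu, Sigma).  A rough sketch of 
what I mean -- the function name and signature here are mine, not anything 
currently in gmm.py:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_density_with_errors(X, mu, cov, sigmaX):
    """Per-sample log N(x_i; mu, cov + S_i).

    X      : (n_samples, n_features) data
    mu     : (n_features,) component mean
    cov    : (n_features, n_features) component covariance
    sigmaX : (n_samples, n_features, n_features) per-sample
             (possibly correlated) error covariances

    Since a Gaussian convolved with a Gaussian is Gaussian, the
    measurement error just adds to the component covariance.
    """
    return np.array([
        multivariate_normal.logpdf(x, mean=mu, cov=cov + S)
        for x, S in zip(X, sigmaX)
    ])
```

A production version would vectorize the loop and reuse Cholesky 
factorizations, but the math is just the covariance sum.)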

The error information can be represented in an array of size [n_samples, 
n_features] (for diagonal errors) or [n_samples, n_features, n_features] 
(for correlated errors).  My current thought is to add this as a keyword 
'sigmaX' in the fit function, which would default to None.  Any thoughts 
on that?
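(If both shapes are allowed, fit() would need to normalize them to a 
common form internally.  Something like this hypothetical helper, assuming 
the 2-d diagonal form stores per-feature variances:

```python
import numpy as np

def as_full_error_cov(sigmaX):
    """Normalize sigmaX to shape (n_samples, n_features, n_features).

    Accepts either (n_samples, n_features) per-feature variances
    (diagonal errors) or full (n_samples, n_features, n_features)
    covariance matrices (correlated errors).
    """
    sigmaX = np.asarray(sigmaX)
    if sigmaX.ndim == 2:
        # expand diagonal variances into full covariance matrices
        return np.array([np.diag(s) for s in sigmaX])
    if sigmaX.ndim == 3:
        return sigmaX
    raise ValueError("sigmaX must be 2-d (diagonal) or 3-d (full)")
```

)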
   Jake

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
