2012/4/30 Jacob VanderPlas <[email protected]>:
> Hi,
> I've been working on some modifications of methods in scikit-learn
> recently, and there's one deficiency of the interface that I'm having
> trouble with: errors in variables.  I know that few (perhaps none?) of
> the scikit-learn routines take measurement error into account, but it's
> an important aspect of many analyses in the scientific domain.  Has anybody
> thought about the best way to include these in the scikit-learn class
> interface?
>
> For concreteness, what I'm working on is fitting a GMM to data that has
> (correlated) Gaussian errors.  Because everything is Gaussian, results
> are analytic and taking errors into account can be accomplished with a
> straightforward extension of log_multivariate_normal_density() in gmm.py.
>
> The error information can be represented in an array of size [n_samples,
> n_features] (for diagonal errors) or [n_samples, n_features, n_features]
> (for correlated errors).  My current thought is to add this as a keyword
> 'sigmaX' in the fit function, which would default to None.  Any thoughts
> on that?
>   Jake

That proposal sounds helpful.

However, I am starting to think that using single-letter variable names
for the input data was a bad idea, and using a Greek-letter name for the
error data seems like a bad idea too. I would prefer more explicit
variable names, such as `errors`.
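For reference, the analytic extension Jake describes follows from the fact
that the convolution of N(mu, Sigma) with a zero-mean Gaussian error
N(0, S_i) is N(mu, Sigma + S_i). A rough sketch of the per-component log
density with per-sample error covariances (using the explicitly named
`errors` argument rather than `sigmaX`; this is not an existing
scikit-learn function) might look like:

```python
import numpy as np

def log_gaussian_density_with_errors(X, errors, mean, covar):
    """Log N(x_i | mean, covar + S_i) for each sample x_i.

    X      : (n_samples, n_features) data
    errors : (n_samples, n_features, n_features) per-sample error
             covariances (the proposed keyword argument)
    mean   : (n_features,) component mean
    covar  : (n_features, n_features) component covariance
    """
    n_samples, n_features = X.shape
    log_prob = np.empty(n_samples)
    for i in range(n_samples):
        # Fold the measurement error into the component covariance.
        total_cov = covar + errors[i]
        diff = X[i] - mean
        sign, logdet = np.linalg.slogdet(total_cov)
        maha = diff @ np.linalg.solve(total_cov, diff)
        log_prob[i] = -0.5 * (n_features * np.log(2 * np.pi)
                              + logdet + maha)
    return log_prob
```

With `errors` all zero this reduces to the ordinary Gaussian log density,
which is one way to sanity-check an implementation against the existing
log_multivariate_normal_density() in gmm.py.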

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
