This reminds me of Robert's PR about performing clustering either from a
similarity matrix or directly from the data.
So I would be in favour of having an X_is_cov keyword.
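
To make the idea concrete, here is a rough sketch of what such a keyword
could do. It is only an illustration, not the scikit's actual code: the
helper name as_covariance and its validation checks are hypothetical, and
I am assuming the empirical covariance is the maximum-likelihood one
(normalised by n_samples).

```python
import numpy as np


def empirical_covariance(X):
    """Maximum-likelihood covariance of X, shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    return X_centered.T @ X_centered / X.shape[0]


def as_covariance(X, X_is_cov=False):
    """Hypothetical helper: return the covariance an estimator would use.

    With X_is_cov=False, X is raw data and the empirical covariance is
    computed from it. With X_is_cov=True, X is taken to be a precomputed
    covariance matrix and is only checked for shape and symmetry.
    """
    X = np.asarray(X, dtype=float)
    if not X_is_cov:
        return empirical_covariance(X)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError("X_is_cov=True requires a square matrix")
    if not np.allclose(X, X.T):
        raise ValueError("X_is_cov=True requires a symmetric matrix")
    return X


# Both entry points lead to the same matrix.
rng = np.random.RandomState(0)
data = rng.randn(200, 3)
cov = empirical_covariance(data)
same = np.allclose(as_covariance(data), as_covariance(cov, X_is_cov=True))
```

An estimator's fit could call something like this first and then run the
rest of the algorithm on the returned matrix, so a user-supplied
covariance (e.g. a long run covariance) would follow the same code path.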

Sorry for biasing the discussion with cov_init, I answered too quickly ;)

On Wed, Nov 9, 2011 at 5:16 PM, Gael Varoquaux <
[email protected]> wrote:

> On Wed, Nov 09, 2011 at 10:05:53AM -0500, [email protected] wrote:
> > graph_lasso(X,....) takes the data array as an argument, but except
> > calculating the empirical_covariance at the beginning X is not used
> > anymore, as far as I could see.
>
> > The algorithm looks very interesting, but I would have cases where I
> > need to calculate the empirical_covariance myself (e.g. long run
> > covariance which is a weighted average of covariance and covariance
> > with lags).
>
> > Would it be possible to use an empirical covariance instead of X as
> > the main argument, or would you get design inconsistencies?
>
> That's a very good remark, and there are other situations in which it arises.
> Indeed, the empirical covariance matrix is a sufficient statistic for the
> population covariance matrix in the case of Gaussian models, so there are
> many models in which the situation arises, for instance the oracle
> approximate shrinkage.
>
> On the other hand, some models don't rely on the Gaussian assumption.
> Therefore, they use the full X data, and not just the empirical
> covariance. For instance the Ledoit-Wolf estimator.
>
> My gut feeling is that the estimator object should really take X by
> default, but I don't see why the function itself could not take a
> covariance matrix as an input. Of course, people can misuse it, and put
> in a shrunk covariance matrix (my guess is that they will), and we just
> have to accept it.
>
> Actually, I would almost favor an optional argument to the estimator so
> that it can take a covariance matrix as an input. This would be similar
> to the behavior of the kernel PCA with kernel='precomputed'. I used to
> have a 'data_is_cov' boolean keyword argument in my codebase. I could
> turn it into an 'X_is_cov' one.
>
> There are situations in which I would be interested in using the estimator
> object and, like you, I cannot afford carrying around the full time
> series. This can be useful for instance to use the cross-validated
> estimator, which carries a fair amount of logic to do the parameter
> search, or to compare different estimators. This sort of breaks the
> cross validation in the scikit, but not completely, as tricks can be used
> passing in lists of empirical covariances.
>
> What do people think? Should I:
>
>  1. change graph_lasso to take the empirical covariance as an input
>
>  2. add an 'X_is_cov' parameter to the estimators
>
> Gael
>
> PS: As noted by Joseph: cov_init doesn't answer this use case.
>
>
> ------------------------------------------------------------------------------
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsa-sfdev2dev1
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>