On Wed, Nov 9, 2011 at 12:20 PM, Virgile Fritsch <[email protected]> wrote:
> Reminds me of the PR by Robert about performing clustering from a similarity
> matrix or directly from the data.
> So I would be in favour of having an X_is_cov keyword.
>
> Sorry for biasing the discussion with cov_init, I answered too quickly ;)
>
> On Wed, Nov 9, 2011 at 5:16 PM, Gael Varoquaux
> <[email protected]> wrote:
>>
>> On Wed, Nov 09, 2011 at 10:05:53AM -0500, [email protected] wrote:
>> > graph_lasso(X,....) takes the data array as an argument, but except
>> > calculating the empirical_covariance at the beginning X is not used
>> > anymore, as far as I could see.
>>
>> > The algorithm looks very interesting, but I would have cases where I
>> > need to calculate the empirical_covariance myself (e.g. long run
>> > covariance which is a weighted average of covariance and covariance
>> > with lags).
>>
>> > Would it be possible to use an empirical covariance instead of X as
>> > the main argument, or would you get design inconsistencies?
>>
>> That's a very good remark, and there are other situations in which it
>> arises. Indeed, the empirical covariance matrix is a sufficient statistic
>> for the population covariance matrix in the case of Gaussian models, so
>> there are many models in which the situation arises, for instance the
>> oracle approximate shrinkage.
>>
>> On the other hand, some models don't rely on the Gaussian assumption.
>> Therefore, they use the full X data, and not just the empirical
>> covariance. For instance the Ledoit-Wolf estimator.
>>
>> My gut feeling is that the estimator object should really take X by
>> default, but I don't see why the function itself could not take a
>> covariance matrix as an input. Of course, people can misuse it, and put
>> in a shrunk covariance matrix (my guess is that they will), and we just
>> have to accept it.
>>
>> Actually, I would almost favor an optional argument to the estimator so
>> that it can take a covariance matrix as an input. This would be similar
>> to the behavior of the kernel PCA with kernel='precomputed'. I used to
>> have a 'data_is_cov' boolean keyword argument in my codebase. I could
>> turn it into an 'X_is_cov' one.
>>
>> There are situations in which I would be interested in using the estimator
>> object and, like you, I cannot afford carrying around the full time
>> series. This can be useful for instance to use the cross-validated
>> estimator, which carries a fair amount of logic to do the parameter
>> search, or to compare different estimators. This sort of breaks the
>> cross validation in the scikit, but not completely, as tricks can be used
>> passing in lists of empirical covariances.
>>
>> What do people think? Should I:
>>
>> 1. change graph_lasso to take the empirical covariance as an input
>>
>> 2. add an 'X_is_cov' parameter to the estimators
>>
>> Gael
>>
>> PS: As noted by Joseph: cov_init doesn't answer this use case.
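
To make the two options above concrete, here is a minimal NumPy-only sketch, not the scikit's actual code. The helper `resolve_covariance` and its `X_is_cov` flag are hypothetical names following the proposal in the thread, and `long_run_covariance` uses Bartlett weights purely as an assumed example of a user-computed "covariance with lags".

```python
import numpy as np


def empirical_covariance(X):
    """Maximum-likelihood covariance of the rows of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def long_run_covariance(X, n_lags=2):
    """Covariance plus weighted lagged covariances.

    Assumption: Bartlett weights 1 - lag / (n_lags + 1); the thread only
    says "weighted average of covariance and covariance with lags".
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n_samples
    for lag in range(1, n_lags + 1):
        weight = 1.0 - lag / (n_lags + 1.0)
        gamma = Xc[lag:].T @ Xc[:-lag] / n_samples  # lag-`lag` cross-covariance
        cov += weight * (gamma + gamma.T)  # symmetrize the lagged term
    return cov


def resolve_covariance(X, X_is_cov=False):
    """Hypothetical helper: return the covariance an estimator would start from.

    With X_is_cov=False, X is treated as data and its empirical covariance
    is computed; with X_is_cov=True, X is used as the covariance directly.
    """
    X = np.asarray(X, dtype=float)
    if not X_is_cov:
        return empirical_covariance(X)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError("X_is_cov=True requires a square matrix")
    if not np.allclose(X, X.T):
        raise ValueError("X_is_cov=True requires a symmetric matrix")
    return X


rng = np.random.RandomState(0)
data = rng.randn(200, 3)

# Default path: the estimator receives the data and computes the covariance.
cov_from_data = resolve_covariance(data)

# Precomputed path: the caller supplies a covariance of their own making.
cov_precomputed = resolve_covariance(long_run_covariance(data), X_is_cov=True)
```

The only checks on a precomputed input are that it is square and symmetric, so a shrunk covariance would pass through unnoticed, which matches the "we just have to accept it" remark above.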

Thanks for considering this; I'll leave any implementation discussion to you.

Josef
with an "f", who is not French (although my father-in-law is French and
spelled with "ph", and our older son has "ph" in his first and middle names,
but he is a Canadian.)

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
