On Wed, Nov 9, 2011 at 12:20 PM, Virgile Fritsch
<[email protected]> wrote:
> Reminds me of the PR by Robert about performing clustering from similarity
> matrix or directly from the data.
> So I would be in favour of having a X_is_cov keyword.
>
> Sorry for biasing the discussion with cov_init, I answered too quickly ;)
>
> On Wed, Nov 9, 2011 at 5:16 PM, Gael Varoquaux
> <[email protected]> wrote:
>>
>> On Wed, Nov 09, 2011 at 10:05:53AM -0500, [email protected] wrote:
>> > graph_lasso(X,....) takes the data array as an argument, but except
>> > calculating the empirical_covariance at the beginning X is not used
>> > anymore, as far as I could see.
>>
>> > The algorithm looks very interesting, but I would have cases where I
>> > need to calculate the empirical_covariance myself (e.g. long run
>> > covariance which is a weighted average of covariance and covariance
>> > with lags).
>>
>> > Would it be possible to use an empirical covariance instead of X as
>> > the main argument, or would you get design inconsistencies?
>>
>> That's a very good remark, and there are other situations in which it arises.
>> Indeed, the empirical covariance matrix is a sufficient statistic for the
>> population covariance matrix in the case of Gaussian models, so there are
>> many models in which the situation arises, for instance the oracle
>> approximate shrinkage.
>>
>> On the other hand, some models don't rely on the Gaussian assumption.
>> Therefore, they use the full X data, and not just the empirical
>> covariance. For instance the Ledoit-Wolf estimator.
>>
>> My gut feeling is that the estimator object should really take X by
>> default, but I don't see why the function itself could not take a
>> covariance matrix as an input. Of course, people can misuse it, and put
>> in a shrunk covariance matrix (my guess is that they will), and we just
>> have to accept it.
>>
>> Actually, I would almost favor an optional argument to the estimator so
>> that it can take a covariance matrix as an input. This would be similar
>> to the behavior of the kernel PCA with kernel='precomputed'. I used to
>> have a 'data_is_cov' boolean keyword argument in my codebase. I could
>> turn it into a 'X_is_cov' one.
>>
>> There are situations in which I would be interested in using the estimator
>> object and, like you, I cannot afford carrying around the full time
>> series. This can be useful for instance to use the cross-validated
>> estimator, which carries a fair amount of logic to do the parameter
>> search, or to compare different estimators. This sort of breaks the
>> cross validation in the scikit, but not completely, as tricks can be used
>> passing in lists of empirical covariances.
>>
>> What do people think? Should I:
>>
>>  1. change graph_lasso to take the empirical covariance as an input
>>
>>  2. add an 'X_is_cov' parameter to the estimators
>>
>> Gael
>>
>> PS: As noted by Joseph: cov_init doesn't answer this usecase.

Thanks for considering this, I leave any implementation discussion to you.

Josef
with an "f" who is not French
(although my father-in-law is French and spelled with "ph" and our
older son has "ph" in his first and middle names but he is a
Canadian.)
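
To make option 2 from the quoted message concrete, here is a minimal sketch of how an 'X_is_cov' switch could route the input. It is not the scikit-learn implementation: the class name CovarianceInputSketch and both helper functions are hypothetical, the sparse solver is left as a placeholder, and the Bartlett weights in long_run_covariance are an assumption about what "weighted average of covariance and covariance with lags" might look like in practice.

```python
import numpy as np


def empirical_covariance(X):
    """Covariance of the rows of X, normalised by n_samples."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def long_run_covariance(X, n_lags=1):
    """Lag-0 covariance plus Bartlett-weighted lagged autocovariances.

    Hypothetical helper: the weighting scheme is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n_samples
    for k in range(1, n_lags + 1):
        weight = 1.0 - k / (n_lags + 1.0)
        gamma_k = Xc[k:].T @ Xc[:-k] / n_samples  # autocovariance at lag k
        cov += weight * (gamma_k + gamma_k.T)     # keep the result symmetric
    return cov


class CovarianceInputSketch:
    """Hypothetical estimator shell; only the input handling matters here."""

    def __init__(self, X_is_cov=False):
        self.X_is_cov = X_is_cov

    def fit(self, X):
        if self.X_is_cov:
            # X is already a covariance matrix supplied by the caller.
            cov = np.asarray(X, dtype=float)
            if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
                raise ValueError("X_is_cov=True expects a square matrix")
        else:
            # X is the raw data array of shape (n_samples, n_features).
            cov = empirical_covariance(X)
        self.covariance_ = cov
        # Placeholder: a real estimator would run its solver on `cov` here.
        return self


# Usage: feed a hand-built covariance instead of the full time series.
rng = np.random.RandomState(0)
X = rng.randn(200, 4)
est = CovarianceInputSketch(X_is_cov=True).fit(long_run_covariance(X, n_lags=2))
```

With this layout the default path still takes X, and the precomputed path never needs the full time series, which is the trade-off discussed in the quoted message.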

>>
>>
>> ------------------------------------------------------------------------------
>> RSA(R) Conference 2012
>> Save $700 by Nov 18
>> Register now
>> http://p.sf.net/sfu/rsa-sfdev2dev1
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
