Hi everyone,

I am trying to apply |glasso| to a very simple and sparse dataset
with 60+ features and 30k+ observations.
If you are interested in reproducing the issue, you can find it in CSV
format here: http://www.mediafire.com/download/ek8kk0pg3jpc6ll/weight_comp_simple_prop.df.train.csv
(<https://www.mediafire.com/?ek8kk0pg3jpc6ll>).

I am using the sklearn implementation
<http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLasso.html#sklearn.covariance.GraphLasso.mahalanobis>
with just a few lines of code, trying different values of the
regularization coefficient α:

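|from sklearn.covariance import GraphLasso

# scaled_train: the standardized training data (n_samples x n_features)
for alpha in [0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001]:
    glasso_model = GraphLasso(alpha=alpha, mode='lars', max_iter=2000)
    glasso_model.fit(scaled_train)
|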

What I observe is that the model cannot produce a covariance estimate:
it stops with an exception complaining that the result is not
symmetric positive definite (SPD):

|/usr/local/lib/python3.4/dist-packages/sklearn/covariance/graph_lasso_.py in graph_lasso(emp_cov, alpha, cov_init, mode, tol, max_iter, verbose, return_costs, eps, return_n_iter)
    245         e.args = (e.args[0]
    246                   + '. The system is too ill-conditioned for this solver',)
--> 247         raise e
    248 
    249     if return_costs:

/usr/local/lib/python3.4/dist-packages/sklearn/covariance/graph_lasso_.py in graph_lasso(emp_cov, alpha, cov_init, mode, tol, max_iter, verbose, return_costs, eps, return_n_iter)
    236                 break
    237             if not np.isfinite(cost) and i > 0:
--> 238                 raise FloatingPointError('Non SPD result: the system is '
    239                                          'too ill-conditioned for this solver')
    240         else:

FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this solver
|
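For context, the exception is raised when the cost evaluated on the current precision estimate stops being finite. A minimal sketch of the graphical lasso objective, −log det Θ + tr(SΘ) + α Σ_{i≠j} |Θ_ij| (the sklearn documentation describes the l1 penalty as applying to off-diagonal coefficients only), shows why: the log-determinant is only defined while Θ is positive definite, so the cost becomes infinite as soon as an iterate loses that property. The helper names `is_spd` and `glasso_objective` are hypothetical, and sklearn's internal cost uses a different scaling, so treat this as an illustration, not the library's code.

```python
import numpy as np


def is_spd(m):
    """Return True if m is symmetric positive definite (Cholesky succeeds)."""
    if not np.allclose(m, m.T):
        return False
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False


def glasso_objective(emp_cov, precision, alpha):
    """-log det(Theta) + tr(S Theta) + alpha * sum_{i != j} |Theta_ij|.

    Returns +inf when precision is not positive definite, since log det is
    undefined there; this loosely mirrors the non-finite cost check above.
    """
    sign, logdet = np.linalg.slogdet(precision)
    if sign <= 0:
        return np.inf
    off_diag_l1 = np.abs(precision).sum() - np.abs(np.diag(precision)).sum()
    return -logdet + np.trace(emp_cov @ precision) + alpha * off_diag_l1


S = np.eye(3)
print(glasso_objective(S, np.eye(3), alpha=0.1))  # 3.0: log det 0, trace 3, no off-diagonals

# Symmetric but indefinite (eigenvalues 3, -1, 1): the cost is not finite
bad = np.array([[1.0, 2.0, 0.0],
                [2.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(is_spd(bad), np.isfinite(glasso_objective(S, bad, alpha=0.1)))  # False False
```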

If I compute the MLE of the covariance with another sklearn function,
|empirical_covariance|
<http://scikit-learn.org/stable/modules/generated/sklearn.covariance.empirical_covariance.html#sklearn.covariance.empirical_covariance>
(which, by the way, is the same function that the |graph_lasso|
procedure uses), the resulting matrix is indeed PSD. So I suspect the
problem lies somewhere in the solver's computation.
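Note that being PSD is not the same as being well-conditioned: a covariance matrix can pass a PSD check while having eigenvalues so close to zero that inverting it (roughly what the solver approaches as α → 0) is numerically unstable. A sketch of such a check, using `np.cov(X, rowvar=False, bias=True)`, which should match `empirical_covariance` for uncentered input; the helper name and the synthetic data with a duplicated column are assumptions for illustration, not the original dataset.

```python
import numpy as np


def covariance_diagnostics(X):
    """Return (min eigenvalue, condition number) of the MLE covariance of X."""
    # bias=True divides by n_samples, i.e. the maximum-likelihood estimate
    emp_cov = np.cov(X, rowvar=False, bias=True)
    eigvals = np.linalg.eigvalsh(emp_cov)  # real, ascending (matrix is symmetric)
    return eigvals[0], np.linalg.cond(emp_cov)


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
print(covariance_diagnostics(X))  # min eigenvalue well above 0, small condition number

# Append an exact copy of column 0: the covariance is still PSD (up to
# rounding) but singular, so its condition number explodes.
X_dup = np.column_stack([X, X[:, 0]])
min_eig, cond = covariance_diagnostics(X_dup)
print(min_eig, cond)
```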

I am already standardizing the data (zero mean, unit variance) before
applying the method, but the problem persists.
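Standardizing rescales each column but cannot remove linear dependence between columns, and in sparse data it can run into columns with zero variance (for example a feature that is always 0 in the training split). A hedged sketch of a pre-check; the function name `standardize_with_report`, the `1e-12` tolerance and the toy columns are arbitrary choices for illustration:

```python
import numpy as np


def standardize_with_report(X, tol=1e-12):
    """Standardize columns to zero mean and unit variance.

    Columns whose standard deviation is below tol are reported and left
    centered but unscaled (dividing by ~0 would produce inf/NaN).
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    constant_cols = np.flatnonzero(std < tol)
    safe_std = np.where(std < tol, 1.0, std)
    return (X - mean) / safe_std, constant_cols


rng = np.random.default_rng(1)
a = rng.standard_normal(500)
# Column 1 is a scaled copy of column 0; column 2 is all zeros (a "sparse" feature)
X = np.column_stack([a, 3.0 * a, np.zeros(500), rng.standard_normal(500)])
Z, constant_cols = standardize_with_report(X)
print(constant_cols)  # [2]

# Even after standardization the covariance is singular (min eigenvalue ~0)
print(np.linalg.eigvalsh(np.cov(Z, rowvar=False, bias=True)).min())
```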

The same data works fine with the R package glasso, so it may be an
sklearn issue. For reference, I am using Python 3.4.
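One difference that may be relevant (an assumption worth checking against the R documentation, not verified on this dataset): R's glasso penalizes the diagonal of the precision matrix by default (`penalize.diagonal = TRUE`), which in the glasso formulation makes the estimated covariance have diagonal entries S_ii + ρ, whereas the sklearn documentation describes a penalty on off-diagonal coefficients only. As a rough intuition, adding ρ to the diagonal shifts every eigenvalue of S up by ρ, which is enough to make a singular empirical covariance invertible. A small numpy illustration with a hypothetical singular matrix:

```python
import numpy as np

# Hypothetical singular covariance: features 0 and 1 are perfectly correlated
S = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
rho = 1e-4  # same order as the largest alpha tried in the loop above

# Eigenvalues of S are 0, 1, 2; those of S + rho*I are rho, 1 + rho, 2 + rho
for M, label in [(S, "S"), (S + rho * np.eye(3), "S + rho*I")]:
    eigvals = np.linalg.eigvalsh(M)
    print(label, "min eigenvalue:", eigvals.min(), "cond:", np.linalg.cond(M))
```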

Any ideas? Am I missing some key point in applying the glasso?

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general