Hi everyone,
I am trying to apply |glasso| to a very simple and sparse dataset
made of 60+ features and 30k+ observations.
You can find it in CSV format here, if you are interested in reproducing
the issue:
http://www.mediafire.com/download/ek8kk0pg3jpc6ll/weight_comp_simple_prop.df.train.csv
<https://www.mediafire.com/?ek8kk0pg3jpc6ll>
I am using the sklearn implementation
<http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLasso.html#sklearn.covariance.GraphLasso.mahalanobis>
with just a few lines of code, trying different values of the
regularization coefficient α:
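|from sklearn.covariance import GraphLasso  # renamed GraphicalLasso in sklearn >= 0.20

# scaled_train: the standardized training data (n_samples x n_features)
for alpha in [0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001]:
    glasso_model = GraphLasso(alpha=alpha, mode='lars', max_iter=2000)
    glasso_model.fit(scaled_train)
|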
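The alphas in the loop above (1e-8 to 1e-4) are tiny. With standardized data the empirical covariance is a correlation matrix, and a natural scale for alpha is its largest absolute off-diagonal entry: at or above that value the graphical lasso returns a diagonal precision matrix, while as alpha approaches 0 the problem approaches an unregularized inversion of the covariance, which is where an ill-conditioned covariance is most likely to cause trouble. Here is a minimal NumPy sketch that computes that scale and a log-spaced grid below it, assuming synthetic data in place of scaled_train. alpha_max and alpha_grid are my own helper names; as far as I know GraphLassoCV builds its default grid in a similar way (from that maximum down to 1% of it).

```python
import numpy as np

def alpha_max(emp_cov):
    """Largest absolute off-diagonal entry of the empirical covariance.

    For the graphical lasso, any alpha at or above this value gives a
    diagonal precision matrix, so it is a sensible top end for a grid.
    """
    off_diag = emp_cov - np.diag(np.diag(emp_cov))
    return np.max(np.abs(off_diag))

def alpha_grid(emp_cov, n_alphas=5, ratio=1e-2):
    """Log-spaced alphas from alpha_max down to ratio * alpha_max."""
    a_max = alpha_max(emp_cov)
    return np.logspace(np.log10(a_max), np.log10(ratio * a_max), n_alphas)

# Synthetic standardized data standing in for scaled_train (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 6))
X[:, 1] += 0.8 * X[:, 0]                    # one strong correlation
X = (X - X.mean(axis=0)) / X.std(axis=0)    # zero mean, unit variance
S = np.cov(X, rowvar=False, bias=True)      # correlation matrix here

grid = alpha_grid(S)
print("alpha_max:", alpha_max(S))
print("grid:", grid)
```

On data like this the whole grid sits orders of magnitude above 1e-4, which suggests the original sweep was running with almost no regularization.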
The model never produces a covariance estimate: fitting stops with an
exception complaining that the result is not SPD (symmetric positive
definite):
|/usr/local/lib/python3.4/dist-packages/sklearn/covariance/graph_lasso_.py in graph_lasso(emp_cov, alpha, cov_init, mode, tol, max_iter, verbose, return_costs, eps, return_n_iter)
    245 e.args = (e.args[0]
    246 + '. The system is too ill-conditioned for this solver',)
--> 247 raise e
    248
    249 if return_costs:

/usr/local/lib/python3.4/dist-packages/sklearn/covariance/graph_lasso_.py in graph_lasso(emp_cov, alpha, cov_init, mode, tol, max_iter, verbose, return_costs, eps, return_n_iter)
    236 break
    237 if not np.isfinite(cost) and i > 0:
--> 238 raise FloatingPointError('Non SPD result: the system is '
    239 'too ill-conditioned for this solver')
    240 else:

FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this solver
|
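The check at line 237 fires when the cost becomes non-finite. For the graphical lasso the cost is a penalized negative log-likelihood of the precision matrix Theta, tr(S Theta) - log det(Theta) + alpha * sum over i != j of |Theta_ij|, and log det is only defined when Theta is positive definite. The NumPy sketch below computes that standard objective to illustrate why a non-PD iterate gives an infinite cost. It is not sklearn's exact code; I am assuming sklearn's cost matches it up to constants and scaling.

```python
import numpy as np

def glasso_objective(S, Theta, alpha):
    """Standard graphical lasso objective (off-diagonal L1 penalty):
        tr(S @ Theta) - log det(Theta) + alpha * sum_{i != j} |Theta_ij|
    Returns +inf when Theta is not positive definite, because log det
    is undefined there; a non-finite cost is what the error reports.
    """
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return np.inf
    off_diag_l1 = np.abs(Theta).sum() - np.abs(np.diag(Theta)).sum()
    return np.trace(S @ Theta) - logdet + alpha * off_diag_l1

print(glasso_objective(np.eye(3), np.eye(3), alpha=0.1))   # 3.0

bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
print(glasso_objective(np.eye(2), bad, alpha=0.1))         # inf
```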
If I compute the MLE of the covariance with another sklearn function,
empirical_covariance
<http://scikit-learn.org/stable/modules/generated/sklearn.covariance.empirical_covariance.html#sklearn.covariance.empirical_covariance>
(which, by the way, is the same function that the |graph_lasso| procedure
uses), the resulting matrix is indeed PSD. So I suspect that the problem
lies somewhere in the solver's computations.
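One caveat: the solver needs more than PSD. A covariance can have all eigenvalues >= 0 and still be singular or nearly so, for instance when a column is constant or an exact linear combination of others, which is easy to hit with sparse features. A small NumPy sketch for checking the smallest eigenvalue and the condition number; the data here is synthetic, not the linked CSV, and covariance_diagnostics is a hypothetical helper name.

```python
import numpy as np

def covariance_diagnostics(X):
    """Eigenvalue range and condition number of the biased (1/n) empirical
    covariance, np.cov(X, rowvar=False, bias=True), which is what
    sklearn.covariance.empirical_covariance computes by default."""
    S = np.cov(X, rowvar=False, bias=True)
    eigvals = np.linalg.eigvalsh(S)      # ascending; S is symmetric
    lam_min, lam_max = eigvals[0], eigvals[-1]
    cond = lam_max / lam_min if lam_min > 0 else np.inf
    return lam_min, lam_max, cond

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
print(covariance_diagnostics(X))         # all eigenvalues clearly > 0

# Hypothetical redundant column: an exact linear combination of two others,
# as happens e.g. with one-hot columns that always sum to 1.
X_dup = np.column_stack([X, X[:, 0] + X[:, 1]])
dup_min, dup_max, dup_cond = covariance_diagnostics(X_dup)
print(dup_min, dup_cond)  # dup_min ~ 0 (rounding noise), dup_cond huge or inf
```

Both matrices pass a PSD check up to rounding, but only the first is safely invertible.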
I have tried normalizing or standardizing the data (zero mean, unit
variance) before applying the method, but the problem persists.
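If some columns are all zero (or constant) in the training set, standardization cannot give them unit variance. As far as I know, sklearn's StandardScaler leaves such columns at zero instead of failing, so they can silently survive preprocessing and make the covariance singular. A minimal NumPy sketch of standardizing while dropping (near) zero-variance columns; standardize and its eps threshold are hypothetical choices for illustration, and the Poisson counts only stand in for sparse data.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Zero-mean, unit-variance columns. Columns whose standard deviation
    is <= eps are dropped instead of being divided by ~0 (inf/NaN)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    keep = std > eps                      # boolean mask of usable columns
    Xk = X[:, keep]
    Z = (Xk - Xk.mean(axis=0)) / std[keep]
    return Z, keep

rng = np.random.default_rng(1)
X = rng.poisson(0.3, size=(1000, 5)).astype(float)  # sparse-ish counts
X[:, 2] = 0.0                                       # an all-zero column
Z, keep = standardize(X)
print(keep)                                         # [ True  True False  True  True]
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))
```

The returned mask makes it easy to map the remaining columns back to the original feature names.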
The same data works fine with the R package glasso, so it may be an
sklearn issue. I am using Python 3.4.
Any ideas? Am I missing some key point in applying glasso?
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general