[Scikit-learn-general] buggy sparsify() / count_nonzero() combination ?

Eustache DIEMERT Mon, 02 Dec 2013 03:14:44 -0800

Hi List,

I was counting the number of non-zero coefficients of a SGDClassifier and
got a very strange ValueError after calling predict() again.


After some research, it seems that our own sklearn.utils.fixes.count_nonzero
has a side effect that changes the type of the coef_ matrix. Any further
call to densify/sparsify/predict then fails with this exception.

It seems that a similar bug is pending here:
https://github.com/scikit-learn/scikit-learn/issues/1968 ; with my code it
breaks with several scipy versions (0.11, 0.12, 0.13, 0.13.1), numpy 1.8.0
and sklearn git HEAD.

Here is the traceback:
"""
Traceback (most recent call last):
  File "/home/ediemert/sources/oss/scikit-learn/examples/applications/sgd_elasticnet_bug.py", line 22, in <module>
    clf.predict(X_test)
  File "/home/ediemert/sources/oss/scikit-learn/sklearn/linear_model/base.py", line 223, in predict
    scores = self.decision_function(X)
  File "/home/ediemert/sources/oss/scikit-learn/sklearn/linear_model/base.py", line 199, in decision_function
    X = atleast2d_or_csr(X)
  File "/home/ediemert/sources/oss/scikit-learn/sklearn/utils/validation.py", line 148, in atleast2d_or_csr
    "tocsr", force_all_finite)
  File "/home/ediemert/sources/oss/scikit-learn/sklearn/utils/validation.py", line 123, in _atleast2d_or_sparse
    _assert_all_finite(X.data)
  File "/home/ediemert/sources/oss/scikit-learn/sklearn/utils/validation.py", line 39, in _assert_all_finite
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 25, in _sum
    out=out, keepdims=keepdims)
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py", line 183, in __bool__
    raise ValueError("The truth value of an array with more than one "
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
"""
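
The last frame of the traceback is scipy's sparse `__bool__`, which refuses to give a truth value for a matrix with more than one element. The short sketch below only reproduces that final error in isolation, on a small hand-made matrix (the matrix `m` is an illustration, not the actual coef_). It does not show how coef_ ends up being evaluated in a boolean context after count_nonzero.

```python
import numpy as np
import scipy.sparse as sp

# A small sparse matrix with more than one element (illustrative only).
m = sp.csr_matrix(np.eye(2))

try:
    # Asking for the truth value of a multi-element sparse matrix is ambiguous.
    bool(m)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

Any code path that ends up doing `if sparse_matrix:` or `not sparse_matrix` on such an object would fail the same way.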

And a minimal script to reproduce the bug:
"""
from scipy.sparse.csr import csr_matrix
from sklearn import datasets
from sklearn.linear_model.stochastic_gradient import SGDClassifier
from sklearn.utils import shuffle
import numpy as np
from sklearn.utils.fixes import count_nonzero

np.random.seed(0)

twentyng_data = datasets.fetch_20newsgroups_vectorized(subset='all')
X, y = shuffle(twentyng_data.data, twentyng_data.target)
offset = int(X.shape[0] * 0.5)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
X_test = csr_matrix(X_test)
X_train = csr_matrix(X_train)

clf = SGDClassifier()
clf.fit(X_train, y_train)
clf.sparsify()
# uncomment next line to trigger the buggy behavior
#count_nonzero(clf.coef_)
clf.predict(X_test)
"""
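
If only the number of non-zero coefficients is needed, one possible way to sidestep the problem is to avoid passing the sparse coef_ to count_nonzero at all. The helper below is a hypothetical sketch (the name `n_nonzero_coefs` is not part of scikit-learn) that works on a private copy, so the matrix it is given is never modified. It has not been checked against the failing setup above.

```python
import numpy as np
import scipy.sparse as sp


def n_nonzero_coefs(coef):
    """Count non-zero entries of a dense or sparse matrix without touching it.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if sp.issparse(coef):
        # Copy into CSR so that dropping explicitly stored zeros
        # cannot affect the caller's matrix.
        tmp = sp.csr_matrix(coef, copy=True)
        tmp.eliminate_zeros()
        return int(tmp.nnz)
    return int(np.count_nonzero(np.asarray(coef)))


# Tiny stand-in for coef_ (illustrative values only).
dense = np.array([[0.0, 1.5, 0.0],
                  [2.0, 0.0, 0.0]])
sparse = sp.csr_matrix(dense)

print(n_nonzero_coefs(dense))
print(n_nonzero_coefs(sparse))
```

With the script above, this would be called as `n_nonzero_coefs(clf.coef_)` in place of `count_nonzero(clf.coef_)`.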

Any ideas welcome!


