2011/10/27 SK Sn <[email protected]>:
> Hi all, I was playing around with KFold CV and found I need to convert X
> (a scipy sparse matrix produced by text vectorization) with todense() in order
> to use KFold CV with the following code:
> ----
> for train_index, test_index in kf:
>     X_train, X_test = X[train_index], X[test_index]
>     y_train, y_test = y[train_index], y[test_index]
> ----
> Question 1: instead of using X_train, X_test = X[train_index],
> X[test_index] on the todense()'d NumPy array X, is there another way to
> use KFold CV on a scipy sparse matrix for text classification without
> calling todense() on it?

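Pass indices=True when constructing the KFold object, so that it yields
arrays of integer indices rather than boolean masks. scipy.sparse matrices
have a different indexing API than NumPy arrays: they accept integer index
arrays for row selection, but boolean masks may not work.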
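A minimal sketch of the same loop without todense(), assuming a recent
scikit-learn: there KFold lives in sklearn.model_selection, takes n_splits,
and its split() method always yields integer indices (the indices parameter
no longer exists). The toy matrix and labels are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import KFold

# Toy stand-in for a vectorized text corpus: 6 documents x 4 terms, stored as CSR.
X = sp.csr_matrix(np.array([
    [1, 0, 0, 2],
    [0, 1, 0, 0],
    [0, 0, 3, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [2, 0, 1, 0],
]))
y = np.array([0, 1, 0, 1, 0, 1])

# In recent scikit-learn, KFold.split() yields arrays of integer indices.
kf = KFold(n_splits=3)
folds = []
for train_index, test_index in kf.split(X):
    # Integer-array row indexing works directly on CSR matrices; no todense() needed.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    folds.append((X_train.shape, X_test.shape, sp.issparse(X_train)))

print(folds)
```

The folds stay sparse, so X_train can be passed directly to estimators that
accept sparse input, such as MultinomialNB or LinearSVC.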

> To apply classifiers to the dense X, I tried both MultinomialNB
> and BernoulliNB along with others, such as LinearSVC, KNeighborsClassifier and
> RidgeClassifier.
> However, when working on the NumPy dense array obtained with todense() from
> the scipy sparse text matrix, both Naive Bayes classifiers raise the error shown
> below. The problem can be reproduced by simply adding two lines that call
> todense() on X_test and X_train in the example.
> Why do only the NB classifiers fail here? I cannot tell much from the error message.
> I would like to learn more. Thanks for your kind help!

.todense() converts a scipy.sparse matrix to a numpy.matrix, not a plain
numpy.ndarray, and numpy.matrix input is only supported in very recent
development versions of scikit-learn. If you use .toarray() or .A instead,
you get an ndarray, so it should work with older versions as well. If you
were using a recent dev version, please tell us, because then it's a bug.
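A small sketch of the difference between the two conversions, assuming a
recent NumPy/SciPy: todense() on a scipy.sparse matrix class gives a
numpy.matrix, which stays 2-D even when you take a single row, while
toarray() gives an ordinary ndarray. Recent SciPy releases have deprecated
the .A shorthand, so only toarray() is used here.

```python
import numpy as np
import scipy.sparse as sp

# A tiny CSR matrix standing in for vectorized text (2 documents x 3 terms).
X = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

dense_matrix = X.todense()  # numpy.matrix
dense_array = X.toarray()   # plain numpy.ndarray

print(type(dense_matrix).__name__, type(dense_array).__name__)  # matrix ndarray
# numpy.matrix is always 2-D, so indexing one row keeps two dimensions.
print(dense_matrix[0].shape, dense_array[0].shape)  # (1, 3) (3,)
```

Code that expects ndarray semantics (1-D rows, element-wise *) can behave
differently when handed a numpy.matrix, which is why toarray() is the safer
conversion.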

HTH,

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
