2011/10/27 SK Sn <[email protected]>:
> Hi all,
>
> I was playing around with KFold CV and found I need to transfer an X
> (scipy sparse matrix after text vectorization) by todense() in order to work
> with Kfold CV using following code:
> ----
> for train_index, test_index in kf:
>     X_train, X_test = X[train_index], X[test_index]
>     y_train, y_test = y[train_index], y[test_index]
> ----
> Here is question 1: Instead of using X_train, X_test = X[train_index],
> X[test_index] and working with the todense()ed numpy array X, is there any
> other way to use Kfold CV for scipy sparse matrix for text classification
> without todense() it?
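A minimal sketch of the loop from the question, run directly on a sparse matrix. It assumes a current scikit-learn, where KFold lives in sklearn.model_selection, takes n_splits, and its split() method always yields integer index arrays (the old indices keyword no longer exists). The toy corpus and labels are made up for illustration. Integer row indices work on a CSR matrix, so X never has to be densified, and MultinomialNB accepts the sparse folds as they are.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "cheap pills buy now", "meeting at noon tomorrow",
    "buy cheap watches now", "lunch meeting moved to noon",
    "win cash now", "project status meeting",
]
y = np.array([1, 0, 1, 0, 1, 0])

# CountVectorizer returns a scipy.sparse CSR matrix; it is never densified.
X = CountVectorizer().fit_transform(docs)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
scores = []
for train_index, test_index in kf.split(X):
    # train_index / test_index are integer arrays, which CSR matrices
    # accept for row selection, so the indexing from the question works.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = MultinomialNB().fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))

print("folds stayed sparse:", sp.issparse(X_train))
print(scores)
```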
Use an indices=True keyword argument when constructing the KFold object.
By default KFold yields boolean masks; with indices=True it yields integer
index arrays instead, which scipy.sparse matrices accept for row selection.
scipy.sparse matrices have a different indexing API than NumPy arrays.

> To apply classifiers on the densed X, I tried both MultinomialNB
> and BernoulliNB along with others, such as LinearSVC, KNeighborsClassifier,
> RidgeClassifier.
> However, while working on the numpy dense array which is todense()ed from
> scipy text parse matrix, both Naive Bayes classifiers get error as shown
> below. This problem is reproducible simply add two lines of todense() of X
> test and X train from the example.
> Why only NB has error here? I cannot tell much from the error message I got.
> Would like to learn more. Thanks for your kind help!

.todense() converts a scipy.sparse matrix to a numpy.matrix, and numpy.matrix
input is only supported in very recent development versions of scikit-learn.
If you use .toarray() or .A instead, you get a plain numpy.ndarray, and it
should work with older versions as well. If you were using a recent dev
version, please tell us, because then it's a bug.

HTH,

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
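The difference between .todense() and .toarray() can be checked directly. This sketch assumes a current NumPy and SciPy and a scipy.sparse csr_matrix; the small matrix is made up for illustration. Recent SciPy releases have deprecated the .A shorthand, so .toarray() is the safer spelling today. If you already hold a numpy.matrix, np.asarray turns it into a plain ndarray.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 2x3 sparse matrix standing in for vectorized text.
X = sp.csr_matrix(np.array([[0, 2, 1], [3, 0, 0]]))

dense_matrix = X.todense()  # numpy.matrix subclass (always 2-D)
dense_array = X.toarray()   # plain numpy.ndarray

print(type(dense_matrix).__name__)
print(type(dense_array).__name__)

# One visible difference: reductions on numpy.matrix keep a 2-D shape,
# which code written for ndarrays does not expect.
print(dense_matrix.sum(axis=1).shape)
print(dense_array.sum(axis=1).shape)

# Converting an existing numpy.matrix back to a plain ndarray.
fixed = np.asarray(dense_matrix)
print(type(fixed).__name__)
```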
