Hi, I'm trying to apply the MLPClassifier to a subset (100 data points, 2 classes) of the 20newsgroups dataset. I created (ok, copied) the following pipeline:
    model_MLP = Pipeline([
        ('vect', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('model_MLP', MLPClassifier(solver='lbfgs', alpha=1e-5,
                                    hidden_layer_sizes=(5, 2), random_state=1)),
    ])
    model_MLP.fit(twenty_train.data, twenty_train.target)
    predicted_MLP = model_MLP.predict(twenty_test.data)
    print(metrics.classification_report(twenty_test.target, predicted_MLP,
                                        target_names=twenty_test.target_names))

The numbers I get are hopeless:

                     precision    recall  f1-score   support
    alt.atheism           0.00      0.00      0.00        34
    sci.electronics       0.66      1.00      0.80        66

The only reason I can think of is that the dictionaries of the training set and the test set are not the same (test set: 5204 words, training set: 5402 words). If I understand Bayes correctly, that should not be a problem, but something certainly produces rubbish (see the numbers above). The same setup with the SVD routine works great: all values are around 0.95.

Thanks,
Andreas
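For what it's worth, here is a minimal sketch (toy documents, made-up names `train_docs`/`test_docs`, not the actual 20newsgroups setup) illustrating how a `CountVectorizer` inside a pipeline handles the train/test dictionary difference: the vocabulary is learned from the training data during `fit`, and `transform` then reuses that same vocabulary for the test documents, silently dropping unseen words.

```python
# Sketch: a fitted CountVectorizer applies the *training* vocabulary to
# test documents, so differing train/test dictionaries are handled
# automatically (unseen test words are simply ignored).
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["atheism debate logic", "resistor voltage circuit"]
test_docs = ["voltage capacitor soldering"]  # 'capacitor', 'soldering' unseen

vect = CountVectorizer()
X_train = vect.fit_transform(train_docs)  # vocabulary learned here only
X_test = vect.transform(test_docs)        # same vocabulary reused

print(sorted(vect.vocabulary_))  # contains only the 6 training words
print(X_test.toarray())          # one nonzero entry, for 'voltage'
```

So the size mismatch between the two dictionaries by itself should not break the pipeline; whatever is going wrong is presumably elsewhere.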
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn