I don't think this is an issue directly related to scikit-learn. Your classifier is learning to always predict the majority class. If you do not have good training performance, then you either need more data or your model is inappropriate. You're trying to learn lots of parameters from 100 examples. Use a simpler model, use stronger regularisation (a higher alpha), and work through some tutorials on machine learning diagnostics and modelling choices.
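Concretely, that could look something like the sketch below: the same pipeline with a single small hidden layer and a much larger alpha. The specific values (alpha=1e-1, hidden_layer_sizes=(5,)) are illustrative starting points, not tuned recommendations.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier

model_MLP = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('model_MLP', MLPClassifier(solver='lbfgs',
                                alpha=1e-1,              # stronger regularisation than 1e-5
                                hidden_layer_sizes=(5,), # one small layer instead of (5, 2)
                                max_iter=1000,
                                random_state=1)),
])
```

With only 100 training documents, even this may be more capacity than the data supports, so comparing against a linear baseline is worthwhile.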
On 13 Jan 2018 3:42 am, "andreas heiner" <ap.hei...@gmail.com> wrote:

> Hi,
>
> I try to apply the MLPClassifier to a subset (100 data points, 2 classes)
> of the 20newsgroup dataset. I created (ok, copied) the following pipeline
>
>     model_MLP = Pipeline([('vect', CountVectorizer()),
>                           ('tfidf', TfidfTransformer()),
>                           ('model_MLP', MLPClassifier(solver='lbfgs',
>                                                       alpha=1e-5,
>                                                       hidden_layer_sizes=(5, 2),
>                                                       random_state=1))])
>
>     model_MLP.fit(twenty_train.data, twenty_train.target)
>
>     predicted_MLP = model_MLP.predict(twenty_test.data)
>
>     print(metrics.classification_report(twenty_test.target, predicted_MLP,
>                                         target_names=twenty_test.target_names))
>
> The numbers I get are hopeless:
>
>                      precision    recall  f1-score   support
>     alt.atheism           0.00      0.00      0.00        34
>     sci.electronics       0.66      1.00      0.80        66
>
> The only reason I can think of is that the dictionaries of the training
> and the test set are not the same (test set: 5204 words, training set:
> 5402 words). That should not be a problem (if I understand Bayes
> correctly), but it certainly gives rubbish (see the numbers).
>
> The same setup with the SVD routine works great, all values are around .95
>
> thanks,
>
> Andreas
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn