Hi all,

I'm working on text classification of Wikipedia documents. I use a word-count approach to extract features from the text, so I end up with a large vocabulary containing every word in the training documents after lemmatization and stop-word removal. That gives me 70,000 features. My impression is that for word-based problems like this one, feature selection (with SVD or PCA) is not a good idea. My current accuracy is 77%.
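To make the setup concrete, here is a toy sketch of the word-count idea with one very simple way of shrinking the vocabulary: dropping terms that appear in fewer than `min_df` documents. This is not my real pipeline. The corpus, the stop-word list, and the helper names (`tokenize`, `build_vocabulary`, `count_vector`) are all made up for illustration, and lemmatization is left out. In scikit-learn the same kind of pruning should be available through the `stop_words`, `min_df` and `max_df` parameters of `CountVectorizer`.

```python
from collections import Counter

# Toy corpus and stop-word list, made up for illustration only.
DOCS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
    "quantum physics of the atom",
]
STOP_WORDS = {"the", "on", "a", "and", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_vocabulary(docs, min_df=1):
    """Map each term found in at least `min_df` documents to a column index."""
    doc_freq = Counter()
    for doc in docs:
        # Use a set so each term is counted once per document.
        doc_freq.update(set(tokenize(doc)))
    kept = sorted(term for term, df in doc_freq.items() if df >= min_df)
    return {term: idx for idx, term in enumerate(kept)}


def count_vector(text, vocab):
    """Return the word-count feature vector of `text` over `vocab`."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:  # terms outside the vocabulary are ignored
            vec[vocab[tok]] = n
    return vec


full_vocab = build_vocabulary(DOCS, min_df=1)
pruned_vocab = build_vocabulary(DOCS, min_df=2)
print(len(full_vocab), "features before pruning,", len(pruned_vocab), "after")
print(count_vector("the cat sat with the cat", pruned_vocab))
```

Pruning by document frequency only removes rare terms; it says nothing about which terms actually help separate the classes, which is part of what I am unsure about.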
Do you think I need to do feature selection to improve the accuracy?

Thank you for your answers.

Regards,
Luigi
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn