>
> The documentation on feature extraction is up-to-date, quite complete
> and has an example snippet to use nltk for text pre-processing
> (lemmatization and tokenization):
>
>
> http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction
Nice, but two issues:
1. WordNetLemmatizer is a bad example unless you can provide it with a POS
tag: without one it treats every token as a noun, so most verbs and
adjectives come back unchanged.
2. You might want to incorporate heterogeneous features in text
classification, e.g. bag of named entities, syntactic dependencies,
document-level attributes, etc.
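To make point 1 concrete, here is a rough sketch of a POS-aware lemmatizer.
The names penn_to_wordnet and PosAwareLemmatizer are made up for this
example, and the tagger and lemmatizer are passed in as plain callables so
the snippet does not depend on any NLTK data being installed. It assumes the
tagger emits Penn Treebank tags.

```python
# Sketch only: penn_to_wordnet and PosAwareLemmatizer are hypothetical names.

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter ('a', 'v', 'r', 'n')."""
    if tag.startswith("J"):   # JJ, JJR, JJS
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, ...
        return "v"
    if tag.startswith("RB"):  # RB, RBR, RBS (not RP, which is a particle)
        return "r"
    return "n"                # nouns, and a fallback for everything else


class PosAwareLemmatizer:
    """Tag the tokens first, then lemmatize each one with its own POS."""

    def __init__(self, tag, lemmatize):
        self.tag = tag              # callable: list of tokens -> [(token, penn_tag)]
        self.lemmatize = lemmatize  # callable: (word, pos) -> lemma

    def __call__(self, tokens):
        return [self.lemmatize(token, penn_to_wordnet(penn_tag))
                for token, penn_tag in self.tag(tokens)]
```

With NLTK you would presumably pass nltk.pos_tag as tag and
WordNetLemmatizer().lemmatize as lemmatize, once the tagger model and the
WordNet corpus have been downloaded.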
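For point 2, one possible way to mix feature families is to build a single
dict per document with a prefix per family. This is only a sketch:
combined_features and the prefixes are invented for the example, and the
entity list is assumed to come from whatever NER step you already have.

```python
from collections import Counter

# Sketch only: combined_features and its key prefixes are hypothetical.

def combined_features(tokens, entities, attributes):
    """Merge word counts, entity counts and document attributes into one dict."""
    features = {}
    for word, count in Counter(tokens).items():
        features["word=" + word] = count        # bag of words
    for entity, count in Counter(entities).items():
        features["entity=" + entity] = count    # bag of named entities
    for name, value in attributes.items():
        features["doc_" + name] = value         # document-level attributes
    return features
```

A list of such dicts can then be handed to
sklearn.feature_extraction.DictVectorizer, which keeps numeric values as they
are and one-hot encodes string values.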
However, this is just for text classification / clustering.
>
Which is all Tom wanted :)
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general