Re: [Scikit-learn-general] Text processing using nltk, sklearn and pandas

Joel Nothman Sun, 07 Jul 2013 03:09:14 -0700

>
> The documentation on feature extraction is up-to-date, quite complete
> and has an example snippet to use nltk for text pre-processing
> (lemmatization and tokenization):
>
>
> http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction



Nice, but two issues:
1. WordNetLemmatizer is a bad example unless you can provide it with a POS
tag; without one it treats every token as a noun.
2. You might want to incorporate heterogeneous features in text
classification, e.g. bag of named entities, syntactic dependencies,
document-level attributes, etc.
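
To make point 1 concrete, here is a minimal sketch of a POS-aware tokenizer.
The names penn_to_wordnet and POSLemmaTokenizer are made up for illustration,
and the tag mapping is a simplification (anything that is not an adjective,
verb or adverb falls back to noun). It assumes NLTK is installed along with
its tokenizer, tagger and WordNet data.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter.

    Simplified, hypothetical mapping: adjectives, verbs and adverbs are
    recognised by the first letter of the tag; everything else is a noun.
    """
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also WordNetLemmatizer's default


class POSLemmaTokenizer(object):
    """Callable tokenizer: tokenize, POS-tag, then lemmatize with the tag."""

    def __init__(self):
        # Imported lazily so the mapping above is usable without NLTK data.
        from nltk.stem import WordNetLemmatizer
        self.wnl = WordNetLemmatizer()

    def __call__(self, doc):
        from nltk import pos_tag, word_tokenize
        tagged = pos_tag(word_tokenize(doc))
        return [self.wnl.lemmatize(tok, penn_to_wordnet(tag))
                for tok, tag in tagged]
```

An instance can then be passed as the tokenizer argument of CountVectorizer,
e.g. CountVectorizer(tokenizer=POSLemmaTokenizer()). Tagging every document
is noticeably slower than plain tokenization, so this is a trade-off rather
than a free improvement.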

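For point 2, one possible approach (a sketch, not the only way) is to combine
several extractors with FeatureUnion. The FieldSelector helper, the document
layout (a dict with "text" and "meta" keys) and the toy data are all
assumptions made up for this example. Named entities or dependency features
could be fed through DictVectorizer in the same way once you have extracted
them as dicts.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline


class FieldSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: pick one key out of each document dict."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [doc[self.key] for doc in X]


# Toy documents: raw text plus document-level attributes.
docs = [
    {"text": "the cat sat on the mat",
     "meta": {"source": "blog", "n_links": 2}},
    {"text": "dogs chase cats",
     "meta": {"source": "news", "n_links": 0}},
]

features = FeatureUnion([
    # Bag of words from the text field.
    ("words", Pipeline([("pick", FieldSelector("text")),
                        ("bow", CountVectorizer())])),
    # Document-level attributes; string values are one-hot encoded.
    ("meta", Pipeline([("pick", FieldSelector("meta")),
                       ("dict", DictVectorizer())])),
])

# Sparse matrix with the two feature blocks stacked side by side.
X = features.fit_transform(docs)
```

The resulting matrix can go straight into any classifier that accepts sparse
input.
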
> However, this is just for text classification / clustering.
>

Which is all Tom wanted :)
