[Scikit-learn-general] creating bigrams by respecting the punctuation

ali hürriyetoglu Wed, 15 Jul 2015 03:09:02 -0700

Dear all,

I am using the TfidfVectorizer to define and extract textual features. By
default, bigrams are created while ignoring punctuation. However, I want to
take the punctuation into account. For example, for a text like
"Today, he learns machine learning", I do not want the bigram "today he"; I
only want bigrams whose two words are separated by whitespace alone. Moreover,
could the same approach be generalized to stop words as well? How can I check it?


Thanks for your time.

Greetings,

Ali


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
