[Scikit-learn-general] Vectorizer preprocessor gets truncated text

Florian Lindner Sun, 05 Jan 2014 05:27:32 -0800

Hello,

I use this code to classify a text:


    def classify(self, text):
        vectorizer = CountVectorizer(vocabulary=self.vocabulary,
                                     decode_error='replace', strip_accents='unicode',
                                     preprocessor=self.mail_preprocessor,
                                     stop_words='english', lowercase=True)
        transformer = TfidfTransformer()
        vectors = vectorizer.transform(text)
        X = transformer.fit_transform(vectors)
        return self.classifier.predict(X)


self.classifier and self.vocabulary have been pickled earlier and loaded in this
session. text is loaded from a file with open("testmail").read().

When I debug into self.mail_preprocessor, the text to classify is simply an 'R',
even though text in the function above holds the full content of testmail.
Fitting with the same preprocessor and similar code, but with input='filename'
passed to the vectorizer, works fine.

Why is the text truncated?
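One plausible cause, sketched under assumptions: a vectorizer's transform() treats every item it iterates over as one document, and iterating a Python str yields single characters. If that holds here, the preprocessor's first call receives only 'R', the first character of the mail. The snippet below does not use scikit-learn; toy_transform, recording_preprocessor and the sample mail text are hypothetical stand-ins that only mimic the per-document preprocessor call.

```python
from typing import Callable, Iterable, List


def toy_transform(raw_documents: Iterable[str],
                  preprocessor: Callable[[str], str]) -> List[str]:
    """Hypothetical stand-in for a vectorizer's transform(): each item
    yielded by raw_documents is treated as one document and passed to
    the preprocessor."""
    return [preprocessor(doc) for doc in raw_documents]


def recording_preprocessor(seen: List[str]) -> Callable[[str], str]:
    """Hypothetical preprocessor that records every input it receives."""
    def preprocess(doc: str) -> str:
        seen.append(doc)
        return doc.lower()
    return preprocess


# Hypothetical mail content standing in for open("testmail").read().
text = "Received: from mail.example.org\nSubject: hello"

# Passing the bare string: iteration yields one character at a time.
seen_str: List[str] = []
toy_transform(text, recording_preprocessor(seen_str))
print(seen_str[0])                 # R
print(len(seen_str) == len(text))  # True: one call per character

# Wrapping the string in a list: the whole mail is one document.
seen_list: List[str] = []
toy_transform([text], recording_preprocessor(seen_list))
print(len(seen_list))              # 1
print(seen_list[0] == text)        # True
```

In the code above, the corresponding change would presumably be vectorizer.transform([text]). More recent scikit-learn releases may raise a ValueError for a bare string instead of silently iterating over its characters.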

Thanks!
Florian

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
