Hello, I use this code to classify a text:

    def classify(self, text):
        vectorizer = CountVectorizer(vocabulary=self.vocabulary,
                                     decode_error='replace',
                                     strip_accents='unicode',
                                     preprocessor=self.mail_preprocessor,
                                     stop_words='english',
                                     lowercase=True)
        transformer = TfidfTransformer()
        vectors = vectorizer.transform(text)
        X = transformer.fit_transform(vectors)
        return self.classifier.predict(X)

self.classifier and self.vocabulary were pickled earlier and loaded in this session. text is read from a file: open("testmail").read().

When I step into self.mail_preprocessor in the debugger, the text it receives is just the single character 'R', although text in the function above holds the full content of testmail. Fitting with the same preprocessor and similar code, but with input='filename' passed to the vectorizer, works fine.

Why is the text truncated?

Thanks!
Florian
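
One likely explanation, offered as a sketch rather than a confirmed diagnosis: CountVectorizer.transform expects an iterable of documents, and a bare Python string is itself an iterable of single characters. If the file starts with 'R', the first "document" handed to the preprocessor would then be 'R'. The snippet below reproduces that mechanism in plain Python without scikit-learn. preprocess_all is a simplified, hypothetical stand-in for the vectorizer's per-document loop, and mail_preprocessor and the mail content are made up for illustration.

```python
def preprocess_all(raw_documents, preprocessor):
    """Simplified stand-in for a vectorizer's transform(): call the
    preprocessor once per item of the iterable it is given."""
    return [preprocessor(doc) for doc in raw_documents]


received = []  # records every argument the preprocessor is called with


def mail_preprocessor(doc):
    # Hypothetical preprocessor; it only logs its input and lowercases it.
    received.append(doc)
    return doc.lower()


# Made-up mail content; assumed to start with 'R' like the file in question.
text = "Return-Path: <someone@example.org>\nSubject: test\n"

# Passing the bare string: iterating over it yields single characters.
preprocess_all(text, mail_preprocessor)
first_call_bare = received[0]

# Wrapping the string in a list: the preprocessor sees the whole mail once.
received.clear()
preprocess_all([text], mail_preprocessor)
first_call_wrapped = received[0]

print(repr(first_call_bare))
print(first_call_wrapped == text)
```

If this is indeed what happens, calling vectorizer.transform([text]) instead of vectorizer.transform(text) should pass the whole mail to the preprocessor as one document. That would also fit the observation that input='filename' works, since in that mode the vectorizer is given a list of file names rather than a single string.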