Re: [Scikit-learn-general] encoding with TfidfVectorizer

2012-08-12 Thread Andreas Mueller
Hi Zach. I am no expert on the text extraction module, but I'm pretty sure your guess is correct and this is a problem with the encoding of the file. You could use the "charset error" option to just ignore these characters. See the docs here
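To illustrate what "just ignore these characters" means in practice, here is a minimal sketch using only the Python standard library, not the vectorizer itself. It reads a file as raw bytes and decodes it while dropping any bytes that are invalid in the assumed encoding. The helper name read_text_lenient, the UTF-8 default, and the demo file contents are assumptions made for illustration; they do not come from the thread. As far as I know, later scikit-learn releases expose the same behaviour on the vectorizer through a decode_error parameter, but check the docs for the version you have installed.

```python
import os
import tempfile


def read_text_lenient(path, encoding="utf-8"):
    """Read a file as bytes and decode it, dropping bytes invalid in `encoding`.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # errors="ignore" silently discards undecodable bytes instead of raising.
    return raw.decode(encoding, errors="ignore")


# Demo file: contains a Latin-1 encoded "é" (byte 0xE9), which is not valid UTF-8.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"caf\xe9 au lait")
    demo_path = tmp.name

# Strict decoding fails on the stray byte...
strict_failed = False
try:
    with open(demo_path, "rb") as f:
        f.read().decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

# ...while lenient decoding drops it and keeps the rest of the text.
lenient_text = read_text_lenient(demo_path)
os.remove(demo_path)
```

Note that ignoring errors loses information (the accented character disappears here). If the real encoding of the files is known, decoding with that encoding is the better fix.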

[Scikit-learn-general] encoding with TfidfVectorizer

2012-08-11 Thread Zach Bastick
TfidfVectorizer is giving me an error on some texts that I am importing. I am importing them like this:

    for location in humanRatedText:
        if location[-3:].lower() == 'txt':
            f = open(dir+location, "r")
            t = f.read()
            texts.append(t)
            f.close()
        if location[-