On Wed, 30 Nov 2005 13:41:54 -0500 Kent Johnson <[EMAIL PROTECTED]> wrote:
> >>>This is the full error:
> >>>Traceback (most recent call last):
> >>>  File "C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
> >>>    exec codeObject in __main__.__dict__
> >>>  File "C:\Python\BA\Oversett.py", line 47, in ?
> >>>  File "C:\Python\BA\Oversett.py", line 23, in kjor
> >>>    en = i.split('\t')[0]
> >>>  File "C:\Python23\lib\codecs.py", line 388, in readlines
> >>>    return self.reader.readlines(sizehint)
> >>>  File "C:\Python23\lib\codecs.py", line 314, in readlines
> >>>    return self.decode(data, self.errors)[0].splitlines(1)
> >>>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 168-170:
> >>>invalid data
> >
> >>This is fairly strange as the line
> >>    en = i.split('\t')[0]
> >>should not call any method in codecs. I don't know how you can get such a
> >>stack trace.
> >
> > The file f where en comes from contains lots of lines with one English
> > word followed by a tab and a Norwegian one (approximately 25000 lines).
> > It can look like this: core\tkjærne
>
> Yes, I understand that.
>
> > So en is supposed to be the English word that the program needs to find
> > in MS Word, and to is the replacement word. So wouldn't that be a string
> > that should be handled by codecs?
> >
> > for i in self.f.readlines():
> >     en = i.split('\t')[0]
>
> The thing is, it's the line
>     for i in self.f.readlines():
> that is calling the codecs module, not the line
>     en = i.split('\t')[0]
> but it is the latter line that is in the stack trace.
>
> Can any of the other tutors make any sense of this stack trace?

As far as I can see, isn't the line

    return self.decode(data, self.errors)[0].splitlines(1)

causing the traceback?

I haven't read all of this thread, but maybe the file contains bytes that
are not valid UTF-8, and you are passing them to the utf8 codec?

Michael
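
To check that guess, here is a minimal sketch in current Python 3 syntax, not the Python 2.3 from the traceback. It assumes the word list was saved in a Windows code page such as cp1252 instead of UTF-8; that is only an assumption, since the thread does not say which encoding the file really uses. The sample bytes and the helper name find_undecodable_lines are made up for illustration. The idea is to decode the raw bytes one line at a time so the offending lines can be located, then decode with the encoding the file was actually written in.

```python
def find_undecodable_lines(raw, encoding="utf-8"):
    """Return (line_number, error) pairs for lines that fail to decode."""
    bad = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            bad.append((number, exc))
    return bad


# Hypothetical sample: same tab-separated layout as the word list,
# but encoded as cp1252 (assumed, not confirmed by the thread).
sample = "core\tkjærne\nhouse\thus\n".encode("cp1252")

# Lines that are not valid UTF-8; only the line containing 'æ' should fail.
bad = find_undecodable_lines(sample)
for number, exc in bad:
    print("line %d: %s" % (number, exc))

# Decoding with the assumed real encoding succeeds, and the split works.
pairs = []
for line in sample.decode("cp1252").splitlines():
    en, to = line.split("\t")
    pairs.append((en, to))
print(pairs)
```

If a check like this reports failures on exactly the lines containing æ, ø or å, the file is most likely not UTF-8, and opening it with the matching encoding (for example codecs.open(filename, 'r', 'cp1252'), if that turns out to be the right one) should make the error go away.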